Py3K PEPS

APAC PyCon 2010, Singapore

Senthil Kumaran

About Me

  • Python Core Developer.

  • Maintainer of urllib2 and related modules.

  • Software Engineer at Akamai, India.

Status of Python

  • Python 3.2 ( py3k and will become trunk)

  • Python 3.1.2 (release31-maint)

  • Python 3.0 is not there!

  • Python 2.7 (current trunk)

  • Python 2.6.5 (release26-maint)

  • http://www.python.org/dev/

PEP 3000

  • GuidoVan Rossum - 05/April/2006 (It's June/2010 now. Lot of thought has gone in)

  • Python 3.0 will break the backwards compatiblity with Python 2.x

  • Python 2.6+ to support forward compatiblity.

  • Implementation Details – Write in C.

  • Source code conversion tool (2to3), context free source to source translation.

Handout

Python 3000, Python 3.0 and Py3K are all names for the same thing. The project is called Python 3000, or abbreviated to Py3k. The actual Python release will same naming convention is used for Python 2.x.

Guido planned that Python 3.1 and 3.2 to be released much sooner after 3.0 than it has been customary for the 2.x series. The 3.x release pattern will stabilize once the community is happy with 3.x.

Actually Python 3.0 shippped with io module written in Python which caused significant performance problems. Python Core Devs recommend against the use of Python 3.0 for any purpose. Use Python 3.1 which is better.

Handout

Python 3.x will be break the backward compability, but Python 2.6 and Python 2.7 will support forwarding compatibility in the following ways.

  • It will support a "Py3k warnings mode" which will warn dynamically (i.e. at runtime) about features that will stop working in Python 3.0, e.g. assuming that range() returns a list.

  • It will contain backported versions of many Py3k features, either enabled through __future__ statements or simply by allowing old and new syntax to be used side-by-side (if the new syntax would be a syntax error in 2.x).

  • 2to3 source conversion tool is provide.This tool can do a context-free source-to-source translation. For example, it can translate apply(f, args) into f(*args). However, the tool cannot do data flow analysis or type inferencing, so it simply assumes that apply in this example refers to the old built-in function.

Handout

The recommended development model for a project that needs to support Python 2.7 and 3.2 simultaneously is as follows:

  1. You should have excellent unit tests with close to full coverage.

  2. Port your project to Python 2.7.

  3. Turn on the Py3k warnings mode.

  4. Test and edit until no warnings remain.

  5. Use the 2to3 tool to convert this source code to 3.2 syntax. Do not manually edit the output!

  6. Test the converted source code under 3.2.

  7. If problems are found, make corrections to the 2.6 version of the source code and go back to step 3.

  8. When it's time to release, release separate 2.6 and 3.2 tarballs (or whatever archive form you use for releases).

Handout

It is recommended not to edit the 3.2 source code until you are ready to reduce 2.6 support to pure maintenance (i.e. the moment when you would normally move the 2.6 code to a maintenance branch anyway).

PEP 3100

  • Brett Canon – Mischellaneous Python 3 Plans.

  • This is overall implementation details, kind of index to all the other PEPS which will detail the specific changes.

  • One PEP to Check them all.

  • PEP 238 (Changing the Division Operator)

  • PEP 328 (Imports: Multi-Line and Absolute/Relative)

  • PEP 343 (The "with" Statement)

  • PEP 352 (Required Superclass for Exceptions)

http://www.python.org/dev/peps/pep-3100/

Python Regrets presentation: http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf

PEP 3101

  • Talin – Advanced String Formatting

  • Current ones - '%' and string.Template methods.

'{0},{1}'.format('hello','world')
  • Details the format specifiers options.

[[fill]align][sign][#][0][minimumwidth][.precision][type]

PEP 3105

  • Make print a function

  • Good move!

  • Rationale for making print as a function.

  • Signature of the print function

def print(*args, sep=' ', end='\n', file=None)
  • No backwards compatible. All your python 2 code which contains print statement will break.

Handout

  • print is the only application-level functionality that has a statement dedicated to it. Within Python's world, syntax is generally used as a last resort, when something can't be done without help from the compiler. Print doesn't qualify for such an exception.

  • At some point in application development one quite often feels the need to replace print output by something more sophisticated, like logging calls or calls into some other I/O library. With a print() function, this is a straightforward string replacement, today it is a mess adding all those parentheses and possibly converting >>stream style syntax.

Handout

  • Having special syntax for print puts up a much larger barrier for evolution, e.g. a hypothetical new printf() function is not too far fetched when it will coexist with a print() function.

  • There's no easy way to convert print statements into another call if one needs a different separator, not spaces, or none at all. Also, there's no easy way at all to conveniently print objects with some other separator than a space.

Handout

  • If print() is a function, it would be much easier to replace it within one module (just def print(*args):...) or even throughout a program (e.g. by putting a different function in __builtin__.print). As it is, one can do this by writing a class with a write() method and assigning that to sys.stdout -- that's not bad, but definitely a much larger conceptual leap, and it works at a different level than print.

Luckily, as it is a statement in Python 2, print can be detected and replaced reliably and non-ambiguously by an automated tool, so there should be no major porting problems.

PEP 3106

  • Guido – Changes to dict.keys(), .values() and .items()

  • It returns a set like unordered container objects whose contents are derived from underlying dictionary rather than a list which is a copy of keys etc.

  • Remove iterkeys, itervalues and iteritems method.

  • Inspired by the Java Collections Framework.

handout

Revamping dict.keys(), .values() and .items()

This PEP proposes to change the .keys(), .values() and .items() methods of the built-in dict type to return a set-like or unordered container object whose contents are derived from the underlying dictionary rather than a list which is a copy of the keys, etc.; and to remove the .iterkeys(), .itervalues() and .iteritems() methods.

continued

The approach is inspired by that taken in the Java Collections Framework.

The objects returned by the .keys() and .items() methods behave like sets. The object returned by the values() method behaves like a much simpler unordered collection -- it cannot be a set because duplicate values are possible.

continued

Because of the set behavior, it will be possible to check whether two dicts have the same keys by simply testing:

if a.keys() == b.keys(): ...

and similarly for .items().

PEP 3108

PEP 3109

  • Changes to Python's Exception Raise mechanism.

  • Intended to reduce Line Noise as well as size of the language.

  • Consolidation of several raise forms – raise X,Y and raise X(Y) which are equivalent in py2.

  • Changes to raise exception syntax and throw method in generator object.

  • To hold exception object raise E as e

handout

One of Python's guiding maxims is "there should be one -- and preferably only one -- obvious way to do it". Python 2.x's raise statement violates this principle, permitting multiple ways of expressing the same thought. For example, these statements are equivalent:

raise E, V

raise E(V)

Continued

There is a third form of the raise statement, allowing arbitrary tracebacks to be attached to an exception.

    raise E, V, T

where T is a traceback. As specified in PEP 344,
exception objects in Python 3.x will possess a ``__traceback__``
attribute, admitting this translation of the three-expression
``raise`` statement: ::

    raise E, V, T

is translated to ::

    e = E(V)
    e.__traceback__ = T
    raise e

Continued

Using these translations, we can reduce the raise statement from four forms to two:

  1. raise (with no arguments) is used to re-raise the active exception in an except suite.

  2. raise EXCEPTION is used to raise a new exception. This form has two sub-variants: EXCEPTION may be an exception class or an instance of an exception class; valid exception classes are BaseException and its subclasses. If EXCEPTION is a subclass, it will be called with no arguments to obtain an exception instance.

    To raise anything else is an error.

PEP 3110

  • Changes to catching Exceptions.

except E, N:
except E as n:

PEP 3111

  • Simple input builtin for python

  • No longer input() and raw_input()

  • Its just one, which is raw_input() renamed to just the input()

handout

Specification

The existing raw_input() function will be renamed to input().

The Python 2 to 3 conversion tool will replace calls to input() with eval(input()) and raw_input() with input().

BDFL Thoughts on this issue:

http://mail.python.org/pipermail/python-3000/2006-December/005249.html

Continued

I like the exact proposal made here better than any of the alternatives mentioned so far.

Against naming it readline(): the "real" readline doesn't strip the n and returns an empty string for EOF instead of raising EOFError; I believe the latter is more helpful for true beginners' code.

Against naming it ask() and renaming print() to say(): I find those rather silly names that belong in toy or AI languages. Changing print from statement to function maintains Pythonicity; renaming it say() does not.

I don't expect there will be much potential confusion with the 2.x input(); that function is used extremely rarely. It will be trivial to add rules to the refactoring tool (sandbox/2to3/) that replace input() with eval(input()) and replace raw_input() with input().

Guido

PEP 3112

  • Literal syntax for bytes object - b'bytes'

  • Convenient way to spell ascii strings and arbitrary binary data.

    bytes('Hello world', 'ascii')
    'Hello world'.encode('ascii')

The proposed syntax is::

    b'Hello world'

Existing spellings of an 8-bit binary sequence in Python 3000 include::

    bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
    bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
    '7f454c4601010100'.decode('hex')

The proposed syntax is::

    b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
    b'\x7fELF\x01\x01\x01\0'

Continued

In both cases, the advantages of the new syntax are brevity, some small efficiency gain, and the detection of encoding errors at compile time rather than at runtime. The brevity benefit is especially felt when using the string-like methods of bytes objects:

lines = bdata.split(bytes('\n', 'ascii'))  # existing syntax
lines = bdata.split(b'\n')  # proposed syntax

And when converting code from Python 2.x to Python 3000:

sok.send('EXIT\r\n')  # Python 2.x
sok.send('EXIT\r\n'.encode('ascii'))  # Python 3000 existing
sok.send(b'EXIT\r\n')  # proposed

PEP 3113

  • Removal of tuple parameter unpacking.

def fxn(a, (b, c), d):      pass
  • Remove the ablity to pass a container to be received as tuple. More harmful than useful.

  • Introspection issues. Un informative error messages, little use-cases.

handout

Tuple parameter unpacking is the use of a tuple as a parameter in a function signature so as to have a sequence argument automatically unpacked. An example is:

def fxn(a, (b, c), d):
    pass

The use of (b, c) in the signature requires that the second argument to the function be a sequence of length two (e.g., [42, -13]). When such a sequence is passed it is unpacked and has its values assigned to the parameters, just as if the statement b, c = [42, -13] had been executed in the parameter.

Continued

Unfortunately this feature of Python's rich function signature abilities, while handy in some situations, causes more issues than they are worth. Thus this PEP proposes their removal from the language in Python 3.0.

The 2to3 refactoring tool 2to3 will gain a fixer fixer for translating tuple parameters to being a single parameter that is unpacked as the first statement in the function. The name of the new parameter will be changed. The new parameter will then be unpacked into the names originally used in the tuple parameter. This means that the following function

 def fxn((a, (b, c))):
     pass

will be translated into::

 def fxn(a_b_c):
     (a, (b, c)) = a_b_c
     pass

Continued

As tuple parameters are used by lambdas because of the single expression limitation, they must also be supported. This is done by having the expected sequence argument bound to a single parameter and then indexing on that parameter:

lambda (x, y): x + y

will be translated into:

lambda x_y: x_y[0] + x_y[1]

PEP 3114

  • Renaming iterator.next() to iterator.__next__()

  • In python 2.x, next() is a protocol, not a method.

  • Its for consistency sake with other implicitly called methods.

next(x) → x.__next__()
  • iter(function, sentinel) <--> next(iterator, sentinel)

handout

Following this pattern, the natural way to handle next is to add a next built-in function that behaves in exactly the same fashion.

  • next(x) --> internal tp_iternext --> x.__next__()

Further, it is proposed that the next built-in function accept a sentinel value as an optional second argument, following the style of the getattr and iter built-in functions. When called with two arguments, next catches the StopIteration exception and returns the sentinel value instead of propagating the exception. This creates a nice duality between iter and next:

iter(function, sentinel) <--> next(iterator, sentinel)

Continued

There have been a few objections to the addition of more built-ins. In particular, Martin von Loewis writes:

I dislike the introduction of more builtins unless they have a true generality (i.e. are likely to be needed in many programs). For this one, I think the normal usage of __next__ will be with a for loop, so I don't think one would often need an explicit next() invocation.

It is also not true that most protocols are explicitly invoked through builtin functions. Instead, most protocols are can be explicitly invoked through methods in the operator module. So following tradition, it should be operator.next.

...

As an alternative, I propose that object grows a .next() method, which calls __next__ by default.

Implementation and Transformation

Method definitions named next will be renamed to __next__.

Explicit calls to the next method will be replaced with calls to the built-in next function. For example, x.next() will become next(x).

PEP 3115

  • Metaclasses in Python 3

  • Changes to Syntax for Declaring metaclasses and alters the semantics of how classes with metaclasses are constructed.

class C(B1, B2, metaclass=MC, *more_bases, **kwds):  pass

handout

Information on metaclass

http://www.python.org/doc/essays/metaclasses/meta-vladimir.txt http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python http://www.voidspace.org.uk/python/articles/metaclasses.shtml

def make_hook(f):
        """Decorator to turn 'foo' method into '__foo__'"""
        f.is_hook = 1
        return f

class MyType(type):
    def __new__(cls, name, bases, attrs):

        if name.startswith('None'):
            return None

        # Go over attributes and see if they should be renamed.
        newattrs = {}
        for attrname, attrvalue in attrs.iteritems():
            if getattr(attrvalue, 'is_hook', 0):
                newattrs['__%s__' % attrname] = attrvalue
            else:
                newattrs[attrname] = attrvalue

        return super(MyType, cls).__new__(cls, name, bases, newattrs)

    def __init__(self, name, bases, attrs):
        super(MyType, self).__init__(name, bases, attrs)

        # classregistry.register(self, self.interfaces)
        print "Would register class %s now." % self

    def __add__(self, other):
        class AutoClass(self, other):
            pass
        return AutoClass
        # Alternatively, to autogenerate the classname as well as the class:
        # return type(self.__name__ + other.__name__, (self, other), {})

    def unregister(self):
        # classregistry.unregister(self)
        print "Would unregister class %s now." % self

class MyObject:
    __metaclass__ = MyType


class NoneSample(MyObject):
    pass

# Will print "NoneType None"
print type(NoneSample), repr(NoneSample)

class Example(MyObject):
    def __init__(self, value):
        self.value = value
    @make_hook
    def add(self, other):
        return self.__class__(self.value + other.value)

# Will unregister the class
Example.unregister()

inst = Example(10)
# Will fail with an AttributeError
#inst.unregister()

print inst + inst
class Sibling(MyObject):
    pass

ExampleSibling = Example + Sibling
# ExampleSibling is now a subclass of both Example and Sibling (with no
# content of its own) although it will believe it's called 'AutoClass'
print ExampleSibling
print ExampleSibling.__mro__

PEP 3116

  • New I/O Module.

  • Specification for Basic Byte based I/O Stream to which we can add further buffering and text handling features.

  • Enables the programmers to treat stream like interfaces in very stream like manner instead of just for file-like object or for socket interface obj.

  • The new I/O spec is influenced by Java's I/O libraries.

  • Open() factory method backwards compatible with old IO.

handout

Python allows for a variety of stream-like (a.k.a. file-like) objects that can be used via read() and write() calls. Anything that provides read() and write() is stream-like. However, more exotic and extremely useful functions like readline() or seek() may or may not be available on every stream-like object. Python needs a specification for basic byte-based I/O streams to which we can add buffering and text-handling features.

Once we have a defined raw byte-based I/O interface, we can add buffering and text handling layers on top of any byte-based I/O class. The same buffering and text handling logic can be used for files, sockets, byte arrays, or custom I/O classes developed by Python programmers. Developing a standard definition of a stream lets us separate stream-based operations like read() and write() from implementation specific operations like fileno() and isatty(). It encourages programmers to write code that uses streams as streams and not require that all streams support file-specific or socket-specific operations.

The new I/O spec is intended to be similar to the Java I/O libraries, but generally less confusing. Programmers who don't want to muck about in the new I/O world can expect that the open() factory method will produce an object backwards-compatible with old-style file objects.

PEP 3118

  • Revising the buffer protocol.

  • Improving the ways Python allows memory sharing in Python 3.

  • It is useful for extension modules.

PEP 3119

  • Guido – Introduction of Abstract Base Classes (ABC).

  • Way to overload isinstance and issubclass.

  • A new module abc which is to support the Abstract Base Class framework.

  • Specific ABCs for Containers and Iterators to be added to collections module.

  • What makes a set, mapping, sequence?

  • Companion PEP 3141 for numeric types.

handout

ABCs are simply Python classes that are added into an object's inheritance tree to signal certain features of that object to an external inspector. Tests are done using isinstance(), and the presence of a particular ABC means that the test has passed.

In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended.

Like all other things in Python, these promises are in the nature of a gentlemen's agreement, which in this case means that while the language does enforce some of the promises made in the ABC, it is up to the implementer of the concrete class to insure that the remaining ones are kept.

PEP 3120

  • Using UTF-8 as the default source Encoding.

  • Martin von Löwis

  • Default Source Encoding of Python scripts from ASCII to UTF-8.

  • IDLE Changed to use UTF-8 as default encoding.

handout

The parser needs to be changed to accept bytes > 127 if no source encoding is specified; instead of giving an error, it needs to check that the bytes are well-formed UTF-8 (decoding is not necessary, as the parser converts all source code to UTF-8, anyway).

PEP 3129

  • Introduction of Class Decorators.

  • Collin Winters, Jack Diederich

  • Its an extension of function and method decorators.

  • To make certain constructs more easily expressed and less reliant on Cpython Interpretor implementation.

  • Its simpler than using metaclasses for class to acheive the same purpose.

handout

It is an extension to the function and method decorators introduced in PEP 318.

When function decorators were originally debated for inclusion in Python 2.4, class decorators were seen as obscure and unnecessary thanks to metaclasses. After several years' experience with the Python 2.4.x series of releases and an increasing familiarity with function decorators and their uses, the BDFL and the community re-evaluated class decorators and recommended their inclusion in Python 3.0.

The motivating use-case was to make certain constructs more easily expressed and less reliant on implementation details of the CPython interpreter. While it is possible to express class decorator-like functionality using metaclasses, the results are generally unpleasant and the implementation highly fragile. In addition, metaclasses are inherited, whereas class decorators are not, making metaclasses unsuitable for some, single class-specific uses of class decorators. The fact that large-scale Python projects like Zope were going through these wild contortions to achieve something like class decorators won over the BDFL.

Examination of Decorators - PEP 318

PEP 3132

  • Extended Iterable unpacking

>>>a, *b, c = range(5)
>>>a
0
>>>c
4
>>>b
[1, 2, 3]

PEP 3135

  • New super

  • Syntactic Suger for the super.

  • super().foo(1, 2) instead of super(Foo, self).foo(1, 2)

PEP 3137

  • Guido – Immutable bytes and Mutable buffer

  • bytes is a immutable array of bytes.

  • bytearray is a mutable array of bytes.

  • memoryview is a bytes view of another object.

PEP 3138

  • String representation in Py3k.

use repr()
`backticks` gone – its same as repr()
Repr is Unicode based and not ASCII based in py3

2to3 Tool

  • Context sensitive source code to source code translator.

  • Useful to convert the py2 source to py3.

  • Use it.

  • Look or str, bytes issues after the translation. Fix those and fix the interfaces (phew!).

  • You are done.

PEP 3099

  • Greog Brandl – April-2006- Things which will not change in Python 3000.

  • BDFL pronoucements on things which will not happen.

  • Things such as 'self' will not become implicit.

  • Parser wont be more complex than LL

  • No braces, no more backticks

  • The interpreter prompt will not change. It gives Guido warm fuzzy feelings..

PEP 3003

  • Python language Moratorium.

  • Other Python Implementations to Catch up.

  • Changes put into Python be used by large user-base.

  • Does not allow for Exceptions to changes in the syntax and semantics of the language.

  • Defines what can change and what cannot change.

Other Py3k Features

  • Improved GIL by Antoine Pitrou

  • Unladen Swallow Integration.

  • Excellent Test Coverage - Michael Frood.

  • An Excellent Language where there is more symmetry.

Symmetry is a complexity-reducing concept (co-routines include subroutines); seek it everywhere.

Alan J. Perlis — Epigrams on Programming, Sept., 1982

Good Resources

Thank you. :)

Py3k PEPS

By Senthil Kumaran

Py3k PEPS

Python 3 PEPS

  • 1,292