For more history and discussion, see here.
Simplest major change from Python 2 to 3:
Print is now a function:
print("string1" + "string2")
Instead of a statement:
print "string1", "string2"
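Because print is an ordinary function in Python 3, it accepts keyword arguments. The sketch below uses the sep, end and file arguments; the strings are placeholders, and io.StringIO is used here only to capture the output so it can be inspected.

```python
import io

buffer = io.StringIO()

# Positional arguments are joined with sep (a single space by default)
print("string1", "string2", file=buffer)

# sep and end are keyword arguments of the print function
print("string1", "string2", sep="", end="!\n", file=buffer)

output = buffer.getvalue()
print(output, end="")
```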
When not using a cloud-based solution such as CoCalc or Wakari, you can install the Anaconda distribution
Anaconda is a leading data science distribution of Python, and contains a large number of useful scientific packages out of the box
It is strongly recommended that you also use Linux for code development - almost all serious data scientists heavily use UNIX-based servers
If you use macOS, note that it is also based on UNIX, and even has a bash terminal in its Utilities folder
Many data scientists stick with macOS for this reason
Use the IPython interpreter for interactive sessions:
Interactive Python
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
(Side note: if you type '?' for help, the letter 'q' quits the scrolling program "less")
(Warning: Be sure to use "ipython" and not just "python" from a terminal)
In [1]: 1+2
Out[1]: 3
In [2]: _i1
Out[2]: '1+2'
In [3]: _1
Out[3]: 3
Commands are stored in a history:
Interactive help is available:
In [1]: import this
(Run this code for wisdom)
In [2]: import numpy as np
In [3]: import pandas as pd
In [4]: import matplotlib.pyplot as plt
In [5]: dir()
Out[5]:
[ ...
'np',
'pd',
'plt',
... ]
In [6]: dir(np)
In [7]: help(pd)
Modules are often renamed during import for brevity:
(Efficient numerical arrays)
(Efficient tables for big data)
(Matlab-style plots)
(The imported name appears in the global namespace, with its own private namespace inside)
In [1]: import this
(Run this code for wisdom again, for good measure)
In [2]: import sys
In [3]: sys.version
Out[3]: '3.5.2 (default, Jun 28 2016, 08:46:01) \n[GCC 6.1.1 20160602]'
In [4]: import os
In [5]: os.system("seq -w -s ' ' 005")
001 002 003 004 005
Out[5]: 0
In [6]: help(os.system)
In [7]: dir(os)
You can always introspect symbols and namespaces:
(the string version lives inside sys)
(the function system lives inside os)
(to see symbols in a namespace, use dir)
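The same introspection works in a plain script, outside IPython. A minimal sketch using only the standard library: dir() returns the names in a namespace as a list of strings, and getattr() looks a symbol up by its name. The variable names are illustrative.

```python
import os
import sys

# dir() returns the names defined in a namespace as a list of strings
public_names = [name for name in dir(os) if not name.startswith("_")]
print("system" in public_names)

# getattr() fetches a symbol from a namespace by its string name
system_function = getattr(os, "system")
print(system_function is os.system)

# The string version lives inside sys
print(isinstance(sys.version, str))
```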
<Tab> works to autocomplete symbol names within a namespace
Python is wonderfully easy to learn
It has been called "executable pseudo-code" due to its clean syntax
Besides your textbook, many excellent tutorials exist for free online:
How to learn a language quickly:
# This is a comment
# Try out the following lines in an interpreter
# This is an assignment of an integer to a variable l
l = 3
type(l) # returns: int
# This is a reassignment of a float to the same variable l
l = 3.
type(l) # returns: float
# This is a tuple of Boolean truth values
(True, False)
# This is a string
"Hello world."
# This simultaneously assigns
# l to a list, m to a tuple, n to a set, and o to a dictionary
l, m, n, o = [1, 2, 3], (1, 2, 3), {1, 3, 3, 2}, {"Alice": True, "Bob": False}
# Basic data structures:
# list : ordered, mixed types, duplicates ok, can be changed
# tuple : ordered, mixed types, duplicates ok, cannot be changed
# set : unordered, mixed types, duplicates removed, can be changed
# dict : insertion-ordered (Python 3.7+), mixed types, named keys, no duplicate keys, can be changed
# Good for what?
# list : sequential traversal, frequent add/drop of elements
# tuple : sequential traversal, random positional access
# set : removing duplicates, unordered iteration
# dict : efficient random keyword access
Test everything out in an IPython session to get a good feel for how the language works: use help(object), or the IPython shortcuts function? and function??
Master these
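The differences between the four basic data structures can be checked directly. The following is a small sketch with made-up values; the variable names are illustrative only.

```python
items = [1, 3, 3, 2]

as_list = list(items)    # keeps order and duplicates, can be changed
as_tuple = tuple(items)  # keeps order and duplicates, cannot be changed
as_set = set(items)      # duplicates are removed
as_dict = {"Alice": True, "Bob": False}  # values are looked up by key

as_list.append(4)        # lists can be modified in place
print(as_list)
print(len(as_set))
print(as_dict["Alice"])

# Assigning to a tuple element raises a TypeError
try:
    as_tuple[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)
```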
# This is a function definition with two positional arguments
def funFunction(arg1, arg2):
    """This is a docstring used by Python's help() function"""
    # The indent is required: It indicates the scope of the def
    # (four spaces per level is the standard convention)
    # Manipulate arguments here. The function evaluates to its "return" value:
    valueToReturn = (arg1, arg2)  # placeholder result
    return valueToReturn
# This is a function definition with two keyword arguments
def funFunction2(arg1="default arg1", arg2="default arg2"):
    """docstring"""
    pass # Placeholder body: marks a function that does nothing (dummy function)
# This is a function with a variable number of positional and keyword arguments
def funFunction3(*args, **kwargs):
    """args is a tuple (arg1, arg2, ...) of positional arguments
    kwargs is a dictionary {"arg1" : val1, "arg2" : val2, ...} of keyword arguments"""
    # Actions performed besides returning a value (such as printing) are called "side effects"
    # Try to minimize side effects - choose one of:
    #   Functions like print that only perform actions are 'procedures'
    #   Functions that process inputs and return outputs are 'pure functions'
    # Keeping these two separated whenever possible will reduce bugs.
    print(args)
    print(kwargs)
def funFunction4(arg1):
    """Simple example of a closure for partially specifying arguments"""
    def inner(arg2):
        # This is a function restricted to the scope of funFunction4, called a "helper function"
        # Note that it can use arg1 from the enclosing function
        return (arg1 + arg2)
    # Functions are first-class objects, so can be returned as a "value"
    # Here the inner function is returned, but with arg1 fixed. This is called a "closure"
    return inner
Organize logical operations into small functions that do a single task well
Never copy-paste blocks of code in multiple places - this makes it hard to modify later
Keywords
def _():
return
pass
*args
**kwargs
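To see how calls to such functions behave, here is a short usage sketch. The names describe_call and make_adder are hypothetical stand-ins for the variable-argument and closure patterns described above.

```python
def describe_call(*args, **kwargs):
    """Return the positional and keyword arguments that were received."""
    return args, kwargs


def make_adder(arg1):
    """Return a closure that adds arg1 to whatever it is called with."""
    def inner(arg2):
        return arg1 + arg2
    return inner


# Positional arguments arrive as a tuple, keyword arguments as a dict
positional, keyword = describe_call(1, 2, name="Alice")
print(positional)
print(keyword)

# The closure remembers arg1 = 5 after make_adder has returned
add_five = make_adder(5)
print(add_five(10))
```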
# for loops iterate over items in an iterable object
for animal in ["cat", "dog", "emu", "naked mole rat"]:
    # Unlike C, iterations do not use indices
    print(len(animal))
# many things are iterable:
# list, tuple, set, dict, generator
# Generators are defined with the 'yield' keyword
def gen_ints():
    """Create a generator that yields the positive integers without end"""
    n = 1
    # A while loop executes its interior until the Boolean test fails
    # By convention, infinite loops use the following construction:
    while True:
        # yield is like return, but pauses the function to return the value
        yield n
        # the function is resumed again here when the next value is needed
        n += 1
g = gen_ints() # Define a generator object from the above definition
print( next(g) ) # Get the next element (run until the next yield statement)
# A for loop automatically iterates over elements of a generator
# until it runs out of elements, then it terminates
for i in gen_ints():
    # conditional statements use: if, elif, else
    if i < 10:
        print("Small int", i)
    elif i < 100:
        print("Medium int", i)
    else:
        print("Too big! Aborting.")
        # Without breaking out of the for loop, it would be an infinite loop!
        # The generator as defined above has no termination condition, so it just
        # keeps going forever unless some limit is imposed on it
        break
Keywords
for _ in _ :
while _ :
yield
if _ :
elif _ :
else:
break
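Besides break, the standard library offers ways to impose a limit on an infinite generator. A minimal sketch using itertools.islice and itertools.takewhile; gen_ints is repeated here so that the example is self-contained.

```python
from itertools import islice, takewhile


def gen_ints():
    """Yield the positive integers 1, 2, 3, ... without end."""
    n = 1
    while True:
        yield n
        n += 1


# islice stops after a fixed number of elements
first_five = list(islice(gen_ints(), 5))
print(first_five)

# takewhile stops at the first element that fails the test
below_four = list(takewhile(lambda n: n < 4, gen_ints()))
print(below_four)
```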
# Long way to construct a list with a for loop
l = []
for i in range(10):
    l.append( i**2 )
# List comprehension of the same
l = [i**2 for i in range(10)]
# Set comprehension for analogous set
s = {i**2 for i in range(10)}
# Dict comprehension works similarly
d = {str(i):i**2 for i in range(10)}
# Long way to define a single-use generator
def make_gen():
    for i in range(100):
        for j in range(100):
            yield (i,j)
g = make_gen()
# Generator comprehension of the same
g = ( (i,j) for i in range(100) for j in range(100) )
Keyword:
lambda arg1, arg2 : stuff
# The following are equivalent definitions
def f(n):
    return n+1
f = lambda n : n+1
# This is particularly useful for closures
def powerFactory(n):
    return lambda x : x**n
square = powerFactory(2)
square(4) # returns 16
Comprehensions are compact and clear ways of defining data structures using existing iterables
They are usually faster than the equivalent for loops for constructing data
We will have much more to say about making Python efficient later on
Anonymous (single-use) functions can be defined using lambda expressions
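Comprehensions can also filter with an if clause, and lambda expressions are commonly passed as single-use arguments to other functions. A short sketch with an illustrative list of names:

```python
animals = ["cat", "dog", "emu", "naked mole rat"]

# A comprehension with a condition keeps only the matching items
short_names = [animal.upper() for animal in animals if len(animal) == 3]
print(short_names)

# A dict comprehension maps each name to its length
lengths = {animal: len(animal) for animal in animals}
print(lengths)

# A lambda used once as the sort key (longest name first)
by_length = sorted(animals, key=lambda animal: len(animal), reverse=True)
print(by_length)
```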
Coding style is incredibly important
Other coders expect code to look a certain way, so if you code differently, it prevents them from understanding your code easily
Adopting uniformity of style also helps you as a coder to ensure your own code is well-written and well-designed
Python has an established style guide (PEP 8)
Read it. Internalize it. Reference it.
Documentation is doubly important
Having well-formatted and informative docstrings will make your code much easier to use for others
See an example from Google below on how to write complete and informative docstrings
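As a rough illustration of that docstring layout, here is a sketch with Args, Returns and Raises sections. The function scale_values is hypothetical and exists only to carry the docstring; it is not taken from Google's own example.

```python
def scale_values(values, factor=2.0):
    """Multiply every value in an iterable by a constant factor.

    Args:
        values: An iterable of numbers to scale.
        factor: The number each value is multiplied by.

    Returns:
        A new list containing the scaled values, in the original order.

    Raises:
        ValueError: If factor is zero.
    """
    if factor == 0:
        raise ValueError("factor must be nonzero")
    return [value * factor for value in values]


scaled = scale_values([1, 2, 3])
print(scaled)
```

Running help(scale_values) in the interpreter displays the docstring text.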
Any Python file (with extension .py) is a module
Always write Python code with the idea that you are extending the language with a custom module
#!/usr/bin/env python
"""Module docstring
Include a description of the use of your module here.
This is used by the python help() command in the interpreter.
"""
# Global variables (try not to use many)
version = 0.1
# Function definitions
def func1(n):
    """Function docstring for use in help()"""
    pass
# Shielded main block (at end of file)
if __name__ == "__main__":
    # Put code in here that only runs
    # when .py file is run from command line
    # This will not be run when the module is imported
    pass
(top line tells Linux this could be run as a Python script)
(Always document code with docstrings properly - see the help() output for any python module or function for style examples)
(executable code must go in main block, or it will be run on every import statement)
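One common way to fill in this template is to collect the executable code in a main() function and call it from the shielded block. A minimal sketch; the body of func1 and the name main are assumptions made for illustration.

```python
"""Hypothetical module illustrating the shielded main block."""


def func1(n):
    """Return the square of n."""
    return n ** 2


def main():
    """Code that should run only when the file is executed as a script."""
    print(func1(4))


# True when run from the command line, False when the module is imported
if __name__ == "__main__":
    main()
```

Importing this file from another module makes func1 available without printing anything.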
Practice makes perfect.
Keep references handy until you remember commands on command.