IPython and Jupyter

Overview

Assistant Professor Justin Dressel

Faculty of Mathematics, Physics, and Computation

Schmid College of Science and Technology

 

Python 2 or Python 3?

  • Python 3 fixes many historical issues with Python 2
  • Python 2.7 supports as much Python 3 syntax as possible
  • Some projects have not yet been updated to 3 (e.g., Sage)
    However, most established projects have finally switched
  • Use Python 3 whenever possible - it is the future

For more history and discussion, see here.

Simplest major change from 2 to 3:

Print is now a function:

print("string1", "string2")

Instead of a statement:

print "string1", "string2"
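
As a minimal sketch of what the function form makes possible (the variable names here are illustrative, not from the original slides), print accepts keyword arguments such as sep, end, and file:

```python
import io

# Capture printed text in memory instead of the terminal (illustrative setup)
buffer = io.StringIO()

# sep joins the arguments, end replaces the trailing newline,
# and file redirects the output to any writable text stream
print("string1", "string2", sep="-", end="!\n", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

Because print is an ordinary function in Python 3, it can also be passed to other functions or stored in a variable like any other object.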

Recommended Install

When not using a cloud-based solution such as Sage Math Cloud or Wakari, you can install the Anaconda distribution

Anaconda is a widely used data science distribution of Python, and contains a huge number of useful scientific packages out of the box

It is strongly recommended that you also use Linux for code development - most data scientists rely heavily on UNIX-based servers

 

If you use macOS, note that it is also based on UNIX, and has a bash-compatible terminal in its Utilities folder

Many data scientists stick with MacOS for this reason

Interactive Python

  • Easy to learn, designed for clarity, fast to write, flexible
  • Philosophy:  code is read more often than written
  • Glue: can easily interface with other (C, C++, Julia, ...) code
  • Rapid growth in both industry and academia

IPython

Use the IPython interpreter for interactive sessions:

  • In a bash terminal run the command "ipython3"
  • You should be greeted with the following Python prompt:

Python 3.4.3 (default, Aug 12 2016, 00:20:55)
Type "copyright", "credits" or "license" for more information.
 
IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
 
In [1]: 

IPython

In [1]: 1+2
Out[1]: 3
 
In [2]: _i1
Out[2]: '1+2'
 
In [3]: _1
Out[3]: 3

Commands are stored in a history:

  • In [x] / Out[x] : Input and output lines x
  • _ix : variable storing input of line x
  • _x  : variable storing output of line x
  • Up/Down arrows cycle through history
  • Ctrl-r : searches history (like in bash)
  • Tab : completes commands (like in bash)

Interactive help is available:

  • ?  :  Open IPython help in a pager (remember:  q to exit)
  • %quickref  :  Quick IPython feature reference
  • help(object)  :  Open documentation for object
  • object? , object??  :  Quick access to help (and source code, if available) for object
  • dir()  :  List all names defined in the current scope
    (Note: by Python convention, names starting with an underscore _ are "hidden", and names wrapped in double underscores __ are reserved for special use)
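
The same introspection works in plain Python outside IPython. A minimal sketch, using the standard math module purely as an example target:

```python
import math

# dir(module) lists every name in the module's namespace
all_names = dir(math)

# Filter out names starting with an underscore, which are hidden by convention
public_names = [name for name in all_names if not name.startswith("_")]
print(public_names[:5])

# help(math.sqrt) displays the text that is stored in the function's docstring
print(math.sqrt.__doc__)
```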

Python Modules

  • To keep code organized, python uses a module system
  • Modules organize names into namespaces
  • Modules store functionality beyond the core language
  • Modules are loaded with  import
In [1]: import this

(Run this code for wisdom)

Modules are often renamed during import for brevity:

In [2]: import numpy as np

(Efficient numerical arrays)

In [3]: import pandas as pd

(Efficient tables for big data)

In [4]: import matplotlib.pyplot as plt

(Matlab-style plots)

In [5]: dir()
Out[5]: 
[ ...
 'np',
 'pd',
 'plt',
 ... ]

(The imported name appears in the global namespace, with its own private namespace inside)

In [6]: dir(np)

In [7]: help(pd)
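
A short sketch of the two most common import forms, using only standard-library modules so that it runs without Anaconda (the alias st is an arbitrary choice for illustration, not an established convention):

```python
# Import a whole module under a shorter alias
import statistics as st

# Import a single name directly into the current namespace
from math import pi

# Names inside the module are reached through the alias with a dot
mean_value = st.mean([1, 2, 3, 4])
print(mean_value)

# A name imported with "from" needs no prefix at all
circle_area = pi * 2.0**2
print(circle_area)
```

Importing under an alias keeps the origin of each name visible at the call site, which is one reason the np, pd, and plt aliases are so widely used.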

Python Namespaces

  • Python namespaces organize symbols into logical groups
  • The period operator extracts symbols from a namespace
In [1]: import this

(Run this code for wisdom again, for good measure)

You can always introspect symbols and namespaces:

In [2]: import sys

In [3]: sys.version
Out[3]: '3.5.2 (default, Jun 28 2016, 08:46:01) \n[GCC 6.1.1 20160602]'

(the string version lives inside sys)

In [4]: import os

In [5]: os.system("seq -w -s ' ' 5")
001 002 003 004 005
Out[5]: 0

(the function system lives inside os)

In [6]: help(os.system)

In [7]: dir(os)

(to see symbols in a namespace, use dir)

<Tab> works to autocomplete symbol names within a namespace
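
To illustrate why namespaces matter, here is a small sketch with two standard-library modules that both define a function named sqrt. The choice of math and cmath is only an example:

```python
import cmath
import math

# Both modules define a function called sqrt, but the names never collide
# because each one lives inside its own namespace
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)
print(real_root, complex_root)

# getattr performs the same lookup as the dot operator, using a string name
same_function = getattr(math, "sqrt")
print(same_function is math.sqrt)
```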

Learning Python

Python is wonderfully easy to learn

It has been called "executable pseudo-code" due to its clean syntax

Many excellent tutorials are freely available online

How to learn a language quickly:

  1. Identify basic syntax  (def, indentation, ...)
  2. Identify primary data types  (list, dict, set, tuple, ...)
  3. Identify primary control flow  (if, for, while, ...)
  4. Identify primary modular structures  (functions, classes, modules, ...)
  5. Find and emulate idiomatic examples
  6. Don't be afraid to play until you understand

Example Crash Course

# This is a comment
# Try out the following lines in an interpreter

# This is an assignment of an integer to a variable l
l = 3
type(l)   # returns: int

# This is a reassignment of a float to the same variable l
l = 3.
type(l)   # returns: float

# This is a tuple of Boolean truth values
(True, False)

# This is a string
"Hello world."

# This simultaneously assigns 
# l to a list, m to a tuple, n to a set, and o to a dictionary
l, m, n, o = [1, 2, 3], (1, 2, 3), {1, 3, 3, 2}, {"Alice": True, "Bob": False}

# Basic data structures:
#   list  : ordered, mixed types, duplicates ok, can be changed
#   tuple : ordered, mixed types, duplicates ok, cannot be changed
#   set   : unordered, mixed types, duplicates removed, can be changed
#   dict  : keeps insertion order (Python 3.7+), mixed types, named keys, no duplicate keys, can be changed

# Good for what?
#   list  : sequential traversal, frequent add/drop of elements
#   tuple : sequential traversal, random positional access
#   set   : removing duplicates, unordered iteration
#   dict  : efficient random keyword access 
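
The properties listed above can be checked directly. The following is a minimal sketch with illustrative variable names:

```python
# Lists can be changed in place
numbers = [1, 2, 3]
numbers.append(4)

# Tuples cannot: item assignment raises a TypeError
point = (1, 2, 3)
try:
    point[0] = 10
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Sets silently drop duplicates
unique = {1, 3, 3, 2}

# Dicts map keys to values and can grow after creation
present = {"Alice": True, "Bob": False}
present["Carol"] = True

print(numbers, tuple_is_mutable, len(unique), sorted(present))
```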

Master these basic data structures

Test everything out in an IPython session to get a good feel for how the language works: use help(object), object?, or object??

Example Crash Course

# This is a function definition with two positional arguments
def funFunction(arg1, arg2):
    """This is a docstring used by python's help() function"""
    # The indent is required: it indicates the scope of the def (four spaces by convention)
    # Manipulate arguments here. The function evaluates to its "return" value:
    valueToReturn = (arg1, arg2)  # placeholder result so that the example runs
    return valueToReturn

# This is a function definition with two keyword arguments
def funFunction2(arg1="default arg1", arg2="default arg2"):
    """docstring"""
    pass # If the function does nothing (dummy function), a pass is needed

# This is a function with a variable number of positional and keyword arguments
def funFunction3(*args, **kwargs):
    """args is a tuple (arg1, arg2, ...) of positional arguments
       kwargs is a dictionary {arg1 : val1, arg2 : val2, ...} of keyword arguments"""
    # Actions that change state outside the function (printing, writing files,
    # modifying globals) are called "side effects"
    # Try to minimize side effects - make each function one of:
    #   Functions like print that only perform actions are 'procedures'
    #   Functions that only process inputs and return outputs are 'pure functions'
    # Keeping these two separated whenever possible will reduce bugs.
    print(args)
    print(kwargs)

def funFunction4(arg1):
    """Simple example of a closure for partially specifying arguments"""
    def inner(arg2):
        # This is a function restricted to the scope of funFunction4, called a "helper function"
        # Note that it can use arg1 from the enclosing function
        return (arg1 + arg2)
    # Functions are first-class objects, so can be returned as a "value"
    # Here the inner function is returned, but with arg1 fixed. This is called a "closure"
    return inner
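
A brief usage sketch for the calling conventions above. The function names describe and make_adder are hypothetical stand-ins chosen for this example, not part of the original slides:

```python
def describe(*args, **kwargs):
    """Return the number of positional arguments and the sorted keyword names."""
    return len(args), sorted(kwargs)


def make_adder(amount):
    """Return a closure that adds a fixed amount to its argument."""
    def add(value):
        return amount + value
    return add


# Positional and keyword arguments can be mixed freely in a call
counts = describe(1, 2, 3, color="red", size=10)

# The * and ** operators unpack an existing tuple and dict into a call
values = (1, 2)
options = {"color": "blue"}
unpacked = describe(*values, **options)

# The closure remembers amount=5 after make_adder has returned
add_five = make_adder(5)
print(counts, unpacked, add_five(3))
```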

Organize logical operations into small functions that do a single task well

Never copy-paste blocks of code in multiple places - this makes it hard to modify later

Keywords

def _():

return

pass

*args

**kwargs

Example Crash Course

# for loops iterate over items in an iterable object
for animal in ["cat", "dog", "emu", "naked mole rat"]:
    # Unlike C, iterations do not use indices
    print( len(animal) )

# many things are iterable:
#   list, tuple, set, dict, generator

# Generators are defined with the 'yield' keyword
def gen_ints():
    """Generate the positive integers 1, 2, 3, ... without end"""
    n = 1
    # A while loop executes its interior until the Boolean test fails
    # By convention, infinite loops use the following construction:
    while True:
        # yield is like return, but pauses the function to return the value
        yield n
        # the function is resumed again here when the next value is needed
        n += 1

g = gen_ints()    # Create a generator object from the above definition
print( next(g) ) # Get the next element (run the function until its next yield)

# A for loop automatically iterates over elements of a generator
# until it runs out of elements, then it terminates
for i in gen_ints():
    # conditional statements use: if, elif, else
    if i < 10:
        print("Small int", i)
    elif i < 100:
        print("Medium int", i)
    else:
        print("Too big! Aborting.")
        # Without breaking out of the for loop, it would be an infinite loop!
        # The generator as defined above has no termination condition, so it just
        # keeps going forever unless some limit is imposed on it
        break  
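
Two standard tools are often useful alongside the loops above: itertools.islice imposes a limit on an infinite generator without an explicit break, and enumerate supplies an index when one is needed. A minimal sketch, repeating the generator definition so that the block is self-contained:

```python
from itertools import islice


def gen_ints():
    """Generate the positive integers without end."""
    n = 1
    while True:
        yield n
        n += 1


# islice stops after five items, so the infinite generator is safe to consume
first_five = list(islice(gen_ints(), 5))
print(first_five)

# enumerate pairs each item with its position, avoiding manual counters
labelled = [(index, animal) for index, animal in enumerate(["cat", "dog", "emu"])]
print(labelled)
```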

Keywords

for _ in _ :

while _ :

yield

if _ :

elif _ :

else:

break

Example Crash Course

# Long way to construct a list with a for loop
l = []
for i in range(10):
    l.append( i**2 )

# List comprehension of the same
l = [i**2 for i in range(10)]

# Set comprehension for analogous set
s = {i**2 for i in range(10)}

# Dict comprehension works similarly
d = {str(i):i**2 for i in range(10)}

# Long way to define a single-use generator
def make_gen():
    for i in range(100):
        for j in range(100):
            yield (i,j)
g = make_gen()

# Generator expression (the generator analogue of a comprehension) of the same
g = ( (i,j) for i in range(100) for j in range(100) )
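
Comprehensions can also filter their input with a trailing if clause, and a generator expression can be passed straight to a function such as sum without building a list first. A small sketch along the same lines:

```python
# Keep only the squares of even numbers
even_squares = [i**2 for i in range(10) if i % 2 == 0]
print(even_squares)

# A generator expression as a function argument needs no extra parentheses
total = sum(i**2 for i in range(10))
print(total)
```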

Keyword:

lambda arg1, arg2 : stuff

# The following are equivalent definitions

def f(n):
    return n+1

f = lambda n : n+1

# This is particularly useful for closures

def powerFactory(n):
    return lambda x : x**n

square = powerFactory(2)
square(4)   # returns 16
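
One of the most common everyday uses of lambda is as a short key function for sorting. A sketch with an illustrative list:

```python
animals = ["naked mole rat", "cat", "emu", "dog"]

# Sort by string length; items of equal length keep their original order
by_length = sorted(animals, key=lambda name: len(name))
print(by_length)

# Sort by the last letter of each name instead
by_last_letter = sorted(animals, key=lambda name: name[-1])
print(by_last_letter)
```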

Comprehensions are compact and clear ways of defining data structures using existing iterables

They are usually faster than the equivalent for loops for constructing data

Anonymous (single-use) functions can be defined using lambda expressions

(Side note: in Python 2, range returns a list and xrange returns a lazy sequence object. In Python 3, range returns a lazy sequence object like the old xrange, and xrange no longer exists)
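
A quick sketch of how a Python 3 range behaves in practice: it produces values lazily, yet unlike a generator it supports len, indexing, membership tests, and repeated iteration.

```python
numbers = range(10)

# A range knows its length and supports indexing and membership tests
print(len(numbers), numbers[3], 7 in numbers)

# It can be iterated over as many times as needed
first_pass = list(numbers)
second_pass = list(numbers)

# A generator, in contrast, is exhausted after a single pass
squares = (n**2 for n in numbers)
first_squares = list(squares)
second_squares = list(squares)
print(first_squares, second_squares)
```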

A Note on Style

Coding style is incredibly important

Other coders expect code to look a certain way, so if you code differently, it prevents them from understanding your code easily

Adopting uniformity of style also helps you as a coder to ensure your own code is well-written and well-designed

Python has an established style guide: PEP 8

Read it. Internalize it. Reference it.

Documentation is doubly important

Having well-formatted and informative docstrings will make your code much easier to use for others

See an example from Google below on how to write complete and informative docstrings
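
As a rough sketch of the sectioned docstring layout associated with the Google style (the function scale and its parameters are hypothetical, invented only to show the layout):

```python
def scale(values, factor=2.0):
    """Multiply every number in a sequence by a constant factor.

    Args:
        values: Iterable of numbers to scale.
        factor: Multiplier applied to each element.

    Returns:
        A new list containing each element of values times factor.

    Raises:
        TypeError: If an element cannot be multiplied by factor.
    """
    return [value * factor for value in values]


print(scale([1, 2, 3]))
print(scale.__doc__.splitlines()[0])
```

The first line is a one-sentence summary, which is what many tools show in listings. The sections that follow are what help(scale) displays in full.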

Creating Python Modules

Any python file (with extension .py) is a module

Always write python code with the idea that you are extending the language with a custom module

#!/usr/bin/env python3

"""Module docstring

Include a description of the use of your module here.
This is used by the python help() command in the interpreter.
"""

# Global variables (try not to use many)
version = 0.1

# Function definitions
def func1(n):
    """Function docstring for use in help()"""
    pass

# Shielded main block (at end of file)
if __name__ == "__main__":
    # Put code in here that only runs
    # when .py file is run from command line
    # This will not be run when the module is imported
    pass

(the top line tells Linux that this file can be run as a Python script)

(Always document code properly with docstrings - see the help() output for any Python module or function for style examples)

(executable code must go in the main block, or it will run whenever the module is imported)
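
The effect of the shielded main block can be seen by importing a module and checking its __name__. The sketch below writes a throwaway module to a temporary folder so that it is self-contained. The module name slides_demo_module and its contents are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a tiny hypothetical module with a shielded main block
source = '''"""Hypothetical demo module."""
version = 0.1

def func1(n):
    """Return n plus one."""
    return n + 1

if __name__ == "__main__":
    print("run as a script")
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "slides_demo_module.py").write_text(source)
    # Make the folder importable only for the duration of the import
    sys.path.insert(0, folder)
    try:
        demo = importlib.import_module("slides_demo_module")
    finally:
        sys.path.remove(folder)

# On import, __name__ is the module's own name, so the main block is skipped
print(demo.__name__)
print(demo.func1(2))
```

In normal use the .py file simply sits next to the notebook or script, and a plain import statement is all that is needed.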

Jupyter

Jupyter notebooks (with extension .ipynb) are interactive cell-based documents for presenting results

Launching a notebook server is simple:

~$ jupyter notebook

This opens the notebook interface directly in your web browser

A demo notebook server is available with tutorials here

If you use Sage Math Cloud or Wakari, Jupyter notebooks are already available through a point-and-click interface

Amazingly, the Help menu at the top is actually helpful

Be sure to take the interface tour, and look at the keyboard shortcuts (which are modeled on vim keys)

Many example notebooks are available for reference

Jupyter Workflow

Rough Workflow:

  1. Write reusable code in Python .py modules
  2. Import modules in a notebook
  3. Write professional report using notebook, calling code from modules as needed to compute results, or display interesting plots
  4. Share final report online with colleagues

Benefits:

  • Code is modular - kept tidy and well-tested separately from the notebook
  • Notebook is organized as a presentation, showing only the relevant parts in a logical order, with good graphical, text, and equation support
  • Notebooks are interactive - easy exploration of data and results
  • Notebooks can be converted into static formats (LaTeX, html, etc.)

Further Reading

Practice makes perfect.

 

Keep references handy until you remember commands on command.