IPython
Overview
Python 2 or Python 3?
- Python 3 fixes many historical issues with Python 2
- Python 2.7 supports as much Python 3 syntax as possible
- Some projects have not yet been updated to 3 (e.g., Sage, and Google App Engine as of August 2016)
However, most established projects have finally switched (see the PyReadiness checklist)
- Use Python 3 whenever possible - it is the future
For more history and discussion, see here.
Simplest major change from 2 to 3:
Print is now a function (Python 3):
print("string1", "string2")
Instead of a statement (Python 2):
print "string1", "string2"
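Because print is now an ordinary function, it accepts keyword arguments such as sep and end. The minimal Python 3 sketch below captures the printed text in a string (using io.StringIO and contextlib.redirect_stdout) so the exact output can be inspected. It also shows that joining with + is not the same as passing separate items.

```python
import io
from contextlib import redirect_stdout

# Capture what print writes instead of showing it on the screen
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("string1", "string2")            # separate items are joined with a space
    print("string1" + "string2")           # one concatenated string, no space
    print("string1", "string2", sep="-")   # custom separator between items
    print("no newline", end="")            # replace the trailing newline with ""

captured = buffer.getvalue()
print(repr(captured))
```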
Recommended Install
When not using a cloud-based solution like CoCalc or Wakari, you can install the Anaconda distribution
Anaconda is a leading data science distribution of Python, and contains a huge number of useful scientific packages out of the box
It is strongly recommended that you also use Linux for code development - almost all serious data scientists heavily use Unix-based servers
If you use macOS, note that it is also based on Unix, and even has a terminal (Terminal.app) in its Utilities folder
Many data scientists stick with macOS for this reason
IPython
Use the IPython interpreter for interactive sessions:
- In a bash terminal, run the command "ipython" (or load a Jupyter notebook with a Python kernel)
- You should be greeted with the following Python prompt (or code cell):
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
(Side note: if you type '?' for help, press 'q' to quit the scrolling pager program "less")
(Warning: be sure to run "ipython" and not just "python" from a terminal)
Interactive Python
- Easy to learn, designed for clarity, fast to write, flexible
- Philosophy: code is read more often than written
- Glue: can easily interface with other (C, C++, Julia, ...) code
- Rapid growth in both industry and academia
IPython
In [1]: 1+2
Out[1]: 3
In [2]: _i1
Out[2]: '1+2'
In [3]: _1
Out[3]: 3
Commands are stored in a history:
- In [x] / Out[x] : Input and output lines x
- _ix : variable storing input of line x
- _x : variable storing output of line x
- Up/Down arrows cycle through history
- Tab : completes commands (like in bash)
Interactive help is available:
- ? : Open help in less (remember: q to exit)
- %quickref : Quick IPython feature reference
- help(object) : Open documentation for object
- object? , object?? : Quick access to help for object (?? also shows the source code when available)
- dir() : List all currently defined variables
(Note: by Python convention, names beginning with underscores, such as _x or __name__, are "hidden" or internal)
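Because dir() also lists the internal underscore names, it can be handy to filter them out. A small sketch, assuming a hypothetical helper called public_names (not part of Python or IPython), applied to the standard math module:

```python
import math

def public_names(obj):
    """Hypothetical helper: the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

# dir() on a module includes special names such as __name__ and __doc__
print("__name__" in dir(math))            # True
print("__name__" in public_names(math))   # False: filtered out by the underscore test
print(public_names(math)[:5])             # the first few public names, alphabetically
```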
Python Modules
- To keep code organized, python uses a module system
- Modules organize names into namespaces
- Modules store functionality beyond the core language
- Modules are loaded with import
In [1]: import this
(Run this code for wisdom)
In [2]: import numpy as np
In [3]: import pandas as pd
In [4]: import matplotlib.pyplot as plt
In [5]: dir()
Out[5]:
[ ...
'np',
'pd',
'plt',
... ]
In [6]: dir(np)
In [7]: help(pd)
Modules are often renamed during import for brevity:
- numpy as np (efficient numerical arrays)
- pandas as pd (efficient data tables)
- matplotlib.pyplot as plt (Matlab-style plots)
(The imported name appears in the global namespace, with its own private namespace inside)
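The same renaming works for any module. The sketch below uses standard-library modules (math and collections) as stand-ins for numpy and pandas so that it runs without extra packages; it contrasts "import ... as" (one name bound to the whole module) with "from ... import" (a single name copied into the current namespace).

```python
import math
import math as m                   # the whole module, bound to the short name m
from collections import Counter    # only the name Counter enters this namespace

print(m.sqrt(16))                  # 4.0: sqrt lives inside the m namespace
print(Counter("banana")["a"])      # 3: Counter was imported directly by name
print(m is math)                   # True: both names refer to the same module object
```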
Python Namespaces
- Python namespaces organize symbols into logical groups
- The period operator extracts symbols from a namespace
In [1]: import this
(Run this code for wisdom again, for good measure)
In [2]: import sys
In [3]: sys.version
Out[3]: '3.5.2 (default, Jun 28 2016, 08:46:01) \n[GCC 6.1.1 20160602]'
In [4]: import os
In [5]: os.system("seq -w -s ' ' 005")
001 002 003 004 005
Out[5]: 0
In [6]: help(os.system)
In [7]: dir(os)
You can always introspect symbols and namespaces:
- sys.version : the string version lives inside sys
- os.system : the function system lives inside os
- dir(os) : to see the symbols in a namespace, use dir
<Tab> works to autocomplete symbol names within a namespace
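Introspection also works in plain code, not just at the prompt. A short sketch using only sys and os from the standard library: the period operator and getattr perform the same namespace lookup, and the docstring that help() displays is stored in the __doc__ attribute.

```python
import os
import sys

print(sys.version_info.major)     # major version of the running interpreter, e.g. 3
print(callable(os.system))        # True: os.system is a function
print("system" in dir(os))        # True: dir lists the symbols inside os

# getattr looks up a symbol by its name given as a string
name = "getcwd"
print(getattr(os, name)() == os.getcwd())   # True: same function, same result

# help(os.system) displays this docstring
print(os.system.__doc__ is not None)        # True
```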
Learning Python
Python is wonderfully easy to learn
It has been called "executable pseudo-code" due to its clean syntax
Besides your textbook, many excellent tutorials exist for free online:
How to learn a language quickly:
- Identify basic syntax (def, indentation, ...)
- Identify primary data types (list, dict, set, tuple, ...)
- Identify primary control flow (if, for, while, ...)
- Identify primary modular structures (functions, classes, modules, ...)
- Find and emulate idiomatic examples
- Don't be afraid to play until you understand
Example Crash Course
# This is a comment
# Try out the following lines in an interpreter
# This is an assignment of an integer to a variable l
l = 3
type(l) # returns: int
# This is a reassignment of a float to the same variable l
l = 3.
type(l) # returns: float
# This is a tuple of Boolean truth values
(True, False)
# This is a string
"Hello world."
# This simultaneously assigns
# l to a list, m to a tuple, n to a set, and o to a dictionary
l, m, n, o = [1, 2, 3], (1, 2, 3), {1, 3, 3, 2}, {"Alice": True, "Bob": False}
# Basic data structures:
# list : ordered, mixed types, duplicates ok, can be changed
# tuple : ordered, mixed types, duplicates ok, cannot be changed
# set : unordered, mixed (hashable) types, duplicates removed, can be changed
# dict : insertion-ordered (Python 3.7+), mixed types, unique keys, can be changed
# Good for what?
# list : sequential traversal, frequent appends/removals at the end
# tuple : sequential traversal, random positional access, fixed records
# set : removing duplicates, fast membership tests, unordered iteration
# dict : fast lookup by key
Test everything out in an IPython session to get a good feel for how the language works: use help(object) or function? or function??
Master these
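A quick way to check the properties listed above is to poke at each structure. This sketch reuses the same l, m, n, o assignment; the dict ordering result assumes Python 3.7 or later.

```python
l, m, n, o = [1, 2, 3], (1, 2, 3), {1, 3, 3, 2}, {"Alice": True, "Bob": False}

l.append(4)                # lists can be changed: now [1, 2, 3, 4]
print(m[0])                # tuples support positional access: 1
try:
    m[0] = 10              # ...but cannot be changed
except TypeError as err:
    print("tuple is immutable:", err)
print(len(n))              # the duplicate 3 was removed: 3
print(2 in n)              # fast membership test: True
o["Carol"] = True          # dicts can gain new keys
print(o["Alice"])          # lookup by key: True
print(list(o))             # keys in insertion order: ['Alice', 'Bob', 'Carol']
```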
Example Crash Course
# This is a function definition with two positional arguments
def funFunction(arg1, arg2):
    """This is a docstring used by python's help() function"""
    # Indentation is required: it marks the body (scope) of the def
    # Four spaces per level is the standard convention
    # Manipulate arguments here. The function evaluates to its "return" value:
    valueToReturn = arg1 + arg2
    return valueToReturn

# This is a function definition with two arguments that have default values
# (usually passed as keyword arguments)
def funFunction2(arg1="default arg1", arg2="default arg2"):
    """docstring"""
    pass  # pass does nothing; it fills a body that would otherwise be empty (a dummy function)

# This is a function with a variable number of positional and keyword arguments
def funFunction3(*args, **kwargs):
    """args is a tuple (arg1, arg2, ...) of positional arguments
    kwargs is a dictionary {arg1 : val1, arg2 : val2, ...} of keyword arguments"""
    # Actions that change something outside the function (printing, writing files,
    # modifying global variables) are called "side effects"
    # Try to minimize side effects - choose one of:
    #   Functions like print that only perform actions are 'procedures'
    #   Functions that process inputs and return outputs are 'pure functions'
    # Keeping these two separated whenever possible will reduce bugs.
    print(args)
    print(kwargs)

def funFunction4(arg1):
    """Simple example of a closure for partially specifying arguments"""
    def inner(arg2):
        # This is a function restricted to the scope of funFunction4, called a "helper function"
        # Note that it can use arg1 from the enclosing function
        return arg1 + arg2
    # Functions are first-class objects, so can be returned as a "value"
    # Here the inner function is returned, but with arg1 fixed. This is called a "closure"
    return inner
Organize logical operations into small functions that do a single task well
Never copy-paste blocks of code in multiple places - this makes it hard to modify later
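The sketch below exercises these ideas. The names combine and make_adder are hypothetical illustrations in the spirit of funFunction3 and funFunction4. functools.partial is shown as a standard-library alternative for fixing arguments; it is a different technique from a hand-written closure, though it gives the same result here.

```python
from functools import partial

def combine(*args, **kwargs):
    """Hypothetical helper: return the collected positional and keyword arguments."""
    return args, kwargs

print(combine(1, 2, color="red"))   # ((1, 2), {'color': 'red'})

def make_adder(arg1):
    """Closure in the style of funFunction4: fix arg1, return a one-argument function."""
    def inner(arg2):
        return arg1 + arg2
    return inner

add_ten = make_adder(10)
print(add_ten(5))                   # 15

def add(arg1, arg2):
    return arg1 + arg2

# partial(add, 10) returns a callable with arg1 already fixed to 10
add_ten_again = partial(add, 10)
print(add_ten_again(5))             # 15
```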
Keywords
def _():
return
pass
*args
**kwargs
Example Crash Course
# for loops iterate over items in an iterable object
for animal in ["cat", "dog", "emu", "naked mole rat"]:
    # Unlike C-style loops, the loop variable is the item itself, not an index
    print(len(animal))
# many things are iterable:
# list, tuple, set, dict (iterating over a dict yields its keys), generator
# Generators are defined with the 'yield' keyword
def gen_ints():
    """Generate all the positive integers, without end"""
    n = 1
    # A while loop executes its interior until the Boolean test fails
    # By convention, infinite loops use the following construction:
    while True:
        # yield is like return, but pauses the function to return the value
        yield n
        # the function is resumed again here when the next value is needed
        n += 1
g = gen_ints()  # Create a generator object from the above definition
print(next(g))  # Get the next element (run until the next yield): prints 1
# A for loop automatically iterates over elements of a generator
# until it runs out of elements, then it terminates
for i in gen_ints():
    # conditional statements use: if, elif, else
    if i < 10:
        print("Small int", i)
    elif i < 100:
        print("Medium int", i)
    else:
        print("Too big! Aborting.")
        # Without breaking out of the for loop, it would be an infinite loop!
        # The generator as defined above has no termination condition, so it just
        # keeps going forever unless some limit is imposed on it
        break
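Another way to impose a limit on an infinite generator is itertools.islice from the standard library, which stops after a given number of items. The sketch also shows what happens when a finite generator runs out: next() accepts a default value to return instead of raising StopIteration.

```python
from itertools import islice

def gen_ints():
    """Generate all the positive integers, without end."""
    n = 1
    while True:
        yield n
        n += 1

# islice takes only the first five values, so the loop inside gen_ints stops advancing
first_five = list(islice(gen_ints(), 5))
print(first_five)          # [1, 2, 3, 4, 5]

# A finite generator is exhausted after its last value
g = (x for x in [10, 20])
print(next(g), next(g))    # 10 20
print(next(g, "done"))     # default returned instead of raising StopIteration: done
```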
Keywords
for _ in _ :
while _ :
yield
if _ :
elif _ :
else:
break
Example Crash Course
# Long way to construct a list with a for loop
l = []
for i in range(10):
    l.append(i**2)
# List comprehension of the same
l = [i**2 for i in range(10)]
# Set comprehension for analogous set
s = {i**2 for i in range(10)}
# Dict comprehension works similarly
d = {str(i):i**2 for i in range(10)}
# Long way to define a single-use generator
def make_gen():
    for i in range(100):
        for j in range(100):
            yield (i,j)
g = make_gen()
# Generator comprehension of the same
g = ( (i,j) for i in range(100) for j in range(100) )
Keyword:
lambda arg1, arg2 : expression
# The following are equivalent definitions
def f(n):
    return n+1
f = lambda n : n+1
# This is particularly useful for closures
def powerFactory(n):
    return lambda x : x**n
square = powerFactory(2)
square(4) # returns 16
Comprehensions are compact and clear ways of defining data structures using existing iterables
They are usually faster than the equivalent for loops for constructing data
We will have much more to say about making python efficient later on
Anonymous (single-use) functions can be defined using lambda expressions
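The sketch below checks that the loop and the comprehension build the same list, then shows a typical single-use lambda: a sort key passed to the built-in sorted. Here key=len would work equally well; the lambda just makes the idea explicit.

```python
# The loop and the comprehension build the same list
squares_loop = []
for i in range(10):
    squares_loop.append(i**2)
squares_comp = [i**2 for i in range(10)]
print(squares_loop == squares_comp)          # True

# A lambda as a single-use sort key (sorted is stable, so ties keep their order)
words = ["emu", "cat", "naked mole rat", "dog"]
print(sorted(words, key=lambda w: len(w)))   # ['emu', 'cat', 'dog', 'naked mole rat']

# A closure built from a lambda, as in powerFactory
def powerFactory(n):
    return lambda x: x**n

cube = powerFactory(3)
print(cube(2))                               # 8
```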
A Note on Style
Coding style is incredibly important
Other coders expect code to look a certain way, so code written in an unfamiliar style is harder for them to understand
Adopting uniformity of style also helps you as a coder to ensure your own code is well-written and well-designed
Python has an established style guide (PEP 8)
Read it. Internalize it. Reference it.
Documentation is doubly important
Having well-formatted and informative docstrings will make your code much easier to use for others
See Google's Python Style Guide for an example of how to write complete and informative docstrings
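As an illustration, here is a docstring written in the Args/Returns/Raises layout that Google's style guide uses. The mean function itself is a hypothetical example, not taken from that guide. Calling help(mean) in IPython displays the same text that is printed here from mean.__doc__.

```python
def mean(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Args:
        values: A non-empty sequence of ints or floats.

    Returns:
        The mean of the values, as a float.

    Raises:
        ValueError: If values is empty.
    """
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)

print(mean.__doc__)   # the text that help(mean) shows
```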
Creating Python Modules
Any python file (with extension .py) is a module
Always write python code with the idea that you are extending the language with a custom module
#!/usr/bin/env python
"""Module docstring
Include a description of the use of your module here.
This is used by the python help() command in the interpreter.
"""
# Global variables (try not to use many)
version = 0.1
# Function definitions
def func1(n):
    """Function docstring for use in help()"""
    pass
# Shielded main block (at end of file)
if __name__ == "__main__":
    # Put code in here that only runs
    # when .py file is run from command line
    # This will not be run when the module is imported
    pass
(The top line lets Unix-like systems such as Linux run the file directly as a Python script, once it is made executable)
(Always document code with docstrings properly - see the help() output for any Python module or function for style examples)
(Code that should only run as a script must go in the main block; any other top-level code runs whenever the module is imported)
Further Reading
Practice makes perfect.
Keep references handy until you remember commands on command.
IPython Overview
By Justin Dressel
These slides offer a condensed description of the essentials of using IPython for professional interactive reports.