Python for data science
Jeremy Jacobson
jeremyallenjacobson@github.io
Visiting Assistant Professor
Institute for Quantitative Theory and Methods
We are the Knights who say ni!

Where is the command line?
Windows
Search for cmd
Mac OS X
The Mac command line is a program called Terminal. It lives in the folder /Applications/Utilities/
Linux
You already know the answer
Go to the command line
Microsoft Windows [Version 10.0.10586]
(c) 2015 Microsoft Corporation. All rights reserved.
C:\Users\jajaco3>
The letter may be different...so you might see something like:
D:\YourName\Projects\Python>
From now on we will denote the command line by a dollar sign.
$
Windows
dir cd cd.. cd\ mkdir rmdir
Mac OS X
ls cd cd .. cd / mkdir rmdir
Linux
ls cd cd .. cd / mkdir rmdir
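The same navigation steps can also be carried out from Python, where they behave the same way on Windows, Mac OS X, and Linux. The following is only a sketch using the standard library (written for Python 3); the directory name demo_dir is a made-up example, and everything happens inside a temporary folder so nothing on disk is left behind.

```python
import os
import tempfile

start = os.getcwd()                      # remember where we started
with tempfile.TemporaryDirectory() as sandbox:
    os.chdir(sandbox)                    # cd <sandbox>
    os.mkdir("demo_dir")                 # mkdir demo_dir (hypothetical name)
    listing = os.listdir(".")            # dir on Windows, ls elsewhere
    os.chdir("demo_dir")                 # cd demo_dir
    os.chdir("..")                       # cd ..
    os.rmdir("demo_dir")                 # rmdir demo_dir (only works if empty)
    remaining = os.listdir(".")
    os.chdir(start)                      # leave the sandbox before it is deleted

print(listing)
print(remaining)
```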
Command syntax
$ conda
usage: conda-script.py [-h] [-V] [--debug] command ...
conda is a tool for managing and deploying applications, environments and packages.
Options:
positional arguments:
command
info Display information about current conda install.
$ conda info
Current conda install:
platform : win-64
conda version : 4.1.6
conda-env version : 2.5.1
conda-build version : 1.21.3
python version : 2.7.12.final.0
requests version : 2.10.0
root environment : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2 (writable)
default environment : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2
envs directories : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\envs
package cache : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\pkgs
channel URLs : https://repo.continuum.io/pkgs/free/win-64/
https://repo.continuum.io/pkgs/free/noarch/
https://repo.continuum.io/pkgs/pro/win-64/
https://repo.continuum.io/pkgs/pro/noarch/
config file : None
offline mode : False
is foreign system : False
What if I need to reproduce someone else's calculation using specific versions of various packages?
$ conda create --name reproduceEnv python=2.7 bokeh=0.12.3
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\envs\reproduceEnv:
The following NEW packages will be INSTALLED:
backports: 1.0-py27_0
backports_abc: 0.4-py27_0
bokeh: 0.12.3-py27_1
futures: 3.0.5-py27_0
jinja2: 2.8-py27_1
markupsafe: 0.23-py27_2
mkl: 11.3.3-1
numpy: 1.11.2-py27_0
pip: 8.1.2-py27_0
python: 2.7.12-0
python-dateutil: 2.5.3-py27_0
pyyaml: 3.12-py27_0
requests: 2.11.1-py27_0
setuptools: 27.2.0-py27_1
singledispatch: 3.4.0.3-py27_0
six: 1.10.0-py27_0
ssl_match_hostname: 3.4.0.2-py27_1
tornado: 4.4.2-py27_0
vs2008_runtime: 9.00.30729.1-2
wheel: 0.29.0-py27_0
Proceed ([y]/n)?
If you were to hit 'y' you would see this:
Fetching packages ...
numpy-1.11.2-p 100% |###############################| Time: 0:00:00 7.95 MB/s
pyyaml-3.12-py 100% |###############################| Time: 0:00:00 1.10 MB/s
requests-2.11. 100% |###############################| Time: 0:00:00 3.06 MB/s
setuptools-27. 100% |###############################| Time: 0:00:00 3.83 MB/s
tornado-4.4.2- 100% |###############################| Time: 0:00:00 2.69 MB/s
bokeh-0.12.3-p 100% |###############################| Time: 0:00:00 5.53 MB/s
Extracting packages ...
[ COMPLETE ]|##################################################| 100%
Linking packages ...
[ COMPLETE ]|##################################################| 100%
#
# To activate this environment, use:
# > activate reproduceEnv
#
# To deactivate this environment, use:
# > deactivate
#
$
To verify that the 'reproduceEnv' environment has now been added, type the command:
$ conda info --envs
# conda environments:
#
reproduceEnv C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\envs\reproduceEnv
root * C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2
Activate this environment with:
WINDOWS:
C:\>activate reproduceEnv
(reproduceEnv) C:\>
MAC/LINUX:
$ source activate reproduceEnv
Verify the correct versions with:
(reproduceEnv) C:\>python --version
Python 2.7.12 :: Continuum Analytics, Inc.
(reproduceEnv) C:\>bokeh --version
0.12.3
(reproduceEnv) C:\>
To return to Anaconda installed defaults, deactivate this environment with the command:
WINDOWS:
(reproduceEnv) C:\>deactivate reproduceEnv
C:\>
MAC/LINUX:
$ source deactivate reproduceEnv
Verify the Anaconda installed default version with:
C:\>python --version
Python 2.7.12 :: Anaconda 4.1.1 (64-bit)
C:\>bokeh --version
0.12.0
C:\>
To list the installed packages, use:
$ conda list
# packages in environment at C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2:
#
_nb_ext_conf 0.2.0 py27_0
alabaster 0.7.8 py27_0
anaconda 4.1.1 np111py27_0
anaconda-client 1.4.0 py27_0
anaconda-navigator 1.2.1 py27_0
argcomplete 1.0.0 py27_1
astropy 1.2.1 np111py27_0
babel 2.3.3 py27_0
backports 1.0 py27_0
backports_abc 0.4 py27_0
beautifulsoup4 4.4.1 py27_0
bitarray 0.8.1 py27_1
blaze 0.10.1 py27_0
bokeh 0.12.0 py27_0
boto 2.40.0 py27_0
bottleneck 1.1.0 np111py27_0
bzip2 1.0.6 vc9_3 [vc9]
cdecimal 2.3 py27_2
cffi 1.6.0 py27_0
chest 0.2.3 py27_0
click 6.6 py27_0
cloudpickle 0.2.1 py27_0
clyent 1.2.2 py27_0
colorama 0.3.7 py27_0
comtypes 1.1.2 py27_0
conda 4.1.6 py27_0
conda-build 1.21.3 py27_0
What is Python?
- Type commands individually into interpreter
- Create a script file (myPythonFile.py).

What is Python?

If you want to type commands individually, start the Python interactive shell by typing python.
$ python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>>
Python Interactive Shell
Interactive Mode:
$ python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>>
Python as a desktop calculator
>>> 2+2
4
>>> 53/3
17
>>> 53/4
13
The integer numbers (e.g. 2, 4, 20) have type:
int
The decimals (e.g. 53.0, 4.0, 13.25) have type:
float
In Python 2, if both operands are of type int, floor division is performed and an int is returned (e.g. 53/4 returns 13). In Python 3, / always performs true division, so 53/4 returns 13.25.
If either operand is a float, true division is performed and a float is returned (e.g. 53.0/4 or 53/4.0 return 13.25)
>>> 53/4.0
13.25
>>> 53.0/4
13.25
Python as a desktop calculator
>>> type(53/3)
<type 'int'>
>>> type(53)
<type 'int'>
>>> type(53.0)
<type 'float'>
Text after a # is a comment:
>>> 53 / 4 # int / int -> int
13
>>> 53 / 4.0 # int / float -> float
13.25
>>> 53 // 4
13
>>> 53 // 4.0 # explicit floor division discards the remainder
13.0
>>> 53 % 4 # the % operator returns the remainder
1
>>> 13 * 4 + 1 # result * divisor + remainder
53
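The relationship "result * divisor + remainder" can be checked in one step with the built-in divmod. This is a minimal sketch written for Python 3, where / is always true division; under Python 2 the line from __future__ import division gives the same behaviour.

```python
# Written for Python 3: / is true division, // is floor division.
quotient, remainder = divmod(53, 4)    # same as (53 // 4, 53 % 4)
print(quotient, remainder)

true_result = 53 / 4                   # 13.25 in Python 3
floor_result = 53 // 4                 # 13
negative_floor = -53 // 4              # floors toward negative infinity: -14

# The identity shown in the session above: result * divisor + remainder
reconstructed = quotient * 4 + remainder
print(true_result, floor_result, negative_floor, reconstructed)
```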
Python as a desktop calculator
>>> 9 ** 1
9
>>> 9 ** 2
81
>>> 9 ** 3
729
>>> 9 ** 4
6561
>>> 9 ** 5
59049
With Python, use the ** operator to calculate powers
You can also use pow( , )
>>> pow(2,3)
8
>>> pow(9,1)
9
>>> pow(9,2)
81
>>> pow(9,3)
729
>>>
Python as a desktop calculator
>>> width = 23.56
>>> height = 46.9
>>>
The equal sign (=) is used to assign a value to a variable.
Afterwards, no result is displayed before the next interactive prompt.
If a variable is not “defined” (assigned a value), trying to use it will give you an error:
>>> heitgh * width
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'heitgh' is not defined
>>> height * width
1104.964
In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:
>>> width = 23.5482934
>>> height = 46.923
>>> height * width
1104.9565712081999
>>> _
1104.9565712081999
>>> round(_,4)
1104.9566
>>> round(_,2)
1104.96
>>>
Python as a desktop calculator: math module
>>> factorial(10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'factorial' is not defined
>>> import math
>>> math.factorial(10)
3628800
>>> math.factorial(2)*math.factorial(8)
80640
>>> math.factorial(10)/_
45
Python as a desktop calculator: math module
>>> cos(pi)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'cos' is not defined
>>> math.cos(math.pi)
-1.0
Python as a desktop calculator: math module
>>> from math import cos, pi, factorial
>>> factorial(10)
3628800
>>> cos(pi)
-1.0
Python as a desktop calculator: math module
>>> from math import *
>>> factorial(10)
3628800
>>> cos(pi)
-1.0
>>> exp(1)
2.718281828459045
>>> format(exp(1), '.50g')
'2.7182818284590450907955982984276488423347473144531'
>>> format(exp(1), '.51g')
'2.71828182845904509079559829842764884233474731445312'
>>> format(exp(1), '.52g')
'2.718281828459045090795598298427648842334747314453125'
>>> format(exp(1), '.53g')
'2.718281828459045090795598298427648842334747314453125'
Python as a desktop calculator: fractions module
>>> from fractions import Fraction
>>> Fraction(1,6)
Fraction(1, 6)
>>> _
Fraction(1, 6)
>>> float(Fraction(1,6))
0.16666666666666666
>>> Fraction(1,6)+Fraction(1,6)
Fraction(1, 3)
>>> Fraction(1,6)*Fraction(1,6)
Fraction(1, 36)
>>> Fraction(1,6)**2
Fraction(1, 36)
>>> pow(Fraction(1,6), 2) # after "from math import *", pow is math.pow and returns a float
0.027777777777777776
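The float result of the last call is most likely explained by the earlier from math import *, which rebinds the name pow to math.pow. That is an inference from the session, not something the slides state. A minimal sketch contrasting the two functions:

```python
import math
from fractions import Fraction

sixth = Fraction(1, 6)
exact = pow(sixth, 2)         # built-in pow keeps the Fraction type
approx = math.pow(sixth, 2)   # math.pow converts its arguments to float

print(repr(exact))
print(approx)
```

Importing only the names you need (from math import cos, pi) avoids this kind of silent shadowing.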
Python as a desktop calculator: fractions module
>>> Fraction(35,19238)*Fraction(12,12384)
Fraction(35, 19853616)
>>> float(_)
1.7629030399298546e-06
>>> Fraction(1,6)**15
Fraction(1, 470184984576)
>>> float(_)
2.1268224907304786e-12
Defining functions
Functions Defined
The core of extensible programming is defining functions.
# Python 3: Fibonacci series up to n
>>> def fib(n):
...     a, b = 0, 1
...     while a < n:
...         print(a, end=' ')
...         a, b = b, a+b
...     print()
...
>>> fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Defining functions
Three types of functions
- built-in functions, which are always available
- functions contained within external modules, which must be imported
- functions defined by a programmer with the def keyword (see the example below).
# Python 3: Fibonacci series up to n
>>> def fib(n):
...     a, b = 0, 1
...     while a < n:
...         print(a, end=' ')
...         a, b = b, a+b
...     print()
...
>>> fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
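The fib example prints its results, which relies on Python 3's print function. A variant that returns a list instead runs unchanged on Python 2.7 and Python 3 and lets the caller reuse the values. The name fib_list is made up for this sketch.

```python
def fib_list(n):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result

print(fib_list(1000))
```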
Defining functions:
Le Chevalier de Méré


Defining functions
>>> def chevalier(n):
... return Fraction(1,1)-Fraction(35,36)**n
...
>>> 
Defining functions
>>> chevalier(24)
Fraction(11033126465283976852912127963392284191, 22452257707354557240087211123792674816)
>>> float(_)
0.49140387613090325
>>>
>>> chevalier(25)
Fraction(408611683992293747092011689842522621501, 808281277464764060643139600456536293376)
>>> float(_)
0.5055315462383781
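The two results above sit on either side of one half, which is the heart of de Méré's question. Reading chevalier(n) as the chance of at least one double six in n throws of two dice (an interpretation, since the slides do not spell it out), the smallest favourable n can be found by a short search. The helper name first_favourable is hypothetical.

```python
from fractions import Fraction

def chevalier(n):
    """1 - (35/36)**n as an exact Fraction, as defined in the session above."""
    return Fraction(1, 1) - Fraction(35, 36) ** n

def first_favourable(threshold=Fraction(1, 2)):
    """Smallest n for which chevalier(n) exceeds the threshold."""
    n = 1
    while chevalier(n) <= threshold:
        n += 1
    return n

print(first_favourable())   # 25, consistent with chevalier(24) < 1/2 < chevalier(25)
```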
Defining functions:
exit()

>>> exit()
C:\Users\jajaco3>python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> chevalier(25)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'chevalier' is not defined
>>>
"Non défini!"


Writing Python code
If you want to reuse your function, you must write it down:

- In a .py file using a text editor, e.g. Sublime Text
- In an IPython/Jupyter notebook
Writing Python code:
Sublime text



Writing Python code:
Jupyter Notebooks

Modules
- A module is a file containing Python definitions and statements.
- The file name is the module name with the suffix .py appended.
- For instance, use your favorite text editor to create a file called mymodule.py in the current directory with the following contents:

# Probability problem of Chevalier module
from fractions import Fraction
def chevalier(x):
    return Fraction(1,1)-Fraction(35,36)**x
def fchevalier(x):
    return float(Fraction(1,1)-Fraction(35,36)**x)
Modules

Now enter the Python interpreter and import this module with the following command:
C:\Users\jajaco3\Desktop\PythonForDataScience>python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import mymodule
Modules

Using the module name you can access the functions:
>>> mymodule.chevalier(24)
Fraction(11033126465283976852912127963392284191, 22452257707354557240087211123792674816)
>>> mymodule.fchevalier(24)
0.49140387613090325
>>>
Modules

If you intend to use a function often you can assign it to a local name:
>>> chev = mymodule.chevalier
>>> chev(25)
Fraction(408611683992293747092011689842522621501, 808281277464764060643139600456536293376)
>>> fchev = mymodule.fchevalier
>>> fchev(25)
0.5055315462383781
>>>
The built-in function dir() is used to find out which names a module defines.
It returns a sorted list of strings:
Modules
>>> import mymodule
>>> dir(mymodule)
['Fraction', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'chevalier', 'fchevalier']
>>>
- The built-in function help() provides helpful information from "docstrings" written into the function.
Modules
>>> import math
>>> help(math.log)
Help on built-in function log in module math:
log(...)
log(x[, base])
Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.
>>>
- However, not all functions have docstrings (e.g. we did not write one for our function) but the vast majority are documented this way.
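Adding a docstring to our own function takes one line: a string literal placed directly after the def line. A minimal sketch, with docstring wording that is only a suggestion:

```python
from fractions import Fraction

def chevalier(x):
    """Return 1 - (35/36)**x as an exact Fraction."""
    return Fraction(1, 1) - Fraction(35, 36) ** x

# help(chevalier) displays this text; it is stored in the __doc__ attribute.
print(chevalier.__doc__)
```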

Modules:
packages
- Packages are a way of structuring Python’s module namespace by using “dotted module names”.
- For example, the module name A.B designates a submodule named B in a package named A.
- The use of dotted module names saves the authors of multi-module packages like NumPy from having to worry about each other’s module names.
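Dotted module names can be tried out with a package from the standard library, so nothing needs to be installed. In this sketch xml is the package, etree a subpackage, and ElementTree a module inside it; the small XML string is invented for the example.

```python
# A.B.C style name: package xml, subpackage etree, module ElementTree.
import xml.etree.ElementTree as ET

root = ET.fromstring("<course><day>1</day><day>2</day></course>")
days = [day.text for day in root.findall("day")]

print(ET.__name__)   # the full dotted name of the module
print(days)
```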
Modules:
Executing modules as script

- The code that parses the command line only runs if the module is executed as the “main” file
if __name__ == "__main__":
    import sys
    print(fchevalier(int(sys.argv[1])))
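Put together, mymodule.py with a command-line entry point might look like the sketch below. The extra check on the number of arguments is an addition for this example, so that running the file without an argument does nothing instead of raising an IndexError.

```python
# Sketch of mymodule.py extended with a command-line entry point.
import sys
from fractions import Fraction

def chevalier(x):
    return Fraction(1, 1) - Fraction(35, 36) ** x

def fchevalier(x):
    return float(chevalier(x))

if __name__ == "__main__" and len(sys.argv) == 2:
    # Runs for "python mymodule.py 24", but not for "import mymodule".
    print(fchevalier(int(sys.argv[1])))
```

From the command line, $ python mymodule.py 24 should then print the same value as mymodule.fchevalier(24) in the interpreter session above.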

Python for data science:
Day 2
cartoon from xkcd

Python Packages






NumPy
Matplotlib
Pandas
SciPy
SymPy
NumPy:
import numpy
import numpy as np
>>> import numpy
>>> print dir(numpy)
['ALLOW_THREADS', 'BUFSIZE', 'CLIP', 'ComplexWarning', 'DataSource', 'ERR_CALL', 'ERR_DEFAULT', 'ERR_IGNORE', 'ERR_LOG', 'ERR_PRINT', 'ERR_RAISE', 'ERR_WARN', 'FLOATING_POINT_SUPPORT', 'FPE_DIVIDEBYZERO', 'FPE_INVALID', 'FPE_OVERFLOW', 'FPE_UNDERFLOW', 'False_', 'Inf', 'Infinity', 'MAXDIMS', 'MAY_SHARE_BOUNDS', 'MAY_SHARE_EXACT', 'MachAr', 'ModuleDeprecationWarning', 'NAN', 'NINF', 'NZERO', 'NaN', 'PINF', 'PZERO', 'PackageLoader', 'RAISE', 'RankWarning', 'SHIFT_DIVIDEBYZERO', 'SHIFT_INVALID', 'SHIFT_OVERFLOW', 'SHIFT_UNDERFLOW', 'ScalarType', 'Tester', 'TooHardError', 'True_', 'UFUNC_BUFSIZE_DEFAULT', 'UFUNC_PYVALS_NAME', 'VisibleDeprecationWarning', 'WRAP', '_NoValue', '__NUMPY_SETUP__', '__all__', '__builtins__', '__config__', '__doc__', '__file__', '__git_revision__', '__mkl_version__', '__name__', '__package__', '__path__', '__version__', '_import_tools', '_mat', 'abs', 'absolute', 'absolute_import', 'add', 'add_docstring', 'add_newdoc', 'add_newdoc_ufunc', 'add_newdocs', 'alen', 'all', 'allclose', 'alltrue', 'alterdot', 'amax', 'amin', 'angle', 'any', 'append', 'apply_along_axis', 'apply_over_axes', 'arange', 'arccos', 'arccosh', 'arcsin', 'arcsinh', 'arctan', 'arctan2', 'arctanh', 'argmax', 'argmin', 'argpartition', 'argsort', 'argwhere', 'around', 'array', 'array2string', 'array_equal', 'array_equiv', 'array_repr', 'array_split', 'array_str', 'asanyarray', 'asarray', 'asarray_chkfinite', 'ascontiguousarray', 'asfarray', 'asfortranarray', 'asmatrix', 'asscalar', 'atleast_1d', 'atleast_2d', 'atleast_3d', 'average', 'bartlett', 'base_repr', 'bench', 'binary_repr', 'bincount', 'bitwise_and', 'bitwise_not', 'bitwise_or', 'bitwise_xor', 'blackman', 'bmat', 'bool', 'bool8', 'bool_', 'broadcast', 'broadcast_arrays', 'broadcast_to', 'busday_count', 'busday_offset', 'busdaycalendar', 'byte', 'byte_bounds', 'bytes_', 'c_', 'can_cast', 'cast', 'cbrt', 'cdouble', 'ceil', 'cfloat', 'char', 'character', 'chararray', 'choose', 'clip', 'clongdouble', 'clongfloat', 
'column_stack', 'common_type', 'compare_chararrays', 'compat', 'complex', 'complex128', 'complex64', 'complex_', 'complexfloating', 'compress', 'concatenate', 'conj', 'conjugate', 'convolve', 'copy', 'copysign', 'copyto', 'core', 'corrcoef', 'correlate', 'cos', 'cosh', 'count_nonzero', 'cov', 'cross', 'csingle', 'ctypeslib', 'cumprod', 'cumproduct', 'cumsum', 'datetime64', 'datetime_as_string', 'datetime_data', 'deg2rad', 'degrees', 'delete', 'deprecate', 'deprecate_with_doc', 'diag', 'diag_indices', 'diag_indices_from', 'diagflat', 'diagonal', 'diff', 'digitize', 'disp', 'divide', 'division', 'dot', 'double', 'dsplit', 'dstack', 'dtype', 'e', 'ediff1d', 'einsum', 'emath', 'empty', 'empty_like', 'equal', 'errstate', 'euler_gamma', 'exp', 'exp2', 'expand_dims', 'expm1', 'extract', 'eye', 'fabs', 'fastCopyAndTranspose', 'fft', 'fill_diagonal', 'find_common_type', 'finfo', 'fix', 'flatiter', 'flatnonzero', 'flexible', 'fliplr', 'flipud', 'float', 'float16', 'float32', 'float64', 'float_', 'floating', 'floor', 'floor_divide', 'fmax', 'fmin', 'fmod', 'format_parser', 'frexp', 'frombuffer', 'fromfile', 'fromfunction', 'fromiter', 'frompyfunc', 'fromregex', 'fromstring', 'full', 'full_like', 'fv', 'generic', 'genfromtxt', 'get_array_wrap', 'get_include', 'get_printoptions', 'getbuffer', 'getbufsize', 'geterr', 'geterrcall', 'geterrobj', 'gradient', 'greater', 'greater_equal', 'half', 'hamming', 'hanning', 'histogram', 'histogram2d', 'histogramdd', 'hsplit', 'hstack', 'hypot', 'i0', 'identity', 'iinfo', 'imag', 'in1d', 'index_exp', 'indices', 'inexact', 'inf', 'info', 'infty', 'inner', 'insert', 'int', 'int0', 'int16', 'int32', 'int64', 'int8', 'int_', 'int_asbuffer', 'intc', 'integer', 'interp', 'intersect1d', 'intp', 'invert', 'ipmt', 'irr', 'is_busday', 'isclose', 'iscomplex', 'iscomplexobj', 'isfinite', 'isfortran', 'isinf', 'isnan', 'isneginf', 'isposinf', 'isreal', 'isrealobj', 'isscalar', 'issctype', 'issubclass_', 'issubdtype', 'issubsctype', 'iterable', 'ix_', 
'kaiser', 'kron', 'ldexp', 'left_shift', 'less', 'less_equal', 'lexsort', 'lib', 'linalg', 'linspace', 'little_endian', 'load', 'loads', 'loadtxt', 'log', 'log10', 'log1p', 'log2', 'logaddexp', 'logaddexp2', 'logical_and', 'logical_not', 'logical_or', 'logical_xor', 'logspace', 'long', 'longcomplex', 'longdouble', 'longfloat', 'longlong', 'lookfor', 'ma', 'mafromtxt', 'mask_indices', 'mat', 'math', 'matmul', 'matrix', 'matrixlib', 'max', 'maximum', 'maximum_sctype', 'may_share_memory', 'mean', 'median', 'memmap', 'meshgrid', 'mgrid', 'min', 'min_scalar_type', 'minimum', 'mintypecode', 'mirr', 'mod', 'modf', 'moveaxis', 'msort', 'multiply', 'nan', 'nan_to_num', 'nanargmax', 'nanargmin', 'nanmax', 'nanmean', 'nanmedian', 'nanmin', 'nanpercentile', 'nanprod', 'nanstd', 'nansum', 'nanvar', 'nbytes', 'ndarray', 'ndenumerate', 'ndfromtxt', 'ndim', 'ndindex', 'nditer', 'negative', 'nested_iters', 'newaxis', 'newbuffer', 'nextafter', 'nonzero', 'not_equal', 'nper', 'npv', 'numarray', 'number', 'obj2sctype', 'object', 'object0', 'object_', 'ogrid', 'oldnumeric', 'ones', 'ones_like', 'outer', 'packbits', 'pad', 'partition', 'percentile', 'pi', 'piecewise', 'pkgload', 'place', 'pmt', 'poly', 'poly1d', 'polyadd', 'polyder', 'polydiv', 'polyfit', 'polyint', 'polymul', 'polynomial', 'polysub', 'polyval', 'power', 'ppmt', 'print_function', 'prod', 'product', 'promote_types', 'ptp', 'put', 'putmask', 'pv', 'r_', 'rad2deg', 'radians', 'random', 'rank', 'rate', 'ravel', 'ravel_multi_index', 'real', 'real_if_close', 'rec', 'recarray', 'recfromcsv', 'recfromtxt', 'reciprocal', 'record', 'remainder', 'repeat', 'require', 'reshape', 'resize', 'restoredot', 'result_type', 'right_shift', 'rint', 'roll', 'rollaxis', 'roots', 'rot90', 'round', 'round_', 'row_stack', 's_', 'safe_eval', 'save', 'savetxt', 'savez', 'savez_compressed', 'sctype2char', 'sctypeDict', 'sctypeNA', 'sctypes', 'searchsorted', 'select', 'set_numeric_ops', 'set_printoptions', 'set_string_function', 'setbufsize', 
'setdiff1d', 'seterr', 'seterrcall', 'seterrobj', 'setxor1d', 'shape', 'shares_memory', 'short', 'show_config', 'sign', 'signbit', 'signedinteger', 'sin', 'sinc', 'single', 'singlecomplex', 'sinh', 'size', 'sometrue', 'sort', 'sort_complex', 'source', 'spacing', 'split', 'sqrt', 'square', 'squeeze', 'stack', 'std', 'str', 'str_', 'string0', 'string_', 'subtract', 'sum', 'swapaxes', 'sys', 'take', 'tan', 'tanh', 'tensordot', 'test', 'testing', 'tile', 'timedelta64', 'trace', 'transpose', 'trapz', 'tri', 'tril', 'tril_indices', 'tril_indices_from', 'trim_zeros', 'triu', 'triu_indices', 'triu_indices_from', 'true_divide', 'trunc', 'typeDict', 'typeNA', 'typecodes', 'typename', 'ubyte', 'ufunc', 'uint', 'uint0', 'uint16', 'uint32', 'uint64', 'uint8', 'uintc', 'uintp', 'ulonglong', 'unicode', 'unicode0', 'unicode_', 'union1d', 'unique', 'unpackbits', 'unravel_index', 'unsignedinteger', 'unwrap', 'ushort', 'vander', 'var', 'vdot', 'vectorize', 'version', 'void', 'void0', 'vsplit', 'vstack', 'warnings', 'where', 'who', 'zeros', 'zeros_like']
>>> import numpy as np
NumPy
numpy.linalg.norm:
Length of a vector
>>> np.linalg.norm([2,4])
4.4721359549995796
numpy.linalg.inv:
Inverse of the Matrix
>>> np.linalg.inv([[2,0],[0,2]])
array([[ 0.5, 0. ],
[ 0. , 0.5]])
NumPy
>>> M = np.matrix([[2,0],[0,2]])
>>> M * M
matrix([[4, 0],
[0, 4]])
>>> M.I
matrix([[ 0.5, 0. ],
[ 0. , 0.5]])
>>> _*M
matrix([[ 1., 0.],
[ 0., 1.]])
NumPy
numpy.matrix:
alternative to numpy.array
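Current NumPy documentation recommends regular arrays over numpy.matrix, even for linear algebra. The same calculation with numpy.array might look like the sketch below, which assumes Python 3.5 or later for the @ operator and a NumPy installation.

```python
import numpy as np

A = np.array([[2, 0], [0, 2]])
product = A @ A                  # matrix product; A * A would multiply elementwise
inverse = np.linalg.inv(A)       # plays the role of M.I
identity = inverse @ A

print(product)
print(identity)
```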

NumPy:
numpy.random:
Random sampling
| rand(d0, d1, ..., dn) | Random values in a given shape. |
| randn(d0, d1, ..., dn) | Return a sample (or samples) from the “standard normal” distribution. |
| randint(low[, high, size, dtype]) | Return random integers from low (inclusive) to high (exclusive). |
| random_integers(low[, high, size]) | Random integers of type np.int between low and high, inclusive. |
| random_sample([size]) | Return random floats in the half-open interval [0.0, 1.0). |
| random([size]) | Return random floats in the half-open interval [0.0, 1.0). |
| ranf([size]) | Return random floats in the half-open interval [0.0, 1.0). |
| sample([size]) | Return random floats in the half-open interval [0.0, 1.0). |
| choice(a[, size, replace, p]) | Generates a random sample from a given 1-D array |
| bytes(length) | Return random bytes. |
Permutations
| shuffle(x) | Modify a sequence in-place by shuffling its contents. |
| permutation(x) | Randomly permute a sequence, or return a permuted range. |
>>> np.random.rand(1)
array([ 0.88288734])
>>> np.random.rand(2)
array([ 0.49363292, 0.16510102])
>>> np.random.rand(1,2)
array([[ 0.91578292, 0.60660245]])
>>> np.random.rand(2,1)
array([[ 0.23357355],
[ 0.25666865]])
>>> np.random.rand(2,2, 2)
array([[[ 0.00763289, 0.09970221],
[ 0.68506231, 0.31527997]],
[[ 0.94116485, 0.43140191],
[ 0.78844403, 0.2575242 ]]])
>>> np.random.rand(2,2, 2, 2)
array([[[[ 0.57742619, 0.99551532],
[ 0.99774305, 0.45852133]],
[[ 0.06396083, 0.68351763],
[ 0.96607077, 0.27239997]]],
[[[ 0.91702878, 0.33364644],
[ 0.34576949, 0.06507407]],
[[ 0.20052982, 0.24095151],
[ 0.1145685 , 0.28035093]]]])
>>>
>>> np.random.permutation(6)
array([5, 1, 4, 0, 3, 2])
>>> sequence = np.random.permutation(10)
>>> print(sequence)
[1 2 5 9 8 3 4 6 0 7]
NumPy:
numpy.random.permutation()
>>> np.random.uniform(-1,1, size=10)
array([-0.08, 0.74, -0.64, -0.98, 0.98, -0.21, -0.36, -0.34, -0.16, -0.86])
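Random draws differ from run to run, so the numbers above will not reappear on your machine. When a calculation has to be reproducible, a seeded generator helps. This sketch assumes NumPy 1.17 or later, where numpy.random.default_rng is available; the seed value 12345 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(12345)             # arbitrary seed
sample = rng.uniform(-1, 1, size=10)           # ten draws from [-1, 1)

# A second generator with the same seed reproduces the same draws.
repeat = np.random.default_rng(12345).uniform(-1, 1, size=10)

print(np.round(sample, 2))
```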

NumPy:
numpy.random.uniform()
>>> from scipy import stats
>>> np.random.uniform(-1,1, size=10000)
array([ 0.34, -0.88, -0.64, ..., -0.87, 0.09, 0.42])
>>> stats.describe(_)
DescribeResult(nobs=10000L, minmax=(-0.99984799016922654, 0.99997505667204889),
mean=-0.0017052074604612564, variance=0.33296037891169855, skewness=-0.004341077496136224,
kurtosis=-1.2054297738246007)
>>>
NumPy:
numpy.random.uniform()
scipy.stats
Statistical functions
This module contains a large number of probability distributions as well as a growing library of statistical functions.

scipy.stats.describe
>>> np.random.permutation(6)
array([5, 1, 4, 0, 3, 2])
>>> sequence = np.random.permutation(10)
>>> print(sequence)
[1 2 5 9 8 3 4 6 0 7]
>>> stats.describe(sequence)
DescribeResult(nobs=10, minmax=(0, 9), mean=4.5, variance=9.1666666666666661,
skewness=0.0, kurtosis=-1.2242424242424244)
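The mean and variance reported by stats.describe can be cross-checked with Python's standard statistics module (Python 3.4 or later). The list below is the permutation printed in the session above; the variance is the sample variance, which divides by n - 1.

```python
import statistics

sequence = [1, 2, 5, 9, 8, 3, 4, 6, 0, 7]     # the permutation printed above
mean = statistics.mean(sequence)
variance = statistics.variance(sequence)       # sample variance (divides by n - 1)

print(mean, variance)
```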

scipy.special.comb
>>> from scipy.special import comb
>>> comb(10,2)
45.0
>>> comb(3,2)
3.0
>>>
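comb returns a float by default. For an exact integer result, the factorial formula used earlier with the math module can be wrapped in a small function. The name n_choose_k is made up for this sketch; on Python 3.8 or later, math.comb does the same job.

```python
import math

def n_choose_k(n, k):
    """Binomial coefficient computed exactly with integer arithmetic."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(n_choose_k(10, 2))
print(n_choose_k(3, 2))
```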

For many more stat related functions install the software R and the interface package rpy.
Goal: a billion points, meaningfully, interactively, in the browser

Bokeh

Matplotlib


Matplotlib
matplotlib.style.use('fivethirtyeight')
Without this change, most styles will default to the "jet" colormap.


Matplotlib
To run the gallery showing styles available, simply grab the source and run the package as a script:
$ git clone https://github.com/tonysyu/matplotlib-style-gallery.git
$ cd matplotlib-style-gallery
$ python -m mpl_style_gallery


"a 2 billion dollar
Facebook for programmers"


"15 million active users who learn, share, and work together to build software"
- GitHub: file repository like Dropbox
- Git: version control system for software dev.
"free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency."

$ git clone
Clone a repository
$ git push
Update server with your commits across all branches that are common between your local copy and server (since Git 2.0 the default is to push only the current branch)
Fork (a GitHub feature rather than a git command)
Points your repo to original via an alternates file
$ git pull
Fetch changes from server and merge into current branch
Jupyter Notebooks
"Open source, interactive data science and scientific computing across over 40 programming languages."

Notebooks may be exported to a range of static formats:
- HTML (for example, for blog posts)
- LaTeX
- Slide shows (via reveal.js)

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
Any .ipynb notebook document available from a public URL can be shared via the Jupyter Notebook Viewer (nbviewer).

Jupyter Notebooks
The landing page of the Jupyter notebook web application, the dashboard, shows the notebooks currently available in the notebook directory

Jupyter Notebooks


Jupyter Notebooks
Ok, I want to download a notebook and start working with it. What next?
Cloning notebooks from Github


- Use browser (GitHub.com)
- Use Git


https://github.com/neuroneuro15


https://github.com/neuroneuro15/SciPyCourse2016
C:\Users\jajaco3\Documents>git clone https://github.com/neuroneuro15/SciPyCourse2016
Cloning into 'SciPyCourse2016'...
remote: Counting objects: 248, done.
Receiving objects: 100% (248/248), 7.16 MiB | 0 bytes/s, done.
Resolving deltas: 100% (126/126), done.
Checking connectivity... done.
git-lfs smudge -- 'Homework Sample Data/Homework 4/eeg_data.mat': git-lfs: command not found
error: external filter git-lfs smudge -- %f failed -1
error: external filter git-lfs smudge -- %f failed
fatal: Homework Sample Data/Homework 4/eeg_data.mat: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'
C:\Users\jajaco3\Documents>

C:\Users\jajaco3\Documents>dir
Volume in drive C is Windows
Volume Serial Number is E88B-1894
Directory of C:\Users\jajaco3\Documents
11/03/2016 04:52 PM <DIR> .
11/03/2016 04:52 PM <DIR> ..
11/01/2016 05:10 PM <DIR> .ipynb_checkpoints
10/05/2016 03:16 PM 2,873 .Rhistory
09/11/2016 07:13 PM <DIR> btabibian.github.io
09/11/2016 07:16 PM <DIR> btabibian.github.io-master
08/17/2016 03:46 PM <DIR> Custom Office Templates
11/01/2016 09:25 PM <DIR> GitHub
09/27/2016 11:17 AM <DIR> LabData
09/11/2016 08:31 PM 33,070 Lecture1-Copy1.ipynb
11/01/2016 11:29 AM 10,731 Lecture1.ipynb
08/18/2016 03:03 PM <DIR> Python Scripts
11/01/2016 05:41 PM <DIR> pythonexamplefolder
08/22/2016 09:21 AM <DIR> R
11/03/2016 04:37 PM <DIR> SciPyCourse2016-master
09/02/2016 09:02 PM <DIR> Turning
11/03/2016 10:03 AM <DIR> TurningPoint 5
11/01/2016 05:10 PM 2,699 Untitled1.ipynb
4 File(s) 49,373 bytes
14 Dir(s) 196,888,305,664 bytes free
Find the directory containing SciPyCourse2016-master

C:\Users\jajaco3\Documents>jupyter notebook
[W 16:55:15.381 NotebookApp] Unrecognized JSON config file version, assuming version 1
[I 16:55:17.400 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 16:55:18.250 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 16:55:18.253 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
[I 16:55:18.263 NotebookApp] [nb_conda] enabled
[I 16:55:18.414 NotebookApp] [nb_anacondacloud] enabled
[I 16:55:18.767 NotebookApp] Serving notebooks from local directory: C:\Users\jajaco3\Documents
[I 16:55:18.770 NotebookApp] 0 active kernels
[I 16:55:18.772 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 16:55:18.776 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Open Jupyter notebook

Open any .ipynb file and you are on your way to learning Python using Jupyter notebooks and GitHub!


To obtain these slides as well as the Jupyter notebooks that were presented, clone:
github.com/jeremyallenjacobson/PythonForDataScience

Thank you!
Python for data science March 2017
By Jeremy Jacobson
Slides on Conda, Python, modules, packages, Numpy, and SciPy.