Python for data science

Jeremy Jacobson

jeremyallenjacobson@github.io

Lecturer

Institute for Quantitative Theory and Methods

We are the Knights who say ni!

Where is the command line?

Windows

Search for cmd

Mac OS X

The Mac command line is a program called Terminal. It lives in the folder /Applications/Utilities/

Linux

You already know the answer

On Windows the drive letter may be different, so you might see something like:

Microsoft Windows [Version 10.0.10586]
(c) 2015 Microsoft Corporation. All rights reserved.

C:\Users\jajaco3>
D:\YourName\Projects\Python>

From now on we will denote the command line by a dollar sign.

$

Go to the command line

Windows

dir
cd
cd..
cd\
mkdir
rmdir

 

Mac OS X

ls
cd
cd ..
cd /
mkdir
rm

 

Linux

ls
cd
cd .. 
cd /
mkdir
rm
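
The same navigation steps can also be carried out from inside Python with the standard os module. The sketch below is only an illustration: the directory name Projects is made up, and everything happens inside a temporary scratch directory so that nothing on disk is disturbed.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing on disk is disturbed.
start = os.getcwd()
scratch = tempfile.mkdtemp()

os.chdir(scratch)                 # cd <scratch>
os.mkdir("Projects")              # mkdir Projects  (hypothetical name)
listing = os.listdir(".")         # dir (Windows) / ls (Mac OS X, Linux)
print(listing)

os.chdir("Projects")              # cd Projects
os.chdir("..")                    # cd ..
os.rmdir("Projects")              # rmdir Projects (only removes empty directories)
after_removal = os.listdir(".")
print(after_removal)

os.chdir(start)                   # return to where we began
os.rmdir(scratch)                 # clean up the scratch directory
```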

 

Command syntax

$ conda
usage: conda-script.py [-h] [-V] [--debug] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    info         Display information about current conda install.

$ conda info
Current conda install:

             platform : win-64
        conda version : 4.1.6
    conda-env version : 2.5.1
  conda-build version : 1.21.3
       python version : 2.7.12.final.0
     requests version : 2.10.0
     root environment : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2  (writable)
  default environment : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2
     envs directories : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\envs
        package cache : C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\pkgs
         channel URLs : https://repo.continuum.io/pkgs/free/win-64/
                        https://repo.continuum.io/pkgs/free/noarch/
                        https://repo.continuum.io/pkgs/pro/win-64/
                        https://repo.continuum.io/pkgs/pro/noarch/
          config file : None
         offline mode : False
    is foreign system : False

What if I need to reproduce someone else's calculation using specific versions of various packages?

$ conda create --name reproduceEnv python=2.7 bokeh=0.12.3

Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\envs\reproduceEnv:

The following NEW packages will be INSTALLED:

    backports:          1.0-py27_0
    backports_abc:      0.4-py27_0
    bokeh:              0.12.3-py27_1
    futures:            3.0.5-py27_0
    jinja2:             2.8-py27_1
    markupsafe:         0.23-py27_2
    mkl:                11.3.3-1
    numpy:              1.11.2-py27_0
    pip:                8.1.2-py27_0
    python:             2.7.12-0
    python-dateutil:    2.5.3-py27_0
    pyyaml:             3.12-py27_0
    requests:           2.11.1-py27_0
    setuptools:         27.2.0-py27_1
    singledispatch:     3.4.0.3-py27_0
    six:                1.10.0-py27_0
    ssl_match_hostname: 3.4.0.2-py27_1
    tornado:            4.4.2-py27_0
    vs2008_runtime:     9.00.30729.1-2
    wheel:              0.29.0-py27_0

Proceed ([y]/n)?

If you were to hit 'y' you would see this:

Fetching packages ...
numpy-1.11.2-p 100% |###############################| Time: 0:00:00   7.95 MB/s
pyyaml-3.12-py 100% |###############################| Time: 0:00:00   1.10 MB/s
requests-2.11. 100% |###############################| Time: 0:00:00   3.06 MB/s
setuptools-27. 100% |###############################| Time: 0:00:00   3.83 MB/s
tornado-4.4.2- 100% |###############################| Time: 0:00:00   2.69 MB/s
bokeh-0.12.3-p 100% |###############################| Time: 0:00:00   5.53 MB/s
Extracting packages ...
[      COMPLETE      ]|##################################################| 100%
Linking packages ...
[      COMPLETE      ]|##################################################| 100%
#
# To activate this environment, use:
# > activate reproduceEnv
#
# To deactivate this environment, use:
# > deactivate
#

$

To verify that the 'reproduceEnv' environment has now been added, type the command:

$ conda info --envs
# conda environments:
#
reproduceEnv             C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2\envs\reproduceEnv
root                  *  C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2

Activate this environment with:

WINDOWS:

C:\>activate reproduceEnv

(reproduceEnv) C:\>

MAC/LINUX:

$ source activate reproduceEnv

Verify the correct versions with:

(reproduceEnv) C:\>python --version
Python 2.7.12 :: Continuum Analytics, Inc.

(reproduceEnv) C:\>bokeh --version
0.12.3

(reproduceEnv) C:\>
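
The version check can also be done from inside Python, which is handy at the top of a script that must run in a particular environment. This is a minimal sketch; the helper installed_version is a made-up name for illustration, and it assumes the package exposes a __version__ attribute, which many packages do but not all.

```python
import sys
import importlib

def installed_version(package_name):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    # Assumption: the package stores its version in __version__
    return getattr(module, "__version__", None)

python_version = "%d.%d.%d" % sys.version_info[:3]
print("python", python_version)
print("bokeh", installed_version("bokeh"))
```

Inside the reproduceEnv environment the two printed versions should match the ones requested when the environment was created.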

To return to Anaconda installed defaults, deactivate this environment with the command:

WINDOWS:

(reproduceEnv) C:\>deactivate reproduceEnv

 C:\>

MAC/LINUX:

$ source deactivate reproduceEnv

Verify the Anaconda installed default version with:

C:\>python --version
Python 2.7.12 :: Anaconda 4.1.1 (64-bit)

C:\>bokeh --version
0.12.0

C:\>

To list the packages installed in the current environment, use:

$ conda list
# packages in environment at C:\Users\jajaco3\AppData\Local\Continuum\Anaconda2:
#
_nb_ext_conf              0.2.0                    py27_0
alabaster                 0.7.8                    py27_0
anaconda                  4.1.1               np111py27_0
anaconda-client           1.4.0                    py27_0
anaconda-navigator        1.2.1                    py27_0
argcomplete               1.0.0                    py27_1
astropy                   1.2.1               np111py27_0
babel                     2.3.3                    py27_0
backports                 1.0                      py27_0
backports_abc             0.4                      py27_0
beautifulsoup4            4.4.1                    py27_0
bitarray                  0.8.1                    py27_1
blaze                     0.10.1                   py27_0
bokeh                     0.12.0                   py27_0
boto                      2.40.0                   py27_0
bottleneck                1.1.0               np111py27_0
bzip2                     1.0.6                     vc9_3  [vc9]
cdecimal                  2.3                      py27_2
cffi                      1.6.0                    py27_0
chest                     0.2.3                    py27_0
click                     6.6                      py27_0
cloudpickle               0.2.1                    py27_0
clyent                    1.2.2                    py27_0
colorama                  0.3.7                    py27_0
comtypes                  1.1.2                    py27_0
conda                     4.1.6                    py27_0
conda-build               1.21.3                   py27_0

What is Python?

There are two ways to run Python code:

  • Type commands individually into the interpreter.
  • Create a script file (myPythonFile.py).

Python Interactive Shell

If you want to type commands individually, start the Python interactive shell by typing python.

$ python

Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

>>>

Interactive Mode:

Python as a desktop calculator

>>> 2+2
4
>>> 53/3
17
>>> 53/4
13

The integer numbers (e.g. 2, 4, 20) have type int.

The decimals (e.g. 53.0, 4.0, 13.25) have type float.

In Python 2 (used in these slides), if both operands are of type int, floor division is performed and an int is returned (e.g. 53/4 returns 13). In Python 3, 53/4 returns 13.25.

If either operand is a float, classic division is performed and a float is returned (e.g. 53.0/4 or 53/4.0 return 13.25).

>>> 53/4.0
13.25
>>> 53.0/4
13.25

Python as a desktop calculator

>>> type(53/3)
<type 'int'>
>>> type(53)
<type 'int'>
>>> type(53.0)
<type 'float'>
>>> 53 / 4 # int / int  -> int
13
>>> 53 / 4.0 # int / float -> float
13.25
>>> 53 // 4
13
>>> 53 // 4.0 # explicit floor division discards the remainder
13.0
>>> 53 % 4 # the % operator returns the remainder
1
>>> 13 * 4 + 1 # result * divisor + remainder
53

Everything after a # is a comment; Python ignores it.
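
The relationship between floor division and remainder can be checked in a few lines. This sketch uses only built-in operators and the built-in divmod, which behave the same way for integers in Python 2 and Python 3; the variable names are chosen for illustration.

```python
dividend, divisor = 53, 4

quotient = dividend // divisor     # floor division
remainder = dividend % divisor     # remainder
print(quotient, remainder)

# divmod returns both values at once
assert divmod(dividend, divisor) == (quotient, remainder)

# the identity from the slide: result * divisor + remainder
assert quotient * divisor + remainder == dividend
```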

Python as a desktop calculator

>>> 9 ** 1
9
>>> 9 ** 2
81
>>> 9 ** 3
729
>>> 9 ** 4
6561
>>> 9 ** 5
59049

Use the ** operator to calculate powers.

You can also use the built-in function pow(base, exponent):

>>> pow(2,3)
8
>>> pow(9,1)
9
>>> pow(9,2)
81
>>> pow(9,3)
729
>>>

Python as a desktop calculator

>>> width = 23.56
>>> height = 46.9
>>>

The equal sign (=) is used to assign a value to a variable. After an assignment, no result is displayed before the next prompt.

If a variable is not “defined” (assigned a value), trying to use it will give you an error:

>>> heitgh * width
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'heitgh' is not defined
>>> height * width
1104.964

In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:

>>> width = 23.5482934
>>> height = 46.923
>>> height * width
1104.9565712081999
>>> _
1104.9565712081999
>>> round(_,4)
1104.9566
>>> round(_,2)
1104.96
>>>

Python as a desktop calculator: math module

>>> factorial(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'factorial' is not defined
>>> import math
>>> math.factorial(10)
3628800
>>> math.factorial(2)*math.factorial(8)
80640
>>> math.factorial(10)/_
45
{n \choose k} =\frac{n!}{(n-k)!k!}
{10 \choose 2} =\frac{10!}{(10-2)!2!}=45
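
The binomial coefficient formula can be wrapped in a small function. This is a sketch under the assumption that n and k are non-negative integers with k no larger than n; the name choose is made up for illustration.

```python
from math import factorial

def choose(n, k):
    """Binomial coefficient n! / ((n-k)! k!) using exact integer arithmetic."""
    # // keeps the result an int in both Python 2 and Python 3
    return factorial(n) // (factorial(n - k) * factorial(k))

print(choose(10, 2))
```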

Python as a desktop calculator: math module

>>> cos(pi)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'cos' is not defined
>>> math.cos(math.pi)
-1.0
\cos(\pi)=-1

Python as a desktop calculator: math module

>>> from math import cos, pi, factorial
>>> factorial(10)
3628800
>>> cos(pi)
-1.0
\cos(\pi)=-1
10!=3628800

Python as a desktop calculator: math module

>>> from math import *
>>> factorial(10)
3628800
>>> cos(pi)
-1.0
>>> exp(1)
2.718281828459045
\cos(\pi)=-1
10!=3628800
e^{1}\approx 2.718281828459045

Asking format for more digits shows the exact decimal value of the floating-point number that Python stores, 2.718281828459045090795598298427648842334747314453125. Only about the first 16 digits agree with e itself.

>>> format(exp(1), '.50g')
'2.7182818284590450907955982984276488423347473144531'
>>> format(exp(1), '.51g')
'2.71828182845904509079559829842764884233474731445312'
>>> format(exp(1), '.52g')
'2.718281828459045090795598298427648842334747314453125'
>>> format(exp(1), '.53g')
'2.718281828459045090795598298427648842334747314453125'
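
The long digit strings produced by format are the exact decimal value of the binary number stored in memory. The standard decimal module shows the same thing without having to guess how many digits to ask for. This sketch uses the constant math.e rather than exp(1); the two normally hold the same float, but that is an assumption about the platform's math library.

```python
import math
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding.
stored = Decimal(math.e)
print(stored)

# The short form that the interpreter prints is enough to recover the same float.
shortest = repr(math.e)
print(shortest)
assert float(shortest) == math.e
```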

Python as a desktop calculator: fractions module

>>> from fractions import Fraction
>>> Fraction(1,6)
Fraction(1, 6)
>>> _
Fraction(1, 6)
>>> float(Fraction(1,6))
0.16666666666666666
>>> Fraction(1,6)+Fraction(1,6)
Fraction(1, 3)
\frac{a}{b}+\frac{c}{d}=\frac{ad+bc}{bd}

>>> Fraction(1,6)*Fraction(1,6)
Fraction(1, 36)
>>> Fraction(1,6)**2
Fraction(1, 36)
>>> pow(Fraction(1,6), 2)
Fraction(1, 36)
>>> float(_)
0.027777777777777776
\frac{a}{b}\times\frac{c}{d}=\frac{ac}{bd}
\frac{1}{6}\times\frac{1}{6}=(\frac{1}{6})^2=\frac{1}{36}

Python as a desktop calculator: fractions module

>>> Fraction(35,19238)*Fraction(12,12384)
Fraction(35, 19853616)
>>> float(_)
1.7629030399298546e-06
>>>
\frac{35}{19238}\times \frac{12}{12384}=\frac{35}{19853616}

>>> Fraction(1,6)**15
Fraction(1, 470184984576)
>>> float(_)
2.1268224907304786e-12
(\frac{1}{6})^{15}=\frac{1}{470184984576}

Defining functions

The core of extensible programming is defining functions.

3 types of functions

  • built-in functions, which are always available for usage
  • functions contained within external modules, which must be imported
  • functions defined by a programmer with the def keyword (see the example below).

# Python 3: Fibonacci series up to n
# (in Python 2, first run: from __future__ import print_function)
>>> def fib(n):
...     a, b = 0, 1
...     while a < n:
...         print(a, end=' ')
...         a, b = b, a+b
...     print()
...
>>> fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
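
A function that prints its results cannot hand them to other code. A common variation, sketched here with the made-up name fib_list, collects the values and returns them instead, so the result can be stored, sliced or summed.

```python
def fib_list(n):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result

series = fib_list(1000)
print(series)
```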

Defining functions:

Le Chevalier de Méré

P(\text{throw a pair of sixes})=\frac{1}{36}
P(\text{do not throw a pair of sixes})=\frac{35}{36}
P(\text{at least one pair of sixes in 24 throws})=1-P(\text{no pair of sixes in 24 throws})=1-(\frac{35}{36})^{24}
Defining functions

P(\text{at least one pair of sixes in } n \text{ throws})=1-(\frac{35}{36})^{n}

>>> def chevalier(n):
...     return Fraction(1,1)-Fraction(35,36)**n
...
>>>

>>> chevalier(24)
Fraction(11033126465283976852912127963392284191, 22452257707354557240087211123792674816)
>>> float(_)
0.49140387613090325
>>>
P(\text{at least one pair of sixes in } 24 \text{ throws})=\frac{11033126465283976852912127963392284191}{22452257707354557240087211123792674816}\approx 0.49140387613090325

>>> chevalier(25)
Fraction(408611683992293747092011689842522621501, 808281277464764060643139600456536293376)
>>> float(_)
0.5055315462383781
>>>
P(\text{at least one pair of sixes in } 25 \text{ throws})=\frac{408611683992293747092011689842522621501}{808281277464764060643139600456536293376}\approx 0.5055315462383781
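
The two results show that the bet becomes favourable somewhere between 24 and 25 throws. A short loop can find that crossover directly. This is a sketch that redefines chevalier so it stands alone; first_favourable is a made-up name, and the comparison uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

def chevalier(n):
    """Probability of at least one pair of sixes in n throws of two dice."""
    return Fraction(1, 1) - Fraction(35, 36) ** n

def first_favourable(threshold=Fraction(1, 2)):
    """Smallest number of throws whose probability exceeds the threshold."""
    n = 1
    while chevalier(n) <= threshold:
        n += 1
    return n

print(first_favourable())
```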

Defining functions:

exit()

>>> exit()

C:\Users\jajaco3>python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> chevalier(25)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'chevalier' is not defined
>>>

"Non défini!" ("Not defined!")

Writing Python code

If you want to reuse your function, you must save it in a file:

  • In a .py file using a text editor, e.g. Sublime Text
  • In an IPython/Jupyter notebook

Writing Python code:

Sublime text

Writing Python code:

Jupyter Notebooks

Modules

  • A module is a file containing Python definitions and statements.
  • The file name is the module name with the suffix .py appended.
  • For instance, use your favorite text editor to create a file called mymodule.py in the current directory with the following contents:

# Probability problem of Chevalier module
from fractions import Fraction
# exact probability of at least one pair of sixes in x throws
def chevalier(x):
    return Fraction(1,1)-Fraction(35,36)**x
# the same probability converted to a float
def fchevalier(x):
    return float(Fraction(1,1)-Fraction(35,36)**x)

Modules

Now enter the Python interpreter and import this module with the following command:

C:\Users\jajaco3\Desktop\PythonForDataScience>python
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import mymodule

Modules

Using the module name you can access the functions:

 


>>> mymodule.chevalier(24)
Fraction(11033126465283976852912127963392284191, 22452257707354557240087211123792674816)
>>> mymodule.fchevalier(24)
0.49140387613090325
>>>

Modules

If you intend to use a function often you can assign it to a local name:


>>> chev = mymodule.chevalier
>>> chev(25)
Fraction(408611683992293747092011689842522621501, 808281277464764060643139600456536293376)
>>> fchev = mymodule.fchevalier
>>> fchev(25)
0.5055315462383781
>>>

Modules

The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings:

>>> import mymodule
>>> dir(mymodule)
['Fraction', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'chevalier', 'fchevalier']
>>>

Modules

  • The built-in function help() provides helpful information from "docstrings" written into the function.

>>> import math
>>> help(math.log)
Help on built-in function log in module math:

log(...)
    log(x[, base])

    Return the logarithm of x to the given base.
    If the base not specified, returns the natural logarithm (base e) of x.

>>>
  • However, not all functions have docstrings (e.g. we did not write one for our function) but the vast majority are documented this way.
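
Adding a docstring takes one extra line: a string literal placed first in the function body. The sketch below shows what chevalier might look like with one; the wording of the docstring is an illustration, not part of the original module.

```python
from fractions import Fraction

def chevalier(n):
    """Return the probability of at least one pair of sixes in n throws of two dice."""
    return Fraction(1, 1) - Fraction(35, 36) ** n

# help(chevalier) would display the text above; it is stored in __doc__
print(chevalier.__doc__)
```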

Modules:

packages

 

  • Packages are a way of structuring Python’s module namespace by using “dotted module names”.
  • For example, the module name A.B designates a submodule named B in a package named A.
  • The use of dotted module names saves the authors of multi-module packages like NumPy from having to worry about each other’s module names.
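
Dotted module names can be tried out with a package from the standard library. The sketch below uses json, which happens to be organised as a package containing a submodule named decoder; any other package would serve equally well.

```python
import json.decoder                # submodule "decoder" inside the package "json"

print(json.decoder.__name__)       # the full dotted name
print(json.decoder.__package__)    # the package that contains it

# A package is itself a module; what marks it as a package is a __path__ attribute.
print(hasattr(json, "__path__"))
```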

Modules:

Executing modules as script

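  • Adding the code below to the end of mymodule.py makes the file usable as a script as well as an importable module.
  • The code that parses the command line only runs if the module is executed as the “main” file, e.g. with python mymodule.py 24

 if __name__ == "__main__":
     import sys
     # print the result; a bare call would compute the value and discard it
     print(fchevalier(int(sys.argv[1])))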
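
A slightly more defensive layout keeps the command-line handling in its own function, so that a missing or non-numeric argument produces a usage message instead of a traceback. This is a sketch of one common pattern, not the only way to do it; the function name main and the wording of the usage message are assumptions.

```python
import sys
from fractions import Fraction

def fchevalier(x):
    """Probability (as a float) of at least one pair of sixes in x throws."""
    return float(Fraction(1, 1) - Fraction(35, 36) ** x)

def main(argv):
    """Read the number of throws from argv, print the probability, return an exit code."""
    try:
        throws = int(argv[1])
    except (IndexError, ValueError):
        print("usage: python mymodule.py NUMBER_OF_THROWS")
        return 1
    print(fchevalier(throws))
    return 0

# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main(sys.argv)
```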


Python for data science:

Day 2

cartoon from xkcd

Python Packages

NumPy

Matplotlib

Pandas

SciPy

SymPy

NumPy:

import numpy

import numpy as np

>>> import numpy
>>> print dir(numpy)
['ALLOW_THREADS', 'BUFSIZE', 'CLIP', 'ComplexWarning', 'DataSource', 'ERR_CALL', 'ERR_DEFAULT', 'ERR_IGNORE', 'ERR_LOG', 'ERR_PRINT', 'ERR_RAISE', 'ERR_WARN', 'FLOATING_POINT_SUPPORT', 'FPE_DIVIDEBYZERO', 'FPE_INVALID', 'FPE_OVERFLOW', 'FPE_UNDERFLOW', 'False_', 'Inf', 'Infinity', 'MAXDIMS', 'MAY_SHARE_BOUNDS', 'MAY_SHARE_EXACT', 'MachAr', 'ModuleDeprecationWarning', 'NAN', 'NINF', 'NZERO', 'NaN', 'PINF', 'PZERO', 'PackageLoader', 'RAISE', 'RankWarning', 'SHIFT_DIVIDEBYZERO', 'SHIFT_INVALID', 'SHIFT_OVERFLOW', 'SHIFT_UNDERFLOW', 'ScalarType', 'Tester', 'TooHardError', 'True_', 'UFUNC_BUFSIZE_DEFAULT', 'UFUNC_PYVALS_NAME', 'VisibleDeprecationWarning', 'WRAP', '_NoValue', '__NUMPY_SETUP__', '__all__', '__builtins__', '__config__', '__doc__', '__file__', '__git_revision__', '__mkl_version__', '__name__', '__package__', '__path__', '__version__', '_import_tools', '_mat', 'abs', 'absolute', 'absolute_import', 'add', 'add_docstring', 'add_newdoc', 'add_newdoc_ufunc', 'add_newdocs', 'alen', 'all', 'allclose', 'alltrue', 'alterdot', 'amax', 'amin', 'angle', 'any', 'append', 'apply_along_axis', 'apply_over_axes', 'arange', 'arccos', 'arccosh', 'arcsin', 'arcsinh', 'arctan', 'arctan2', 'arctanh', 'argmax', 'argmin', 'argpartition', 'argsort', 'argwhere', 'around', 'array', 'array2string', 'array_equal', 'array_equiv', 'array_repr', 'array_split', 'array_str', 'asanyarray', 'asarray', 'asarray_chkfinite', 'ascontiguousarray', 'asfarray', 'asfortranarray', 'asmatrix', 'asscalar', 'atleast_1d', 'atleast_2d', 'atleast_3d', 'average', 'bartlett', 'base_repr', 'bench', 'binary_repr', 'bincount', 'bitwise_and', 'bitwise_not', 'bitwise_or', 'bitwise_xor', 'blackman', 'bmat', 'bool', 'bool8', 'bool_', 'broadcast', 'broadcast_arrays', 'broadcast_to', 'busday_count', 'busday_offset', 'busdaycalendar', 'byte', 'byte_bounds', 'bytes_', 'c_', 'can_cast', 'cast', 'cbrt', 'cdouble', 'ceil', 'cfloat', 'char', 'character', 'chararray', 'choose', 'clip', 'clongdouble', 'clongfloat', 
'column_stack', 'common_type', 'compare_chararrays', 'compat', 'complex', 'complex128', 'complex64', 'complex_', 'complexfloating', 'compress', 'concatenate', 'conj', 'conjugate', 'convolve', 'copy', 'copysign', 'copyto', 'core', 'corrcoef', 'correlate', 'cos', 'cosh', 'count_nonzero', 'cov', 'cross', 'csingle', 'ctypeslib', 'cumprod', 'cumproduct', 'cumsum', 'datetime64', 'datetime_as_string', 'datetime_data', 'deg2rad', 'degrees', 'delete', 'deprecate', 'deprecate_with_doc', 'diag', 'diag_indices', 'diag_indices_from', 'diagflat', 'diagonal', 'diff', 'digitize', 'disp', 'divide', 'division', 'dot', 'double', 'dsplit', 'dstack', 'dtype', 'e', 'ediff1d', 'einsum', 'emath', 'empty', 'empty_like', 'equal', 'errstate', 'euler_gamma', 'exp', 'exp2', 'expand_dims', 'expm1', 'extract', 'eye', 'fabs', 'fastCopyAndTranspose', 'fft', 'fill_diagonal', 'find_common_type', 'finfo', 'fix', 'flatiter', 'flatnonzero', 'flexible', 'fliplr', 'flipud', 'float', 'float16', 'float32', 'float64', 'float_', 'floating', 'floor', 'floor_divide', 'fmax', 'fmin', 'fmod', 'format_parser', 'frexp', 'frombuffer', 'fromfile', 'fromfunction', 'fromiter', 'frompyfunc', 'fromregex', 'fromstring', 'full', 'full_like', 'fv', 'generic', 'genfromtxt', 'get_array_wrap', 'get_include', 'get_printoptions', 'getbuffer', 'getbufsize', 'geterr', 'geterrcall', 'geterrobj', 'gradient', 'greater', 'greater_equal', 'half', 'hamming', 'hanning', 'histogram', 'histogram2d', 'histogramdd', 'hsplit', 'hstack', 'hypot', 'i0', 'identity', 'iinfo', 'imag', 'in1d', 'index_exp', 'indices', 'inexact', 'inf', 'info', 'infty', 'inner', 'insert', 'int', 'int0', 'int16', 'int32', 'int64', 'int8', 'int_', 'int_asbuffer', 'intc', 'integer', 'interp', 'intersect1d', 'intp', 'invert', 'ipmt', 'irr', 'is_busday', 'isclose', 'iscomplex', 'iscomplexobj', 'isfinite', 'isfortran', 'isinf', 'isnan', 'isneginf', 'isposinf', 'isreal', 'isrealobj', 'isscalar', 'issctype', 'issubclass_', 'issubdtype', 'issubsctype', 'iterable', 'ix_', 
'kaiser', 'kron', 'ldexp', 'left_shift', 'less', 'less_equal', 'lexsort', 'lib', 'linalg', 'linspace', 'little_endian', 'load', 'loads', 'loadtxt', 'log', 'log10', 'log1p', 'log2', 'logaddexp', 'logaddexp2', 'logical_and', 'logical_not', 'logical_or', 'logical_xor', 'logspace', 'long', 'longcomplex', 'longdouble', 'longfloat', 'longlong', 'lookfor', 'ma', 'mafromtxt', 'mask_indices', 'mat', 'math', 'matmul', 'matrix', 'matrixlib', 'max', 'maximum', 'maximum_sctype', 'may_share_memory', 'mean', 'median', 'memmap', 'meshgrid', 'mgrid', 'min', 'min_scalar_type', 'minimum', 'mintypecode', 'mirr', 'mod', 'modf', 'moveaxis', 'msort', 'multiply', 'nan', 'nan_to_num', 'nanargmax', 'nanargmin', 'nanmax', 'nanmean', 'nanmedian', 'nanmin', 'nanpercentile', 'nanprod', 'nanstd', 'nansum', 'nanvar', 'nbytes', 'ndarray', 'ndenumerate', 'ndfromtxt', 'ndim', 'ndindex', 'nditer', 'negative', 'nested_iters', 'newaxis', 'newbuffer', 'nextafter', 'nonzero', 'not_equal', 'nper', 'npv', 'numarray', 'number', 'obj2sctype', 'object', 'object0', 'object_', 'ogrid', 'oldnumeric', 'ones', 'ones_like', 'outer', 'packbits', 'pad', 'partition', 'percentile', 'pi', 'piecewise', 'pkgload', 'place', 'pmt', 'poly', 'poly1d', 'polyadd', 'polyder', 'polydiv', 'polyfit', 'polyint', 'polymul', 'polynomial', 'polysub', 'polyval', 'power', 'ppmt', 'print_function', 'prod', 'product', 'promote_types', 'ptp', 'put', 'putmask', 'pv', 'r_', 'rad2deg', 'radians', 'random', 'rank', 'rate', 'ravel', 'ravel_multi_index', 'real', 'real_if_close', 'rec', 'recarray', 'recfromcsv', 'recfromtxt', 'reciprocal', 'record', 'remainder', 'repeat', 'require', 'reshape', 'resize', 'restoredot', 'result_type', 'right_shift', 'rint', 'roll', 'rollaxis', 'roots', 'rot90', 'round', 'round_', 'row_stack', 's_', 'safe_eval', 'save', 'savetxt', 'savez', 'savez_compressed', 'sctype2char', 'sctypeDict', 'sctypeNA', 'sctypes', 'searchsorted', 'select', 'set_numeric_ops', 'set_printoptions', 'set_string_function', 'setbufsize', 
'setdiff1d', 'seterr', 'seterrcall', 'seterrobj', 'setxor1d', 'shape', 'shares_memory', 'short', 'show_config', 'sign', 'signbit', 'signedinteger', 'sin', 'sinc', 'single', 'singlecomplex', 'sinh', 'size', 'sometrue', 'sort', 'sort_complex', 'source', 'spacing', 'split', 'sqrt', 'square', 'squeeze', 'stack', 'std', 'str', 'str_', 'string0', 'string_', 'subtract', 'sum', 'swapaxes', 'sys', 'take', 'tan', 'tanh', 'tensordot', 'test', 'testing', 'tile', 'timedelta64', 'trace', 'transpose', 'trapz', 'tri', 'tril', 'tril_indices', 'tril_indices_from', 'trim_zeros', 'triu', 'triu_indices', 'triu_indices_from', 'true_divide', 'trunc', 'typeDict', 'typeNA', 'typecodes', 'typename', 'ubyte', 'ufunc', 'uint', 'uint0', 'uint16', 'uint32', 'uint64', 'uint8', 'uintc', 'uintp', 'ulonglong', 'unicode', 'unicode0', 'unicode_', 'union1d', 'unique', 'unpackbits', 'unravel_index', 'unsignedinteger', 'unwrap', 'ushort', 'vander', 'var', 'vdot', 'vectorize', 'version', 'void', 'void0', 'vsplit', 'vstack', 'warnings', 'where', 'who', 'zeros', 'zeros_like']
>>> import numpy as np

NumPy

numpy.linalg.norm:

Length of a vector



>>> np.linalg.norm([2,4])
4.4721359549995796

numpy.linalg.inv:

Inverse of the Matrix

>>> np.linalg.inv([[2,0],[0,2]])
array([[ 0.5,  0. ],
       [ 0. ,  0.5]])
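
To see what these two NumPy functions compute, here is a plain-Python sketch of the same arithmetic. It is a teaching stand-in, not how NumPy is implemented: norm here is the Euclidean length only, and inverse_2x2 handles only 2 by 2 matrices using the determinant formula. Both function names are made up.

```python
from __future__ import division   # so that / is true division in Python 2 as well
import math

def norm(vector):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))

def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] via the determinant formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

print(norm([2, 4]))
print(inverse_2x2([[2, 0], [0, 2]]))
```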

NumPy

numpy.matrix:

alternative to numpy.array

>>> M = np.matrix([[2,0],[0,2]])
>>> M * M
matrix([[4, 0],
        [0, 4]])
>>> M.I
matrix([[ 0.5,  0. ],
        [ 0. ,  0.5]])
>>> _*M
matrix([[ 1.,  0.],
        [ 0.,  1.]])

NumPy:

numpy.random:

Random sampling

 

rand(d0, d1, ..., dn)                 Random values in a given shape.
randn(d0, d1, ..., dn)                Return a sample (or samples) from the “standard normal” distribution.
randint(low[, high, size, dtype])     Return random integers from low (inclusive) to high (exclusive).
random_integers(low[, high, size])    Random integers of type np.int between low and high, inclusive.
random_sample([size])                 Return random floats in the half-open interval [0.0, 1.0).
random([size])                        Return random floats in the half-open interval [0.0, 1.0).
ranf([size])                          Return random floats in the half-open interval [0.0, 1.0).
sample([size])                        Return random floats in the half-open interval [0.0, 1.0).
choice(a[, size, replace, p])         Generates a random sample from a given 1-D array.
bytes(length)                         Return random bytes.

Permutations

shuffle(x)                            Modify a sequence in-place by shuffling its contents.
permutation(x)                        Randomly permute a sequence, or return a permuted range.

>>> np.random.rand(1)
array([ 0.88288734])
>>> np.random.rand(2)
array([ 0.49363292,  0.16510102])
>>> np.random.rand(1,2)
array([[ 0.91578292,  0.60660245]])
>>> np.random.rand(2,1)
array([[ 0.23357355],
       [ 0.25666865]])
>>> np.random.rand(2,2, 2)
array([[[ 0.00763289,  0.09970221],
        [ 0.68506231,  0.31527997]],

       [[ 0.94116485,  0.43140191],
        [ 0.78844403,  0.2575242 ]]])
>>> np.random.rand(2,2, 2, 2)
array([[[[ 0.57742619,  0.99551532],
         [ 0.99774305,  0.45852133]],

        [[ 0.06396083,  0.68351763],
         [ 0.96607077,  0.27239997]]],


       [[[ 0.91702878,  0.33364644],
         [ 0.34576949,  0.06507407]],

        [[ 0.20052982,  0.24095151],
         [ 0.1145685 ,  0.28035093]]]])
>>>

NumPy:

numpy.random.permutation()

>>> np.random.permutation(6)
array([5, 1, 4, 0, 3, 2])
>>> sequence = np.random.permutation(10)
>>> print(sequence)
[1 2 5 9 8 3 4 6 0 7]


NumPy:

numpy.random.uniform()

>>> np.random.uniform(-1,1, size=10)
array([-0.08,  0.74, -0.64, -0.98,  0.98, -0.21, -0.36, -0.34, -0.16, -0.86])

scipy.stats

Statistical functions

This module contains a large number of probability distributions as well as a growing library of statistical functions.

scipy.stats.describe

>>> from scipy import stats
>>> np.random.uniform(-1,1, size=10000)
array([ 0.34, -0.88, -0.64, ..., -0.87,  0.09,  0.42])
>>> stats.describe(_)
DescribeResult(nobs=10000L, minmax=(-0.99984799016922654, 0.99997505667204889),
mean=-0.0017052074604612564, variance=0.33296037891169855, skewness=-0.004341077496136224,
 kurtosis=-1.2054297738246007)
>>>

>>> sequence = np.random.permutation(10)
>>> print(sequence)
[1 2 5 9 8 3 4 6 0 7]
>>> stats.describe(sequence)
DescribeResult(nobs=10, minmax=(0, 9), mean=4.5, variance=9.1666666666666661,
 skewness=0.0, kurtosis=-1.2242424242424244)
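
The mean and variance reported by describe can be compared with theory. For a uniform distribution on [a, b] the mean is (a + b) / 2 and the variance is (b - a)^2 / 12, so samples from [-1, 1] should give values close to 0 and 1/3. The sketch below uses the standard library modules random and statistics (Python 3.4 or later) as stand-ins for NumPy and SciPy; the seed is arbitrary.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only so that reruns give the same numbers

# 10000 draws from the uniform distribution on [-1, 1]
draws = [random.uniform(-1, 1) for _ in range(10000)]
sample_mean = statistics.mean(draws)
sample_variance = statistics.variance(draws)   # sample variance (n - 1 denominator)
print(sample_mean, sample_variance)

# A permutation of 0..9 has the same mean and variance whatever the order.
sequence = list(range(10))
random.shuffle(sequence)
print(statistics.mean(sequence), statistics.variance(sequence))
```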

scipy.special.comb

>>> from scipy.special import comb
>>> comb(10,2)
45.0
>>> comb(3,2)
3.0
>>>

For many more statistics-related functions, install the software R and the interface package rpy2.

Bokeh

Goal: a billion points, meaningfully, interactively, in the browser

Matplotlib

matplotlib.style.use('fivethirtyeight')

Without this change, most styles will default to the "jet" colormap.

Matplotlib

To run the gallery showing the available styles, simply grab the source and run the package as a script:

$ git clone https://github.com/tonysyu/matplotlib-style-gallery.git
$ cd matplotlib-style-gallery
$ python -m mpl_style_gallery

"a 2 billion dollar facebook for programmers"

"15 million active users who learn, share, and work together to build software"

  • GitHub: a hosting service for Git repositories, like Dropbox for code

  • Git: version control system for software development

"free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency."

$ git clone    Clone a repository
$ git push     Update the server with your local commits
$ git pull     Fetch changes from the server and merge them into the current branch

Fork           Make your own copy of someone else's repository on GitHub. This is a button on the GitHub website, not a git command.

Jupyter Notebooks

"Open source, interactive data science and scientific computing across over 40 programming languages."

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Notebooks may be exported to a range of static formats:

  • HTML (for example, for blog posts)
  • LaTeX
  • PDF
  • Slide shows (via reveal.js)

Any .ipynb notebook document available from a public URL can be shared via the Jupyter Notebook Viewer (nbviewer).

Jupyter Notebooks

The landing page of the Jupyter notebook web application, the dashboard, shows the notebooks currently available in the notebook directory.
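
An .ipynb file is a JSON document, so it can be inspected with nothing more than the standard json module. The sketch below builds a tiny hand-written stand-in for a notebook (two cells, nbformat 4) rather than reading a real file; the cell contents are made up for illustration.

```python
import json

# A hand-written stand-in for the contents of a small .ipynb file (nbformat 4).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Probability problem"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["1 - (35.0 / 36) ** 24"]},
    ],
})

# Parse the text the same way the contents of a real notebook file would be parsed.
notebook = json.loads(notebook_text)
code_cells = [cell for cell in notebook["cells"] if cell["cell_type"] == "code"]
print(len(notebook["cells"]), "cells,", len(code_cells), "of them code")
```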

Ok, I want to download a notebook and start working with it. What next?

Cloning notebooks from Github

  • Use browser (Github.com)

  • Use Git 

https://github.com/neuroneuro15
https://github.com/neuroneuro15/SciPyCourse2016
C:\Users\jajaco3\Documents>git clone https://github.com/neuroneuro15/SciPyCourse2016
Cloning into 'SciPyCourse2016'...
remote: Counting objects: 248, done.
Receiving objects: 100% (248/248), 7.16 MiB | 0 bytes/s, done.

Resolving deltas: 100% (126/126), done.
Checking connectivity... done.
git-lfs smudge -- 'Homework Sample Data/Homework 4/eeg_data.mat': git-lfs: command not found
error: external filter git-lfs smudge -- %f failed -1
error: external filter git-lfs smudge -- %f failed
fatal: Homework Sample Data/Homework 4/eeg_data.mat: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'


C:\Users\jajaco3\Documents>
C:\Users\jajaco3\Documents>dir
 Volume in drive C is Windows
 Volume Serial Number is E88B-1894

 Directory of C:\Users\jajaco3\Documents

11/03/2016  04:52 PM    <DIR>          .
11/03/2016  04:52 PM    <DIR>          ..
11/01/2016  05:10 PM    <DIR>          .ipynb_checkpoints
10/05/2016  03:16 PM             2,873 .Rhistory
09/11/2016  07:13 PM    <DIR>          btabibian.github.io
09/11/2016  07:16 PM    <DIR>          btabibian.github.io-master
08/17/2016  03:46 PM    <DIR>          Custom Office Templates
11/01/2016  09:25 PM    <DIR>          GitHub
09/27/2016  11:17 AM    <DIR>          LabData
09/11/2016  08:31 PM            33,070 Lecture1-Copy1.ipynb
11/01/2016  11:29 AM            10,731 Lecture1.ipynb
08/18/2016  03:03 PM    <DIR>          Python Scripts
11/01/2016  05:41 PM    <DIR>          pythonexamplefolder
08/22/2016  09:21 AM    <DIR>          R
11/03/2016  04:37 PM    <DIR>          SciPyCourse2016-master
09/02/2016  09:02 PM    <DIR>          Turning
11/03/2016  10:03 AM    <DIR>          TurningPoint 5
11/01/2016  05:10 PM             2,699 Untitled1.ipynb
               4 File(s)         49,373 bytes
              14 Dir(s)  196,888,305,664 bytes free

Find the directory containing SciPyCourse2016-master

Open Jupyter notebook

C:\Users\jajaco3\Documents>jupyter notebook
[W 16:55:15.381 NotebookApp] Unrecognized JSON config file version, assuming version 1
[I 16:55:17.400 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 16:55:18.250 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 16:55:18.253 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
[I 16:55:18.263 NotebookApp] [nb_conda] enabled
[I 16:55:18.414 NotebookApp] [nb_anacondacloud] enabled
[I 16:55:18.767 NotebookApp] Serving notebooks from local directory: C:\Users\jajaco3\Documents
[I 16:55:18.770 NotebookApp] 0 active kernels
[I 16:55:18.772 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 16:55:18.776 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Open any .ipynb file and you are on your way to learning python using Jupyter notebooks and Github!

To obtain these slides as well as the Jupyter notebooks that were presented, type at the command line:

git clone https://github.com/jeremyallenjacobson/PDS-october2017.git

Thank you!

Python for data science October 2017

By Jeremy Jacobson

Slides on Conda, Python, modules, packages, Numpy, and SciPy.
