Python For Data Analysis

By: James Shaddix

Versioning

MAJOR.MINOR.PATCH

  • Major: incompatible API changes
  • Minor: add functionality in a backwards-compatible manner
  • Patch: backwards-compatible bug fixes

Examples:

1. Current Python Version: 3.7.4

2. NumPy: 1.17.0
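To make the three rules concrete, here is a minimal sketch that splits a MAJOR.MINOR.PATCH string and asks whether an installed version can stand in for the one your code was written against. The function names `parse_version` and `is_compatible` are hypothetical, and the sketch assumes plain numeric versions (no suffixes like `rc1` or `.dev0`). Real tools such as pip and conda use more complete parsing.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers; missing parts count as 0."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    parts += [0] * (3 - len(parts))  # pad, e.g. '0.25' -> (0, 25, 0)
    return parts[0], parts[1], parts[2]


def is_compatible(installed: str, required: str) -> bool:
    """True if `installed` should run code written against `required`.

    Same MAJOR (no incompatible API changes) and at least the required
    MINOR.PATCH (additions and fixes are backwards-compatible).
    """
    inst, req = parse_version(installed), parse_version(required)
    return inst[0] == req[0] and inst[1:] >= req[1:]


print(is_compatible("1.17.0", "1.16.4"))  # True: newer minor release
print(is_compatible("1.17.0", "1.18.0"))  # False: lacks features added in 1.18
print(is_compatible("2.0.0", "1.17.0"))   # False: major bump may break the API
```

Note that this check is too generous for 0.X.Y versions, where Semantic Versioning makes no stability promise at all.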

What About a 0.X.Y Release?

  • Generally, this is a sign that the API is not complete.
  • The author hasn't stamped the sign of approval.

Examples:

1. Pandas: 0.25
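The Semantic Versioning specification states this directly: major version zero is for initial development, and the public API should not be considered stable. A minimal sketch, assuming plain numeric version strings, of flagging such packages (the helper name `api_is_stable` is hypothetical):

```python
def api_is_stable(version: str) -> bool:
    """Return False for 0.X.Y releases, which make no stability promise."""
    major = int(version.split(".")[0])
    return major >= 1


# Versions taken from the examples on these slides
for name, version in [("numpy", "1.17.0"), ("pandas", "0.25")]:
    status = "stable API" if api_is_stable(version) else "pre-1.0, API may still change"
    print(f"{name} {version}: {status}")
```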

What's Python All About?

Python 0.9 (first release): 1991
Python 2.0: 2000
Python 3.0: 2008
Core Philosophy
1. Beautiful is better than ugly
2. Explicit is better than implicit
3. Simple is better than complex
4. Complex is better than complicated
5. Readability counts
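These lines come from "The Zen of Python" (PEP 20), which ships with the interpreter itself. A small sketch of pulling it up yourself: importing the standard-library `this` module prints the full text, and its `s` attribute holds the same text ROT13-encoded, which the `codecs` module can decode.

```python
import codecs
import this  # side effect: prints the full Zen of Python on first import

# `this.s` stores the text ROT13-encoded; decode it to work with it as a string
zen = codecs.decode(this.s, "rot13")

# The principles quoted on this slide
slide_points = [
    "Beautiful is better than ugly.",
    "Explicit is better than implicit.",
    "Simple is better than complex.",
    "Complex is better than complicated.",
    "Readability counts.",
]
for point in slide_points:
    print(point in zen, point)
```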

Python is a general-purpose, extensible programming language.

Creation

Let's Talk About the Rumor

Is Python Slow?

Sort of ...

The core functionality of Python is slow compared to many other languages.

We Can Fix This!

Python Is Extensible!

  • Python integrates with C and C++ very easily!
  • This has led to the development of FAST 3rd-party scientific libraries.
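A quick way to see both halves of the story is to time a hand-written Python loop against the built-in `sum`, which CPython (the standard interpreter) implements in C. A minimal sketch using the standard-library `timeit` module; the timings depend on your machine, so none are shown, but the C-backed built-in is typically noticeably faster. The helper `python_sum` is hypothetical.

```python
import timeit


def python_sum(values):
    """Add numbers one at a time in a pure-Python (interpreted) loop."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(1_000_000))

# Both approaches compute the same result...
assert python_sum(data) == sum(data)

# ...but the built-in runs its loop in C rather than in interpreted bytecode.
loop_time = timeit.timeit(lambda: python_sum(data), number=5)
builtin_time = timeit.timeit(lambda: sum(data), number=5)
print(f"pure Python loop: {loop_time:.3f} s")
print(f"built-in sum (C): {builtin_time:.3f} s")
```

Scientific libraries apply the same idea at scale: the loops live in compiled code, and Python only orchestrates them.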

Python Development Model:

  1. Solve your problem with simple/beautiful code.
  2. Figure out where the code is slow, and rewrite in C.
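Step 2 starts with measuring rather than guessing. A minimal sketch using the standard-library profiler `cProfile` together with `pstats` to see where the time goes; `slow_part`, `fast_part`, and `analysis` are hypothetical stand-ins for pieces of your own code.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Hypothetical hotspot: a pure-Python double loop."""
    return sum(i * j for i in range(n) for j in range(n))


def fast_part(n):
    """Hypothetical cheap step."""
    return sum(range(n))


def analysis():
    return slow_part(300) + fast_part(300)


profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Sort by cumulative time so the most expensive call chains come first.
# Only the functions near the top of this report are worth rewriting in C.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```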

Open Source

My developers: develop highly efficient code and aren't afraid of long nights at the office.

My developers: write beautiful code, have cool hobbies, and have families to go home to ...

Development Story

Why Python?

Let's Take a Look at the Landscape ...

Scientific Programming

Python Vs. Matlab

Broad Strokes

Python Benefits

1. Free

2. Language is simple

3. Scales well for large projects

4. Open Source

5. General Purpose

Matlab Benefits

1. Easier for Beginners

2. Interactive Tools

3. Simulink

So which one is better???

What if ...

  • We have access to both languages
  • And are just doing data analysis

Language Model

L = Third-Party Library

P = Matlab Package

There's A Problem ...

Using Python for data analysis means you have to manage a lot of third-party libraries.

1. You will quickly start to accrue hundreds of packages.

2. It's going to be your job to install/update all of these packages.

3. It's your job to deal with conflicting packages.

This Is NOT Trivial!!!
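To get a feel for that bookkeeping, here is a minimal sketch that audits an environment against a list of requirements. It uses `importlib.metadata`, which is in the standard library only from Python 3.8 onward (newer than the 3.7.4 mentioned earlier), and the `requirements` dictionary and `installed_version` helper are hypothetical examples.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical project requirements: package -> minimum major version
requirements = {"numpy": 1, "scipy": 1, "matplotlib": 3}

for package, min_major in requirements.items():
    version = installed_version(package)
    if version is None:
        print(f"{package}: MISSING")
    elif int(version.split(".")[0]) < min_major:
        print(f"{package}: {version} is too old")
    else:
        print(f"{package}: {version} OK")
```

With hundreds of packages, keeping a check like this honest by hand quickly becomes a job of its own.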

I Know What You're Thinking ...

How Are We Going To Solve This Issue?

Introducing: Anaconda

1. Comes with the package manager "conda"

2. Allows us to make use of virtual environments

3. Comes with a bunch of useful software

4. Comes with pre-installed python packages for data analysis
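With several environments on one machine, it helps to confirm from inside Python which one you are actually running in. A minimal standard-library sketch; it assumes the `CONDA_DEFAULT_ENV` variable that `conda activate` sets, which is `None` here when no conda environment is active. The helper `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment():
    """Report which Python environment this interpreter is running in."""
    return {
        "python": sys.version.split()[0],  # e.g. '3.7.4'
        "prefix": sys.prefix,              # root directory of the running environment
        # conda sets this variable when an environment is activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # venv/virtualenv interpreters point sys.prefix away from the base install
        "is_venv": sys.prefix != sys.base_prefix,
    }


for key, value in describe_environment().items():
    print(f"{key:>10}: {value}")
```

On the command line, the matching conda workflow is `conda create --name analysis python=3.7 numpy pandas` followed by `conda activate analysis`, where `analysis` is just an example name.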

Notebooks

  • IPython Notebook
  • Jupyter Notebook
  • JupyterLab

  • Great for interactive development:
      1. Data analysis
      2. Prototyping algorithms
  • Not so great for rigorous software development


Presentation for the graduate physics community at Colorado State University.
