Data wrangling

with Python / Numpy

 

http://slides.com/dansandiford/deck-4

What is Python?

Python is a simple, general-purpose programming language. It is not tied to a specific task or discipline: it is widely used for processing and analysing text, numbers, and images, for building websites and applications, and increasingly for machine learning and data science.

 


What I do with Python...


http://www.underworldcode.org/pages/Blog/

Learning objectives.

In this chapter we're going to look at a Python library called Numpy (Numerical Python).

 

A library is a group of tools (code) designed to serve a common purpose that lies outside the core capabilities of the language.

 

Numpy provides the ability to work with structured groups of numbers (arrays). 
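
As a minimal sketch of what "structured groups of numbers" can look like in practice, the snippet below builds two small arrays. The values and the names values, doubled and table are made up for illustration and do not come from the tutorial data.

```python
import numpy as np

# A small, made-up set of numbers stored as a one-dimensional array.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic applies to every element at once, with no explicit loop.
doubled = values * 2

# A two-dimensional array (2 rows, 3 columns) built from nested lists.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(doubled)       # the doubled values
print(table.shape)   # number of rows and columns
print(table.sum())   # sum over every element
```

The point of the sketch is that an array carries its shape with it, and that operations such as multiplication or summation act on the whole group of numbers in one step.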

 


Learning objectives.

By the end of this chapter you will learn:

  1. How to import, wrangle, and query structured, numeric data, using Python/Numpy.
  2. The flavour of using Python/Numpy for a data science workflow.
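
To give a rough feel for the three verbs in the first objective, here is a small sketch. It assumes a text file with one number per line; the in-memory fake_file below is a made-up stand-in for such a file, and the names data, relative and above_mean are hypothetical.

```python
import io
import numpy as np

# Made-up stand-in for a text file with one number per line (an assumption).
fake_file = io.StringIO("10.0\n12.5\n11.0\n13.0\n")

# Import: read the numbers into an array.
data = np.loadtxt(fake_file)

# Wrangle: rescale every value relative to the first entry.
relative = data / data[0]

# Query: select only the entries that are above the mean.
above_mean = data[data > data.mean()]

print(data)
print(relative)
print(above_mean)
```

With a real file, the file name would be passed to np.loadtxt in place of fake_file, possibly with extra arguments depending on how the file is laid out.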

 

This tutorial should be challenging, particularly if you haven't seen a programming language before.

 


Learning (non-)objectives.

 

 


There are a number of core concepts that I will rush over including:

  • types
  • variables
  • functions
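
For orientation only, the three concepts above can be seen side by side in a few lines of plain Python. The names price, days, label and percent_change are hypothetical and chosen just for this sketch.

```python
# A variable is a name that refers to a value; every value has a type.
price = 101.5        # a float (a number with a decimal part)
days = 3             # an int (a whole number)
label = "nasdaq"     # a str (text)


def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return 100.0 * (new - old) / old


# Calling a function with two values and storing the result in a variable.
change = percent_change(100.0, price)

print(type(price).__name__, type(days).__name__, type(label).__name__)
print(change)
```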

Come back at 4:00 pm Tuesday or 3:30 Wednesday for ToolHack, where we'll discuss these concepts in more detail.

 


A problem

Our example looks at a real (data) science problem. In fact, we are going to try to falsify a hypothesis: we'll try to show that there are predictable patterns in day-to-day stock market data, a discovery which would contradict the efficient market hypothesis.

 

Given a time series, we want to find out how often the change between two consecutive data points is the same as the change between the previous two data points.
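
One possible way to sketch this with Numpy is shown below. It assumes that "the same change" means the same direction (up or down), which is an interpretation rather than something stated above, and it uses a short made-up price series instead of real market data. The names prices, changes, directions and fraction_same are hypothetical.

```python
import numpy as np

# Made-up prices for illustration only (not real market data).
prices = np.array([100.0, 101.5, 102.0, 101.0, 100.5, 101.0, 102.5])

# Change between each pair of consecutive data points.
changes = np.diff(prices)

# Direction of each change: +1 (up), -1 (down), 0 (no change).
# Assumption: "the same change" is read as "the same direction".
directions = np.sign(changes)

# Compare each direction with the one immediately before it.
same_as_previous = directions[1:] == directions[:-1]

# Fraction of comparisons where the direction repeated.
fraction_same = same_as_previous.mean()

print(same_as_previous.sum(), "of", same_as_previous.size, "changes repeat")
print(fraction_same)
```

If the direction of a change told us nothing about the next one, we would expect this fraction to hover around one half on a long series; a value consistently far from that would be the kind of pattern the problem is looking for.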

 


Your Turn!

Now we're going to try this together. Start a Jupyter session in the cloud by going to the website:

https://tinyurl.com/h3td8qn

Once a notebook session has started, go to the 'new' dropdown and start a new terminal. Type in or paste:

git clone https://github.com/dansand/ResChapterBinder.git


Option B

Start a Jupyter session by going to the website:

https://tmp60.tmpnb.org

Download and unzip the following repository (click clone or download):

https://github.com/dansand/ResChapterBinder.git

In your tmpnb notebook, click upload and upload the following files from the repository you just downloaded:

index.ipynb

nasdaq.txt

Come and Talk Python / Jupyter

Python tutorials:

 https://dansand.gitbooks.io/resguides-research-with-jupyter/content/

https://software-carpentry.org/lessons/

A Byte of Python: https://python.swaroopch.com/

https://github.com/dansand/ResChapter/blob/master/content/numpy.md


Python communities:

Unimelb Facebook group: research with Jupyter

www.pyladies.com/

http://jupyter.org/

https://www.scipy.org/

pydata.org/

https://djangogirls.org/

 

Copy of deck

By Dan Sandiford
