Learning Data Science

Lecture 7
Scientific Python

pandas

  • fast
  • powerful
  • flexible
  • easy to use
  • open source
  • data analysis tool
  • data manipulation tool

Definition from the website

Everything we learned using pandas

  • Series
  • dtypes
  • indexing
  • Series operations
  • Boolean Indexing
  • DataFrames
  • Inspecting DataFrames
  • Reading/writing tabular data
  • Indexing rows
  • DataFrame boolean indexing
  • Sorting DataFrames
  • Stats of columns
  • DataFrame Operations
  • The magic apply() method
  • Axes in DataFrames
  • Aggregation
  • Plotting from pandas
  • Pandas and Seaborn
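A few of the topics above can be sketched in one short recap. The grades below are toy values chosen for illustration, and the variable names are arbitrary; this is a minimal sketch, not the notebook's own code.

```python
import pandas as pd

# Toy data for illustration only.
grades = pd.DataFrame(
    {"Trig": [1.3, 4.0, 1.3], "Alg": [1.3, 4.3, 1.0]},
    index=["Lara", "Jay", "Susie"],
)

trig = grades["Trig"]                      # one column is a Series
low_trig = grades[grades["Trig"] < 2.0]    # DataFrame boolean indexing
by_alg = grades.sort_values("Alg")         # sorting by a column
col_means = grades.mean(axis=0)            # aggregate down the rows: one value per column
row_means = grades.mean(axis=1)            # aggregate across the columns: one value per row

# apply() calls the function once per column by default (axis=0).
spread = grades.apply(lambda col: col.max() - col.min())

print(low_trig)
print(by_alg)
print(col_means)
print(row_means)
print(spread)
```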

Anatomy of a Pandas DataFrame

CSV file

Name Trig Alg Geom Calc
Lara 1.3 1.3 3.7 2.3
Jay 4.0 4.3 2.0 2.3
Susie 1.3 1.0 2.0 3.0

Open with pandas

pd.DataFrame

✨ Fancy Google Sheets or Excel but in Python

Index: Lara, Jay, Susie

Trig Alg Geom Calc
1.3 1.3 3.7 2.3
4.0 4.3 2.0 2.3
1.3 1.0 2.0 3.0

Take one column

pd.Series

A fancy numpy array, but with an index column

Index: Lara, Jay, Susie
Trig: 1.3, 4.0, 1.3

Take one row

pd.Series

Index: Trig, Alg, Geom, Calc
Jay: 4.0, 4.3, 2.0, 2.3
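The anatomy above can be reproduced in a few lines. This sketch assumes the CSV file is comma-separated and reads it from an in-memory string rather than from disk; the variable names are illustrative.

```python
import io

import pandas as pd

# Assumption: the CSV is comma-separated; a string stands in for the file.
csv_text = """Name,Trig,Alg,Geom,Calc
Lara,1.3,1.3,3.7,2.3
Jay,4.0,4.3,2.0,2.3
Susie,1.3,1.0,2.0,3.0
"""

# Open with pandas and use the Name column as the index.
df = pd.read_csv(io.StringIO(csv_text), index_col="Name")

trig_column = df["Trig"]   # take one column: a Series indexed by Name
jay_row = df.loc["Jay"]    # take one row: a Series indexed by the column names

print(df)
print(trig_column)
print(jay_row)
```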

Lecture 7

  1. Recap
  2. SQL Primer
  3. Using APIs
  4. Monte Carlo simulations
  5. Scientific Python

SQL and SQL Databases

SQL = Structured Query Language

What is it?

A programming language for managing data in a relational database.

Relational Databases

Tabular data stored in rows and columns

with

multiple interconnected tables

grades

Name   Trig  Alg  Geom  Calc
Lara   1.3   1.3  3.7   2.3
Jay    4.0   -    2.0   2.3
Susie  1.3   1.0  -     3.0

students

Name   Last   Age  Uni  Cats
Lara   Jones  22   TUM  0
Jay    Sun    23   LMU  6
Susie  Blue   25   LMU  1

Here both tables have the same index, so we can take data from multiple tables.
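Pulling data from two tables through a shared key can be sketched with Python's built-in sqlite3 module. The table layout, the column names, and the pairing of first names with last names are assumptions made for illustration, and only the Trig and Calc grades are included.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (
        name TEXT PRIMARY KEY, last TEXT, age INTEGER, uni TEXT, cats INTEGER
    );
    CREATE TABLE grades (
        name TEXT REFERENCES students(name), trig REAL, calc REAL
    );
    """
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [
        ("Lara", "Jones", 22, "TUM", 0),   # name/last pairing is an assumption
        ("Jay", "Sun", 23, "LMU", 6),
        ("Susie", "Blue", 25, "LMU", 1),
    ],
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("Lara", 1.3, 2.3), ("Jay", 4.0, 2.3), ("Susie", 1.3, 3.0)],
)

# Join the two tables on the shared key and filter on a column from one of them.
rows = conn.execute(
    """
    SELECT s.name, s.uni, g.trig
    FROM students AS s
    JOIN grades AS g ON g.name = s.name
    WHERE s.uni = 'LMU'
    ORDER BY g.trig
    """
).fetchall()
conn.close()

print(rows)
```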

Relational Databases

grades

Name   Trig  Alg  Geom  Calc
Lara   1.3   1.3  3.7   2.3
Jay    4.0   -    2.0   2.3
Susie  1.3   1.0  -     3.0

students

Name   Last   ID  Uni  Cats
Lara   Jones  45  TUM  0
Jay    Sun    48  LMU  6
Susie  Blue   66  LMU  1

unis

Uni  City    Students  Courses
LMU  Munich  [48, 66]  [Trig, Alg]
TUM  Munich  [45]      [Geom, Calc]
Ulm  Ulm     []        [Trig, Alg, Calc]

courses

Course  ID  Prof ID  Students
Trig    1   44       [45, 48, 66]
Alg     2   154      [45, 66]
Geo     3   22       [45, 48]

Relational Databases

SQL: For when you have lots of data which interconnects in complex ways!

SQL Databases run the world!

Many database formats use SQL

The main trio:

  • Web-dominant
  • Huge ecosystem
  • Great for local use
  • Built into Python
  • No server
  • Most large-scale applications
  • Academics' fave 💕

Let's look at a real-life example

Movie rental store 🍿

Let's look at a real-life example

Let's move into the notebook now:

[URL SOON]

Let's look at a real-life example

Download the dataset:

https://www.kaggle.com/api/v1/datasets/download/atanaskanev/sqlite-sakila-sample-database

Lecture 7

  1. Recap
  2. SQL Primer
  3. Using APIs
  4. Monte Carlo simulations
  5. Scientific Python

APIs

Application Programming Interface

Think of APIs like a waiter at a restaurant

“I can’t honestly recommend anything – I’ve watched them make the stuff.”
Roy Fox — May 1, 1954

APIs

Application Programming Interface

Think of APIs like a waiter at a restaurant:

  • You get a list of things you can do
  • You request them from an API
  • Then the information you want gets delivered back

APIs

Application Programming Interface

Most APIs are accessed with URLs:

http://reddit.com/api/r/python/new
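To see what such a URL is made of, the standard library can split it into parts. This is only a sketch of the URL's anatomy; it does not contact the server.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://reddit.com/api/r/python/new")

print(parts.scheme)   # the protocol used to talk to the server
print(parts.netloc)   # the host that serves the API
print(parts.path)     # the resource being requested
```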

Let's go over to the notebook and try using some APIs ourselves!

Challenge #2

locations = {
    "Berlin": {
        "latitude": 52.52,
        "longitude": 13.41
    },
    "Paris": {
        "latitude": 48.85,
        "longitude": 2.35
    },
    "Rome": {
        "latitude": 41.90,
        "longitude": 12.48
    }
}
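One possible starting point for this challenge is to turn each location into a request URL. The endpoint below is a hypothetical placeholder, as are the helper names `build_url` and `fetch_json`; swap in the real API and its parameter names from the notebook. `fetch_json` is defined but not called here, so the sketch runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only to illustrate the pattern.
BASE_URL = "https://api.example.com/v1/forecast"

locations = {
    "Berlin": {"latitude": 52.52, "longitude": 13.41},
    "Paris": {"latitude": 48.85, "longitude": 2.35},
    "Rome": {"latitude": 41.90, "longitude": 12.48},
}


def build_url(latitude, longitude):
    """Encode the coordinates as query parameters on the base URL."""
    query = urlencode({"latitude": latitude, "longitude": longitude})
    return f"{BASE_URL}?{query}"


def fetch_json(url):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


urls = {city: build_url(**coords) for city, coords in locations.items()}
for city, url in urls.items():
    print(city, url)
```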

Lecture 7

  1. Recap
  2. SQL Primer
  3. Using APIs
  4. Monte Carlo simulations
  5. Scientific Python

Monte Carlo (MC) Methods

Started at Los Alamos as a way to model neutron diffusion

Needed a code name for this method 🕵️‍♂️

Monte Carlo (MC) Methods

Use random sampling to estimate a very complicated probability distribution

Simple example:

  • We are playing a game with two dice
  • To win, you need to roll two 6's on the first try
  • What are the chances of winning?

Let's code it up!
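A minimal sketch of the two-dice game, using only the standard library. The function name, the number of trials, and the seed are arbitrary choices; the estimate should land close to the exact answer of 1/36.

```python
import random


def estimate_double_six(n_trials, seed=0):
    """Play the game n_trials times and return the fraction of wins."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    wins = 0
    for _ in range(n_trials):
        first = rng.randint(1, 6)   # randint includes both endpoints
        second = rng.randint(1, 6)
        if first == 6 and second == 6:
            wins += 1
    return wins / n_trials


estimate = estimate_double_six(200_000)
exact = 1 / 36

print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Exact probability:    {exact:.4f}")
```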

Monte Carlo (MC) Methods

That was easy to calculate by hand... but what about this?

Let's code it up!

  • You pick three random dice, each with a number of sides between 2 and 20
  • To win, you need to roll a total of at least 30
  • What are your chances of winning?

Challenge #2

That was easy to calculate by hand... but what about this?

You can use a for loop if it helps conceptually.
For an extra challenge, try and do it only with vectors 💪

  • You pick three random dice, each with a number of sides between 2 and 20
  • To win, you need to roll a total of at least 30
  • What are your chances of winning?
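One way to approach the vector-only version is sketched below with numpy. It assumes that each die's number of sides is drawn uniformly and independently from the integers 2 to 20, and that "at least 30" refers to the sum of the three rolls. The function name and defaults are illustrative.

```python
import numpy as np


def estimate_win_probability(n_games, target=30, seed=0):
    """Simulate n_games at once and return the fraction that reach the target."""
    rng = np.random.default_rng(seed)
    # Each row is one game: the number of sides (2 to 20 inclusive) of three dice.
    sides = rng.integers(2, 21, size=(n_games, 3))
    # Roll every die once: a uniform integer from 1 up to its own number of sides.
    rolls = rng.integers(1, sides + 1)
    totals = rolls.sum(axis=1)
    return float(np.mean(totals >= target))


estimate = estimate_win_probability(200_000)
print(f"Estimated chance of winning: {estimate:.3f}")
```

No Python loop is needed because the upper bound passed to `rng.integers` is itself an array, so every die gets its own range.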

MC in the real world

Imagine you want to determine the habitability or other properties of star systems. This depends on many things:

  • the type of star
  • the age of the star
  • the number of planets in the system
  • the mass of the planets
  • the orbits of each planet
  • etc.

Given the observations we have of stars and exoplanets, we could reasonably estimate the distributions of each of these parameters.

MC in the real world

Imagine you want to determine the habitability or other properties of star systems. This depends on many things:

  • Many different attributes

Distributions of attributes from observations

Draw initial parameters and simulate many examples with MC

Lecture 7

  1. Recap
  2. SQL Primer
  3. Using APIs
  4. Monte Carlo simulations
  5. Scientific Python

scipy

Algorithms for:

  • Optimization
  • Integration
  • Interpolation
  • Linear Algebra
  • Differential Equations
  • and more

scipy

Most algorithms are written in:

  • Fortran
  • C
  • C++

and then "wrapped" in Python.

install scipy

uv add scipy

Things we will try today:

  • scipy.constants
  • scipy.stats
  • scipy.integrate
  • scipy.interpolate
  • scipy.optimize
  • scipy.fft
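A quick taste of four of these submodules, as a sketch rather than the notebook's own examples. The specific integral, function to minimise, and distribution are arbitrary choices.

```python
import numpy as np
from scipy import constants, integrate, optimize, stats

# scipy.constants: physical constants in SI units.
speed_of_light = constants.c

# scipy.integrate: integrate sin(x) from 0 to pi numerically (analytic answer: 2).
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# scipy.optimize: find the minimum of a one-dimensional function.
best = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# scipy.stats: probability that a standard normal variable falls below 0.
p_below_zero = stats.norm.cdf(0.0)

print(speed_of_light)
print(area, abs_error)
print(best.x)
print(p_below_zero)
```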

Back to the notebook we go!

Lecture 7

  1. Recap
  2. SQL Primer
  3. Using APIs
  4. Monte Carlo simulations
  5. Scientific Python

The End

Learning Data Science Lecture 7

By astrojarred
