Course outline

14/10 : Introduction to ML; Challenge 1 introduction
12/11 : Practical ML; Challenge 1
13/12 : End of Challenge 1
22/01 : Results of Challenge 1; Challenge 2 introduction; NLP
19/02 : NLP; Challenge 2
18/03 : End of Challenge 2

Deadlines

20/12 : 2-3 pages on your work for Challenge 1 + code for your best solution (a GitHub repo is preferred)
27/03 : Same for Challenge 2

=> 2-3 pages covering your approach, the key findings of your data analysis, and what you tried.

I will not read beyond page 3.

Kaggle / Anaconda

Challenge 1 : https://www.kaggle.com/t/c99bd3bbc9df46f48dcb1af31f42bc2f

Register with your full name.

Anaconda : https://www.anaconda.com/distribution/#download-section

Install it on your computer.

Data Science

First approach

Data Science = Machine Learning in practice

Data Science = Data cleaning + data transformation + data processing + data engineering + machine learning + data visualisation

[...]

Machine Learning definition

Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?

Classifying emails as spam or not spam. => T
Watching you label emails as spam or not spam. => E
The number (or fraction) of emails correctly classified as spam/not spam. => P

Machine learning algorithms:

Supervised learning
Unsupervised learning

Others: Reinforcement learning, recommender systems.

Also covered: practical advice for applying learning algorithms

Supervised Learning

House price prediction based on size

Supervised learning: the right answers are given

Regression: predict a continuous-valued output (price)

Example:

A 65 m2 house sold for 440k

Supervised Learning

House price prediction

For a new 30 m2 house, one fitted model would predict a price of 270k.

A different fit to the same data would predict a price of 350k.

Supervised Learning

Training set -> Learning algorithm -> h

Size of the house -> h -> Estimated price

Hypothesis: h maps the size of a house to its estimated price
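As a minimal sketch of this pipeline, the snippet below plays the role of the learning algorithm: it fits a straight-line hypothesis h(size) = intercept + slope * size by ordinary least squares in plain Python. Only the (65 m2, 440k) sale comes from the slides. The other training points and the names `fit_line` and `h` are made up for illustration, so its predictions are not expected to match the figures quoted on the slides.

```python
def fit_line(sizes, prices):
    """Ordinary least squares for a one-feature linear hypothesis."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    var_x = sum((x - mean_x) ** 2 for x in sizes)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Training set: sizes in m2, prices in k.
# Only (65, 440) is from the slides; every other point is hypothetical.
sizes = [25, 40, 65, 80, 100]
prices = [200, 290, 440, 530, 650]

intercept, slope = fit_line(sizes, prices)


def h(size):
    """Hypothesis: maps the size of a house to an estimated price."""
    return intercept + slope * size


print(f"h(size) = {intercept:.1f} + {slope:.1f} * size")
print(f"Estimated price for 30 m2: {h(30):.0f}k")
```

The toy data was chosen to lie exactly on a line, which keeps the fit easy to check by hand; real sales data would scatter around the fitted line.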

Supervised Learning

Breast cancer: Is a tumor malignant (1) or not (0)?

[Plot: malignant (1) or not (0) against tumor size]

Classification: predict a discrete-valued output
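To make the idea of a discrete-valued output concrete, here is a small sketch of a one-feature classifier that learns a size threshold from labelled examples. The tumor sizes, the labels, and the helper names `predict` and `accuracy` are all hypothetical; this is an illustration of the concept, not a medical model.

```python
# Hypothetical labelled training set (sizes in arbitrary units, sorted).
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = malignant, 0 = not malignant


def predict(size, threshold):
    """Discrete output: 1 if the tumor is at least as large as the threshold."""
    return 1 if size >= threshold else 0


def accuracy(threshold):
    """Fraction of training examples classified correctly."""
    correct = sum(predict(s, threshold) == y for s, y in zip(sizes, labels))
    return correct / len(sizes)


# "Learning": try the midpoint between each pair of neighbouring sizes
# and keep the threshold with the best training accuracy.
candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]
best_threshold = max(candidates, key=accuracy)

print(best_threshold, accuracy(best_threshold))
print(predict(4.2, best_threshold), predict(1.2, best_threshold))
```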

Unsupervised Learning

[Plot: 1 / 0 against tumor size]

Supervised: labelled history to learn from

Unsupervised: unlabelled data

Unsupervised Learning

Learning structure from the data itself, without labels. Here, grouping similar data points into clusters.
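One way to picture clustering is the sketch below: a very small k-means with two clusters on one-dimensional data. The slides do not name a specific clustering algorithm, so k-means is an assumption here, and the data points and the function name `two_means_1d` are hypothetical.

```python
def two_means_1d(points, iterations=10):
    """Tiny k-means (k = 2) on one-dimensional, unlabelled data."""
    # Simple deterministic initialisation: smallest and largest point.
    centers = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabelled measurements.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = two_means_1d(points)
print(centers)
print(clusters)
```

No labels are passed in: the two groups emerge only from how close the points are to each other.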

Unsupervised Learning

In real life

Statistics

Squeeze your data into a box and I will solve the problem perfectly.

Machine Learning

Give me your data and I will do my best.

Linear Algebra

Basics

Linear Algebra

\[ \begin{bmatrix} 1402 & 901 \\ 1379 & 843 \\ 1639 & 973 \\ 1103 & 789 \end{bmatrix} \]

4 x 2 matrix

Dimension of matrix: number of rows x number of columns

\( \mathbb{R}^{4 \times 2} \)

Linear Algebra

\[ A = \begin{bmatrix} 1402 & 901 \\ 1379 & 843 \\ 1639 & 973 \\ 1103 & 789 \end{bmatrix} \]

\( A_{i,j} \) = "\( i \),\( j \) entry", the entry in the \( i^{th} \) row and \( j^{th} \) column

\( A_{1,1} \) = 1402

\( A_{3,1} \) = 1639
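A short sketch of the same matrix as a plain Python list of rows. Python indexes from 0 while the slides index from 1, so the hypothetical helper `entry` shifts both indices by one.

```python
# The 4 x 2 matrix A from the slides, stored as a list of rows.
A = [
    [1402, 901],
    [1379, 843],
    [1639, 973],
    [1103, 789],
]

# Dimension = number of rows x number of columns.
rows, cols = len(A), len(A[0])


def entry(M, i, j):
    """Return M_{i,j} using the 1-based indexing of the slides."""
    return M[i - 1][j - 1]


print(rows, cols)
print(entry(A, 1, 1), entry(A, 3, 1))
```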

Linear Algebra

Vector = \( n \times 1 \) matrix (= list)

\[ y = \begin{bmatrix} 1402 \\ 1379 \\ 1639 \\ 1103 \end{bmatrix} \]

\( y \in \mathbb{R}^{4} \)

\( y_{i} \) = \( i^{th} \) element

Linear Algebra

Addition and scalar multiplication

\[ \begin{bmatrix} 1 & 0 \\ 2 & 5 \\ 3 & 1 \end{bmatrix} + \begin{bmatrix} 4 & 0.5 \\ 2 & 5 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0.5 \\ 4 & 10 \\ 3 & 2 \end{bmatrix} \]

\[ 3 \times \begin{bmatrix} 1 & 0 \\ 2 & 5 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 6 & 15 \\ 9 & 3 \end{bmatrix} \]

Be careful: two matrices can only be added if they have the same dimensions.
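Both operations work entry by entry, which the following sketch spells out with plain Python lists. The function names `add` and `scale` are illustrative; the dimension check mirrors the warning above.

```python
def add(A, B):
    """Entry-wise sum of two matrices with the same dimensions."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 0.5], [2, 5], [0, 1]]

print(add(A, B))
print(scale(3, A))
```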

Linear Algebra

Matrix - vector multiplication

\[ \begin{bmatrix} 1 & 3 \\ 4 & 0 \\ 2 & 1 \end{bmatrix} \times \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 16 \\ 4 \\ 7 \end{bmatrix} \]

The factors are in \( \mathbb{R}^{3 \times 2} \) and \( \mathbb{R}^{2 \times 1} \); the result is in \( \mathbb{R}^{3 \times 1} \).

Row 1: \( 1 \times 1 + 3 \times 5 = 16 \)

Row 2: \( 4 \times 1 + 0 \times 5 = 4 \)

Row 3: \( 2 \times 1 + 1 \times 5 = 7 \)

Linear Algebra

Matrix - vector multiplication

\[ \begin{bmatrix} . & . & . \\ . & . & . \\ . & . & . \\ . & . & . \end{bmatrix} \times \begin{bmatrix} . \\ . \\ . \end{bmatrix} = \begin{bmatrix} . \\ . \\ . \\ . \end{bmatrix} \]

\( A \in \mathbb{R}^{m \times n} \)

\( x \in \mathbb{R}^{n \times 1} \)

\( y \in \mathbb{R}^{m \times 1} \)

To get \( y_i \), multiply the elements of \( A \)'s \( i^{th} \) row with the corresponding elements of vector \( x \), and add them up: \( y_i = \sum_{j=1}^{n} A_{i,j} x_j \).
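The row-by-row rule translates almost word for word into code. This is a sketch with plain Python lists; the function name `matvec` is illustrative.

```python
def matvec(A, x):
    """Multiply an m x n matrix A by a vector x of length n."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("number of columns of A must equal length of x")
    # y_i = (i-th row of A) multiplied element-wise with x, then summed.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[1, 3], [4, 0], [2, 1]]   # 3 x 2
x = [1, 5]                     # 2 x 1

print(matvec(A, x))
```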

Linear Algebra

Matrix - Matrix multiplication

\[ \begin{bmatrix} 1 & 3 & 2 \\ 4 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ 5 & 2 \end{bmatrix} = \begin{bmatrix} 11 & 10 \\ 9 & 14 \end{bmatrix} \]

\[ \begin{bmatrix} 1 & 3 & 2 \\ 4 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 11 \\ 9 \end{bmatrix} \]

\[ \begin{bmatrix} 1 & 3 & 2 \\ 4 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 10 \\ 14 \end{bmatrix} \]

Linear Algebra

Matrix - Matrix multiplication

\[ \begin{bmatrix} . & . & . & . \\ . & . & . & . \\ . & . & . & . \end{bmatrix} \times \begin{bmatrix} . & . & . \\ . & . & . \\ . & . & . \\ . & . & . \end{bmatrix} = \begin{bmatrix} . & . & . \\ . & . & . \\ . & . & . \end{bmatrix} \]

\( A \in \mathbb{R}^{m \times n} \)

\( B \in \mathbb{R}^{n \times o} \)

\( C \in \mathbb{R}^{m \times o} \)

The \( i^{th} \) column of matrix \( C = A \times B \) is obtained by multiplying \( A \) with the \( i^{th} \) column of \( B \) (for \( i = 1,2,...,o \)).
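The column-by-column description can be sketched directly: compute one matrix-vector product per column of B, then put the resulting columns side by side. The helper names `matvec` and `matmul` are illustrative.

```python
def matvec(A, x):
    """Multiply matrix A by vector x (row-by-row sums of products)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, column by column."""
    n, o = len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    # Column i of C is A multiplied by column i of B.
    columns = [matvec(A, [B[k][i] for k in range(n)]) for i in range(o)]
    # columns[i][r] holds C[r][i]; turn the list of columns back into rows.
    return [list(row) for row in zip(*columns)]


A = [[1, 3, 2], [4, 0, 1]]        # 2 x 3
B = [[1, 3], [0, 1], [5, 2]]      # 3 x 2

print(matmul(A, B))
```

In day-to-day work a library such as NumPy (included with Anaconda) would normally be used instead, where the same product is written `A @ B` on arrays.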

Linear Algebra

Matrix multiplication: some properties

Given \( A \in \mathbb{R}^{m \times n} \), \( B \in \mathbb{R}^{n \times o} \) and \( C \in \mathbb{R}^{o \times p} \)

In general, \( A \times B \neq B \times A \) (not commutative)

\( (A \times B) \times C = A \times (B \times C) \) (associative)
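Both properties can be checked numerically on a small case. The three 2 x 2 matrices below are hypothetical, chosen square so that both A x B and B x A are defined; the check illustrates the properties on one example and does not prove them.

```python
def matmul(A, B):
    """Standard matrix product: rows of A against columns of B."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
        for row in A
    ]


# Hypothetical 2 x 2 matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA)  # different results: not commutative

left = matmul(matmul(A, B), C)
right = matmul(A, matmul(B, C))
print(left, right)  # same result: associative
```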

Linear Algebra

Identity Matrix

Denoted \( I \) (or \( I_{n \times n} \) or \( I_n \))

Examples:

\[ I_{2 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

\[ I_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

\[ I_{4 \times 4} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

For any matrix \( A \in \mathbb{R}^{m \times n} \),

\( A \times I_n = I_m \times A = A \)
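A quick sketch that builds identity matrices and checks the property on a non-square matrix, where the identity on the right has to be n x n and the one on the left m x m. The function names are illustrative.

```python
def identity(n):
    """n x n matrix with ones on the diagonal and zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Standard matrix product: rows of A against columns of B."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
        for row in A
    ]


A = [[1, 3, 2], [4, 0, 1]]   # 2 x 3

print(identity(3))
print(matmul(A, identity(3)))   # A x I_3
print(matmul(identity(2), A))   # I_2 x A
```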

Linear Algebra

Transpose

Example:

\[ A = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 0 & 1 \end{bmatrix} \]

\[ B = A^T = \begin{bmatrix} 1 & 4 \\ 3 & 0 \\ 2 & 1 \end{bmatrix} \]

Definition:

Let \( A \) be an \( n \times m \) matrix, and let \( B = A^T \).

Then \( B \) is an \( m \times n \) matrix and \( B_{i,j} = A_{j,i} \)
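The definition amounts to swapping rows and columns, which the sketch below does with Python's built-in `zip`. The function name `transpose` is illustrative.

```python
def transpose(A):
    """Return A^T: the rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]


A = [[1, 3, 2], [4, 0, 1]]   # 2 x 3
B = transpose(A)             # 3 x 2

print(B)

# Check B_{i,j} = A_{j,i} for every entry (0-based indices in Python).
all_match = all(
    B[i][j] == A[j][i] for i in range(len(B)) for j in range(len(B[0]))
)
print(all_match)
```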

Data Science 101


Yann Carbonne
