federica bianco
astro | data science | data for good
dr.federica bianco | fbb.space | fedhere | fedhere
Artificial Neural Networks
0
The discipline that deals with extraction of information from data in a specific domain context, from data collection through inference
(Problem Identification and Planning)
remote sensing
survey science
instrumental design and development
data retrieval
...
data types
identify correlation
missing variable
...
Imputation
Scaling and whitening
tokenizing
...
what is the goal:
statistical analysis
anomaly detection
prediction
structure identification
....
what is the task:
regression
classification
SciPy
Data driven models for exploration of structure and prediction that learn parameters from data.
Machine Learning
Reinforcement Learning
Active Learning
unsupervised learning | supervised learning
unsupervised learning
set up: All features known for all observations
Goal: explore structure in the data
- data compression
- understanding structure
- anomaly detection
Algorithms: kMeans clustering, DBSCAN, Agglomerative clustering
supervised learning
set up: All features known for a subset of the data; one feature cannot be observed for the rest of the data
Goal: predicting missing feature
- classification
- regression
Algorithms: regression, (SVM), Classification and Regression Tree methods, k-nearest neighbors, neural networks, (...)
Learning relies on the definition of a loss function
learning type | loss / target |
---|---|
unsupervised | intra-cluster variance / inter cluster distance |
supervised | distance between prediction and truth |
Machine Learning
model parameters are learned by calculating a loss function for different parameter sets and trying to minimize the loss (or a target function and trying to maximize it)
e.g. supervised
L1 = |target - prediction|
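A minimal sketch of "calculate the loss for different parameter sets and minimize it", assuming a one-parameter model prediction = slope * x. The toy data and the names `l1_loss` and `candidates` are hypothetical, chosen only for illustration:

```python
# Toy data generated by target = 2 * x (an assumption made for this example)
xs = [0.0, 1.0, 2.0, 3.0]
targets = [0.0, 2.0, 4.0, 6.0]

def l1_loss(slope):
    """L1 = sum of |target - prediction| for the model prediction = slope * x."""
    return sum(abs(t - slope * x) for x, t in zip(xs, targets))

# Calculate the loss for several candidate parameter values ...
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
losses = {slope: l1_loss(slope) for slope in candidates}

# ... and keep the one that minimizes it
best_slope = min(losses, key=losses.get)
```

Real training replaces the fixed list of candidates with an optimization scheme, but the logic (evaluate the loss, prefer the parameters with the lower loss) is the same.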
e.g. unsupervised
Inertia = sum_i min_j ||x_i - mu_j||^2 (sum of squared distances from each point x_i to its nearest cluster center mu_j)
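For the unsupervised case, inertia as commonly used for k-means clustering is the sum of squared distances from each point to its nearest cluster center. A minimal sketch, with hypothetical points and centers made up for illustration:

```python
def inertia(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for c in centers
        )
    return total

# Two well separated pairs of points (hypothetical data)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

good_centers = [(0.0, 1.0), (10.0, 1.0)]  # one center per pair
bad_centers = [(5.0, 0.0), (5.0, 2.0)]    # centers far from both pairs

good_inertia = inertia(points, good_centers)
bad_inertia = inertia(points, bad_centers)
```

The centers that sit inside the groups give the lower inertia, which is what a clustering algorithm that minimizes this loss would prefer.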
Machine Learning
Supervised Learning tasks
regression ------ classification
Target Variable: Continuous
(age, income, temperature...)
Target Variable: Categorical
(color, shape, income class...)
Interaction with the environment builds a reward function
Machine Learning
reinforcement
The goal of the agent is to maximize a cumulative reward signal over time
The objective is not to predict a specific output but to learn a policy or strategy that maximizes the cumulative reward over time.
Minkowski distance
Jaccard similarity
Great circle distance
The definition of a loss function requires the definition of distance or similarity
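The three measures named above can be sketched in a few lines. This is an illustration under stated assumptions: the great-circle distance uses the haversine formula with a mean Earth radius of 6371 km, and the Jaccard similarity assumes at least one non-empty set. The function names are hypothetical:

```python
import math

def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def great_circle(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance via the haversine formula.

    Angles are in degrees; the result is in the units of `radius`
    (km for the assumed mean Earth radius of 6371 km).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))

euclidean = minkowski((0, 0), (3, 4), p=2)
manhattan = minkowski((0, 0), (3, 4), p=1)
similarity = jaccard({1, 2, 3}, {2, 3, 4})
quarter_turn = great_circle(0.0, 0.0, 0.0, 90.0)  # a quarter of the equator
```

Which distance is appropriate depends on the data: coordinates on a sphere, sets of tokens, and vectors of continuous features each call for a different choice.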
Machine Learning
Distances are at the heart of ML
mean square error
mean absolute error
hyperparameters
What are the symptoms?
How can we fix it?
[Figure: model performance (accuracy) vs. tree depth; model performance (accuracy) vs. ANN training epochs]
what is the simplest classifier you can build for this dataset?
what is the accuracy?
If your dataset is imbalanced (more of one class than the other)
your model will learn that it is better to guess the most common class
this will contaminate the prediction
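A minimal sketch of why accuracy alone is misleading on imbalanced data, assuming a hypothetical dataset with 90 objects of class "a" and 10 of class "b". The simplest classifier always guesses the most common class:

```python
from collections import Counter

# Hypothetical imbalanced labels: 90% class "a", 10% class "b"
labels = ["a"] * 90 + ["b"] * 10

# The simplest classifier: always predict the most common class
majority_class, majority_count = Counter(labels).most_common(1)[0]
predictions = [majority_class] * len(labels)

# Overall accuracy looks good ...
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# ... but no object of the rare class is ever identified
minority_hits = sum(p == t for p, t in zip(predictions, labels) if t == "b")
minority_recall = minority_hits / labels.count("b")
```

Any model trained on such a dataset should be compared against this baseline rather than against 50% accuracy.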
NN are a vast topic and we only have 2 weeks!
Some FREE references!
Michael Nielsen
better pedagogical approach, more basic, more clear
Ian Goodfellow
mathematical approach, more advanced, unfinished
Neural Networks
1
origins
1943
M-P Neuron McCulloch & Pitts 1943
In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons in the brain might work, modeling a simple neural network with electrical circuits.
Neurons (nerve cells) are connected into a network: dendrites receive incoming messages from other nerve cells; axons carry outgoing signals.
Neurons communicate with other cells through electrical impulses, releasing chemicals that pass through the synapse (the gap between two nerve cells) and attach to receptors on the receiving cell.
M-P Neuron
it's a classifier
if each input x_i is Bool (True/False), what threshold value corresponds to logical AND?
if x1 and x2 and x3
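A minimal sketch of an M-P neuron, assuming the simplified form in which the neuron fires when the number of active Boolean inputs reaches a threshold (inhibitory inputs are ignored here). The function name `mp_neuron` is hypothetical:

```python
from itertools import product

def mp_neuron(inputs, threshold):
    """Return 1 (fire) if the number of active Boolean inputs reaches the threshold."""
    return int(sum(inputs) >= threshold)

# Truth tables over all combinations of three Boolean inputs
all_inputs = list(product([0, 1], repeat=3))

# Threshold equal to the number of inputs: fires only if x1 and x2 and x3
and_table = {x: mp_neuron(x, threshold=3) for x in all_inputs}

# Threshold of 1: fires if any input is active (logical OR)
or_table = {x: mp_neuron(x, threshold=1) for x in all_inputs}
```

Changing a single number, the threshold, changes which logical function the neuron computes.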
The perceptron algorithm : 1958, Frank Rosenblatt
1958
Perceptron
[Diagram: inputs, weights, bias, output]
linear regression: output = sum_i (w_i * x_i) + b, with weights w_i and bias b
Perceptrons are linear classifiers: they make their predictions based on a linear predictor function
combining a set of weights (=parameters) with the feature vector.
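A minimal sketch of a perceptron with a step activation, trained with the classic perceptron learning rule on the logical AND of two inputs (a linearly separable problem). The learning rate, the number of epochs, and the function names are assumptions made for illustration:

```python
def predict(weights, bias, x):
    """Linear predictor (weights . x + bias) followed by a step activation."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge weights and bias by the error on each sample."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Logical AND: only (1, 1) belongs to class 1
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
learned = [predict(weights, bias, x) for x, _ in and_samples]
```

The learned weights and bias define a straight line in the (x1, x2) plane that separates the two classes.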
[Diagram: perceptron: inputs, weights, bias, activation function (e.g. sigmoid), output]
The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.
The embryo - the Weather Bureau's $2,000,000 "704" computer - learned to differentiate between left and right after 50 attempts in the Navy demonstration
July 8, 1958
...2020
ADALINE and MADALINE 1962 - B. Widrow & M. Hoff
Weight Change = (Pre-Weight line value)(Error / (Number of Inputs)).
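A minimal sketch of this weight-change rule, under the assumption that "Pre-Weight line value" means the input value arriving on the line before the weight is applied, and that the error is the target minus the linear (pre-activation) output. The function name `adaline_update` and the example values are hypothetical:

```python
def adaline_update(weights, inputs, target):
    """Apply: weight change = input value * (error / number of inputs)."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    n_inputs = len(inputs)
    return [w + x * error / n_inputs for w, x in zip(weights, inputs)]

def linear_output(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

# One update on a single sample with inputs of +1 / -1
inputs = [1.0, -1.0, 1.0]
target = 1.0
weights = adaline_update([0.0, 0.0, 0.0], inputs, target)
new_output = linear_output(weights, inputs)
```

With inputs of +1 and -1 the error is shared equally among the weights, so a single update reproduces the target for that sample.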
[Diagram: perceptron and ADALINE, each with the point it learns from marked "learning from here"]
Problem:
can only learn linearly separable patterns
... time went by... 2+ DECADES
2
Problem:
Single-layer perceptrons are only capable of learning linearly separable patterns.
AND
OR
XOR
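A minimal sketch illustrating the problem, assuming a single unit with two inputs, a bias, and a step activation. It searches a small, hypothetical grid of weights and biases for a unit that reproduces each truth table. A grid search is an illustration, not a proof, but XOR is known not to be linearly separable, so no single unit can reproduce it:

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TRUTH = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

def unit(w1, w2, b, x):
    """Single linear unit with a step activation."""
    return int(w1 * x[0] + w2 * x[1] + b > 0)

def find_unit(targets, grid):
    """Return the first (w1, w2, b) on the grid that reproduces the targets, else None."""
    for w1, w2, b in product(grid, repeat=3):
        if [unit(w1, w2, b, x) for x in INPUTS] == targets:
            return (w1, w2, b)
    return None

grid = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
solutions = {name: find_unit(targets, grid) for name, targets in TRUTH.items()}
```

The search finds a unit for AND and for OR, and returns `None` for XOR: solving XOR requires combining more than one unit.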
3
[Diagram: a layer of perceptrons connected to an output; input layer, hidden layer, output layer]
1970: multilayer perceptron architecture
Fully connected: all nodes go to all nodes of the next layer.
w: weight
sets the sensitivity of a neuron
b: bias
up-down weights a neuron
f: activation function
turns neurons on-off
w and b are learned parameters
what we are doing is exactly a series of matrix multiplications, each followed by adding the bias and applying the activation function.
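A minimal sketch of a forward pass through fully connected layers, written with plain lists so that each matrix multiplication is explicit. The sigmoid activation and all weight and bias values are assumptions chosen for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, inputs):
    """One fully connected layer: f(W . inputs + b).

    weights[j][i] connects input i to neuron j of this layer.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(layers, inputs):
    """Feed the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(weights, biases, activations)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer (weights, biases)
    ([[1.0, -1.0]], [0.0]),                    # output layer (weights, biases)
]
output = forward(layers, [1.0, 1.0])
```

Every layer does the same three things: multiply by the weights, add the bias, apply the activation function.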
hyperparameters of DNN
4
[Diagram: fully connected network with an input layer, two hidden layers, and an output layer]
how many hyperparameters?
GREEN: architecture hyperparameters
RED: training hyperparameters
how many parameters?
3x4+4 (input layer to first hidden layer)
4x3+3 (first hidden layer to second hidden layer)
3x1+1 (second hidden layer to output layer)
35 in total: w x 27, b x 8
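The count can be checked with a few lines. This sketch assumes layer sizes of 3, 4, 3 and 1 nodes, which is what the terms 3x4+4, 4x3+3 and 3x1+1 imply; the function name `count_parameters` is hypothetical:

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    Each pair of consecutive layers contributes n_in * n_out weights,
    and every node after the input layer contributes one bias.
    """
    n_weights = sum(
        n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    n_biases = sum(layer_sizes[1:])
    return n_weights, n_biases

# input layer, two hidden layers, output layer (sizes inferred from the terms above)
n_weights, n_biases = count_parameters([3, 4, 3, 1])
n_parameters = n_weights + n_biases
```

Even this tiny network has 35 parameters; the count grows roughly with the product of the sizes of neighboring layers.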
training DNN
5
1986: Deep Neural Nets
In a CNN these layers would not be fully connected except the last one
Seminal paper
Y. LeCun 1998
Any linear model: y = slope * x + intercept
y : prediction
ytrue : target
Error: e.g. L2 = sum_i (y_i - ytrue_i)^2
Find the best parameters (slope, intercept) by finding the minimum of the L2 surface:
at every step look around and choose the best direction
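A minimal sketch of gradient descent for a linear model y = slope * x + intercept with an L2 error. "Looking around and choosing the best direction" is implemented as a step against the gradient. The toy data, the learning rate, and the number of steps are assumptions made for illustration:

```python
# Toy data generated by ytrue = 2 * x + 1 (an assumption for this example)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ytrue = [1.0, 3.0, 5.0, 7.0, 9.0]

def l2(slope, intercept):
    """L2 error: sum of squared differences between prediction and target."""
    return sum((slope * x + intercept - t) ** 2 for x, t in zip(xs, ytrue))

def gradient(slope, intercept):
    """Partial derivatives of the L2 error with respect to slope and intercept."""
    d_slope = sum(2 * (slope * x + intercept - t) * x for x, t in zip(xs, ytrue))
    d_intercept = sum(2 * (slope * x + intercept - t) for x, t in zip(xs, ytrue))
    return d_slope, d_intercept

slope, intercept = 0.0, 0.0   # starting guess
learning_rate = 0.01          # step size (a training hyperparameter)

for _ in range(5000):
    d_slope, d_intercept = gradient(slope, intercept)
    # Step downhill: move against the gradient
    slope -= learning_rate * d_slope
    intercept -= learning_rate * d_intercept
```

With a learning rate that is too large the steps overshoot the minimum and the error grows instead of shrinking, so the step size has to be chosen with care.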
how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize?
Training models with this many parameters requires a lot of care:
. defining the metric
. optimization schemes
. training/validation/testing sets
But just like in our simple linear regression case, we can rely on the fact that, for the right activation functions, small changes in the parameters lead to small changes in the output.
define a cost function, e.g. the L2 distance between prediction and target
Training a DNN
feed data forward through network and calculate cost metric
for each layer, calculate effect of small changes on next layer
think of applying the gradient to a function of a function of a function... use:
1) partial derivatives, 2) chain rule
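A minimal sketch of these two ingredients on a tiny, hypothetical network: one input, one hidden neuron with a sigmoid activation, and one linear output, with a squared-error cost. The gradients obtained with the chain rule are compared with finite-difference estimates. All parameter values are assumptions made for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """h = sigmoid(w1 * x + b1); y = w2 * h + b2."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    return h, y

def cost(params, x, ytrue):
    _, y = forward(params, x)
    return (y - ytrue) ** 2

def backprop(params, x, ytrue):
    """Gradient of the cost with respect to (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dC_dy = 2 * (y - ytrue)      # cost with respect to the output
    dC_dw2 = dC_dy * h           # output layer weight
    dC_db2 = dC_dy               # output layer bias
    dC_dh = dC_dy * w2           # propagate back to the hidden neuron
    dh_dz = h * (1 - h)          # derivative of the sigmoid
    dC_dw1 = dC_dh * dh_dz * x   # hidden layer weight
    dC_db1 = dC_dh * dh_dz       # hidden layer bias
    return [dC_dw1, dC_db1, dC_dw2, dC_db2]

def numerical_gradient(params, x, ytrue, eps=1e-6):
    """Central finite differences: the effect of a small change in each parameter."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((cost(up, x, ytrue) - cost(down, x, ytrue)) / (2 * eps))
    return grads

params = [0.5, -0.3, 0.8, 0.1]
analytic = backprop(params, 1.5, 1.0)
numeric = numerical_gradient(params, 1.5, 1.0)
```

The two estimates agree to within numerical precision, but the chain rule obtains all the derivatives from one forward and one backward pass, while finite differences need two extra cost evaluations per parameter.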
Deep Neural Nets are not some fancy-pants method: they are just linear models with a bunch of parameters
Because they have many parameters they are difficult to "interpret" (no easy feature extraction)
that is ok because they are prediction machines
Deep Dream (DD) is Google software, a pre-trained NN (originally built on the Caffe framework, now ported to many other platforms including TensorFlow).
The high level idea relies on training a convolutional NN to recognize common objects, e.g. dogs, cats, cars, in images. As the network learns to recognize those objects, it develops its layers to pick out "features" of the images, like lines at certain orientations, circles, etc.
The DD software runs this NN on an image you give it, and it loops on some layers, thus "manifesting" the things it knows how to recognize in the image.
@akumadog
The visual cortex learns hierarchically: first detects simple features, then more complex features and ensembles of features
Neural Networks and Deep Learning
an excellent and free book on NN and DL
http://neuralnetworksanddeeplearning.com/index.html
History of NN
https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history2.html