federica bianco
astro | data science | data for good
dr.federica bianco | fbb.space | fedhere | fedhere
Deep Learning 2 - Convolutional NNs
this slide deck:
Perceptrons are linear classifiers: a perceptron makes its predictions based on a linear predictor function that combines a set of weights (= parameters) with the feature vector.
[diagram: perceptron: inputs, weights, bias, activation function, output]
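In symbols, the perceptron computes (a sketch consistent with the diagram above; the $x_i$ are the inputs, $w_i$ the weights, $b$ the bias, $f$ the activation function):

$$y = f\left(\sum_i w_i x_i + b\right)$$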
Fully connected: all nodes go to all nodes of the next layer.
input layer → hidden layer → output layer
1970: multilayer perceptron architecture
[diagram: a layer of perceptrons feeding the output]
Fully connected: all nodes go to all nodes of the next layer.
w: weight: sets the sensitivity of a neuron
b: bias: weights a neuron up or down
w and b are learned parameters
what we are doing is exactly a series of matrix multiplications (see the sketch below).
f: activation function: turns neurons on and off
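To make this concrete, a minimal numpy sketch of a forward pass as a series of matrix multiplications (the layer sizes and random weights are illustrative, not from the slides):

```python
import numpy as np

# a two-layer fully connected network really is a series of matrix
# multiplications: here 4 inputs -> 3 hidden neurons -> 1 output
rng = np.random.default_rng(0)

x = rng.normal(size=4)                                 # input feature vector
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)   # layer 1: weights, biases
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)   # layer 2: weights, biases

def f(z):
    return 1.0 / (1.0 + np.exp(-z))                    # sigmoid activation

h = f(W1 @ x + b1)   # hidden layer: matrix multiply, add bias, activate
y = f(W2 @ h + b2)   # output layer: same operation again
print(y)
```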
Convolutional Neural Nets
@akumadog
The visual cortex learns hierarchically: it first detects simple features, then more complex features and ensembles of features.
Convolution
Convolution is a mathematical operation on two functions f and g that produces a third function f ∗ g, expressing how the shape of one is modified by the other.
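In one dimension the standard definition is the integral below (a sketch; for images the integral becomes a discrete sum over pixels):

$$(f * g)(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau$$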
Convolution Theorem: the Fourier transform of a convolution is the pointwise product of the Fourier transforms, ℱ{f ∗ g} = ℱ{f} · ℱ{g}.
two images (e.g. an X and an O), encoded as 5×5 grids of ±1 pixels. The X image:

| -1 | -1 | -1 | -1 | -1 |
|---|---|---|---|---|
| -1 | 1 | -1 | 1 | -1 |
| -1 | -1 | 1 | -1 | -1 |
| -1 | 1 | -1 | 1 | -1 |
| -1 | -1 | -1 | -1 | -1 |

feature maps: convolve the image with small filters; each filter produces a feature map. For example, a 3×3 diagonal filter

| 1 | -1 | -1 |
|---|---|---|
| -1 | 1 | -1 |
| -1 | -1 | 1 |

and a 3×3 anti-diagonal filter

| -1 | -1 | 1 |
|---|---|---|
| -1 | 1 | -1 |
| 1 | -1 | -1 |
convolution: slide the 3×3 filter across the 5×5 image; at each position, multiply the overlapping pixels elementwise and sum.

With the diagonal filter on the X image, the top-left window matches 8 of the 9 pixels: 8 × (+1) + 1 × (−1) = 7. Repeating this at all 9 positions fills in the feature map:

| 7 | -3 | 3 |
|---|---|---|
| -3 | 5 | -3 |
| 3 | -3 | 7 |

input layer → convolution layer → feature map
the feature map is "richer": we went from binary values to ℝ
and it is reminiscent of the original image: the diagonal filter responds most strongly (7, 5, 7) along the diagonal of the X.
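As a cross-check, a minimal numpy sketch of this computation (the image and filter are the ones above; the loop bounds assume a 5×5 image and a 3×3 filter):

```python
import numpy as np

# the worked example from the slides: 5x5 "X" image, 3x3 diagonal filter
image = np.array([[-1, -1, -1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1,  1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1, -1, -1, -1]])

kernel = np.array([[ 1, -1, -1],
                   [-1,  1, -1],
                   [-1, -1,  1]])

# "valid" convolution as used in CNNs (strictly a cross-correlation: the
# filter is not flipped): slide the window, multiply elementwise, sum
fmap = np.array([[(image[i:i+3, j:j+3] * kernel).sum() for j in range(3)]
                 for i in range(3)])
print(fmap)
# [[ 7 -3  3]
#  [-3  5 -3]
#  [ 3 -3  7]]
```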
Convolve with a different filter to extract a different feature: each neuron corresponds to one filter, i.e. one feature.
ReLU
ReLU: an activation that replaces negative values with 0's. Applied to the feature map above:

| 7 | -3 | 3 |
|---|---|---|
| -3 | 5 | -3 |
| 3 | -3 | 7 |

→

| 7 | 0 | 3 |
|---|---|---|
| 0 | 5 | 0 |
| 3 | 0 | 7 |
Max-Pool
MaxPooling: reduce the image size, generalize the result.
2×2 Max Pool: take the maximum of each 2×2 sub-region of the ReLU'd feature map:

| 7 | 0 | 3 |
|---|---|---|
| 0 | 5 | 0 |
| 3 | 0 | 7 |

→

| 7 | 5 |
|---|---|
| 5 | 7 |
MaxPooling reduces the image size and generalizes the result: by shrinking the map and keeping only the maximum of each sub-region, we make the network less sensitive to specific details.
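Continuing the numpy sketch with ReLU and max pooling (the stride-1, 2×2 pool is chosen to reproduce the slides' 3×3 → 2×2 example; in practice the stride usually equals the pool size):

```python
import numpy as np

# the feature map from the convolution step of the X example
fmap = np.array([[ 7, -3,  3],
                 [-3,  5, -3],
                 [ 3, -3,  7]])

# ReLU: replace negative values with 0
relu_map = np.maximum(fmap, 0)
# [[7 0 3]
#  [0 5 0]
#  [3 0 7]]

# 2x2 max pooling with stride 1: keep the maximum of each sub-region
pooled = np.array([[relu_map[i:i+2, j:j+2].max() for j in range(2)]
                   for i in range(2)])
print(pooled)
# [[7 5]
#  [5 7]]
```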
[diagram: the pooled feature maps are flattened into the last hidden layer, and the output layer classifies X vs O]
Stack multiple convolution layers
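As a sketch of what stacking looks like in Keras (the architecture, layer sizes, and class count are illustrative, not the slides' example):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# a minimal stacked CNN: convolution + pooling blocks, then dense layers
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # e.g. 10 output classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```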
Deep Dream (DD) is a Google software package, a pre-trained NN (originally created on the Caffe framework, now ported to many other platforms including TensorFlow).
The high-level idea relies on training a convolutional NN to recognize common objects, e.g. dogs, cats, cars, in images. As the network learns to recognize those objects, it develops convolutional layers that pick out "features", like lines at certain orientations, circles, etc.
Each neuron is a filter: e.g. an edge finder.
The DeepDream software runs this NN on an image you give it, and it loops on some hidden layers, thus "manifesting" the things it knows how to recognize in the image. The output of an inner layer (the input of the next inner layer) is called a "feature map". We are taking a peek into the feature maps of a deep neural network trained to recognize common objects.
Deep Learning
excellent blog post on BP: http://colah.github.io/posts/2015-08-Backprop/
First, compute the linear function for the state of the neuron: $z = \sum_i w_i x_i + b$.
minimize L2 by changing w iteratively
Then, calculate the output of that layer by applying a non-linear activation function (e.g. the sigmoid) to perform classification.
Any linear model: y = slope · x + intercept
y: prediction; ytrue: target
Error: e.g. the L2 error, $L2 = \sum (y - y_{true})^2$
Find the best parameters by finding the minimum of the L2 loss surface: at every step, look around and choose the best direction.
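A minimal gradient-descent sketch for a linear model (the data, learning rate, and step count are illustrative):

```python
import numpy as np

# synthetic data for a linear model y = w*x + b
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y_true = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)

w, b, lr = 0.0, 0.0, 0.1
for step in range(500):
    y = w * x + b
    # gradients of the mean L2 error with respect to w and b
    grad_w = 2 * ((y - y_true) * x).mean()
    grad_b = 2 * (y - y_true).mean()
    # step downhill: at every step, move in the best direction
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # converges near the true slope 2.0 and intercept 1.0
```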
how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize?
[diagram: fully connected network: input layer, hidden layer, output layer; a single hidden layer gives a perceptron or shallow NN]
w: weight: sets the sensitivity of a neuron
b: bias: weights a neuron up or down
f: activation function: turns neurons on and off
In a CNN these layers would not be fully connected, except the last one.
Training models with this many parameters requires a lot of care:
. defining the metric
. optimization schemes
. training/validation/testing sets
But just like in our simple linear regression case, small changes in the parameters lead to small changes in the output, for the right activation functions.
define a cost function, e.g. the L2 error between predictions and targets
Training a DNN
feed data forward through the network and calculate the cost metric
for each layer, calculate the effect of small changes on the next layer
how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize?
think of applying the gradient to a function of a function of a function... use:
1) partial derivatives, 2) the chain rule
define a cost function, e.g. the L2 error
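Schematically, backpropagation is the chain rule applied layer by layer (a sketch; $C$ is the cost, $a^{(l)}$ the activations of layer $l$, $L$ the last layer):

$$\frac{\partial C}{\partial w^{(l)}} = \frac{\partial C}{\partial a^{(L)}} \, \frac{\partial a^{(L)}}{\partial a^{(L-1)}} \cdots \frac{\partial a^{(l)}}{\partial w^{(l)}}$$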
Training a DNN: Minibatch & Dropout
Minibatch: split your training set into many smaller subsets and train on each small set separately.
Dropout: artificially remove some neurons for different minibatches to avoid overfitting.
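A Keras sketch of both ideas (the dropout rate, layer sizes, and batch size are illustrative):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dropout(0.2),          # randomly drops 20% of these neurons each step
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# minibatches: batch_size sets the size of each training subset, e.g.
# model.fit(x_train, y_train, batch_size=32, epochs=10)
```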
Architecture components: neurons, activation function
Single-layer NN: the perceptron
Deep NNs
Convolutional NNs
Training an NN
Lots of parameters and lots of hyperparameters! What to choose?
cheatsheet
An article that compares various DNNs: accuracy comparison, batch size
always check your loss function! it should go down smoothly and flatten out at the end of the training.
not flat? you are still learning!
too flat? you are overfitting...
loss (gallery of horrors):
jumps are not unlikely (and not necessarily a problem) if your activations are discontinuous (e.g. ReLU)
when you use validation you are introducing regularizations (e.g. dropout), so the validation loss can be smaller than the training loss
loss and learning rate (note that the appropriate learning rate depends on the chosen optimization scheme!)
Building a DNN
with Keras and TensorFlow
autoencoder for image reconstruction
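A minimal sketch of such an autoencoder in Keras (the dense architecture, bottleneck size, and 28×28 input are illustrative assumptions):

```python
from tensorflow.keras import layers, models

# encoder: compress the image down to a small bottleneck
encoder = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(64, activation="relu"),
    layers.Dense(16, activation="relu"),          # bottleneck representation
])
# decoder: reconstruct the image from the bottleneck
decoder = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(16,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28)),
])
autoencoder = models.Sequential([encoder, decoder])

# the reconstruction target is the input itself
autoencoder.compile(optimizer="adam", loss="mean_squared_error")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)
```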
What should I choose for the loss function, and how does that relate to the activation function and optimization?
| loss | good for | activation of last layer | size of last layer |
|---|---|---|---|
| mean_squared_error | regression | linear | one node |
| mean_absolute_error | regression | linear | one node |
| mean_squared_logarithmic_error | regression | linear | one node |
| binary_crossentropy | binary classification | sigmoid | one node |
| categorical_crossentropy | multiclass classification | softmax | N nodes |
| kullback_leibler_divergence | multiclass classification, probabilistic interpretation | softmax | N nodes |
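Reading off one row of the table, for example the multiclass row (a Keras sketch; the layer sizes and class count are illustrative):

```python
from tensorflow.keras import layers, models

# multiclass classification: N output nodes, softmax, categorical crossentropy
model = models.Sequential([
    layers.Dense(32, activation="relu", input_shape=(8,)),   # illustrative
    layers.Dense(5, activation="softmax"),                   # N = 5 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")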
On the interpretability of DNNs
YOLO and R-CNN
Object detection
Naive model: take different regions of the image and measure the probability of the presence of the object in each region.
Problem: we have to search the whole image, which is time consuming, and we can only find one kind of object at one scale.
What if you do not know what is in the image?
The final dense layer has an undefined size (one node per kind of object in the region).
Objects can have different scales or axis ratios: how many regions can you search before the problem blows up computationally?
R-CNN
Extract 2000 "region proposals" from the image.
A feature-extraction CNN produces a 4096-dimensional feature vector in an output dense layer.
An SVM classifies the presence of the object within each candidate region proposal.
1. Generate an initial sub-segmentation: many candidate regions
2. Use a greedy algorithm to recursively combine similar regions into larger ones
3. Use the generated regions to produce the final candidate region proposals
TOO SLOW (47 seconds to test 1 image)
Fast R-CNN
Use a CNN to generate convolutional feature maps.
Use the Selective Search algorithm to identify the region proposals (RPs) and warp them into squares.
Use an RoI pooling layer to reshape them to a fixed size so they can be fed into a fully connected layer, and predict the box offsets.
A softmax layer predicts the class of the proposed region.
Faster R-CNN (Ren et al. 2015)
Use a CNN to generate convolutional feature maps.
Use a CNN to predict the RPs and warp them into squares.
Use an RoI pooling layer to reshape them to a fixed size so they can be fed into a fully connected layer, and predict the box offsets.
A softmax layer predicts the class of the proposed region.
YOLO
What if you looked at the whole image instead of RoIs in the image?
Split the image into an S×S grid.
Each grid cell predicts B bounding boxes, confidences for those boxes, and C class probabilities.
The CNN outputs the probability that a bounding box contains an object (plus an offset).
High-probability bounding boxes are then classified.
Labeling tools
Neural Networks and Deep Learning
an excellent and free book on NNs and DL
http://neuralnetworksanddeeplearning.com/index.html
History of NN
https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history2.html
Gradient Descent
https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
Backpropagation
http://colah.github.io/posts/2015-08-Backprop/
Physics-Informed NNs
Application regimes:
-infinity to the 1950s
theory driven: little data, mostly theory, falsifiability and all that...
1980s to today
data driven: lots of data, drop theory and use associations, black-box models
today: lots of data, yet not enough for entirely automated decision making, and complex theory that cannot be solved analytically
→ combine the data with some theory
General conservation law: $\frac{\partial u}{\partial t} + N[u] = 0$, where $N$ is a nonlinear differential operator
e.g. a linear flux function, or the Burgers equation (non-linear)
Non-linear PDEs are hard to solve!
A fundamental question for any PDE is the existence and uniqueness of a solution for given boundary conditions. The open problem of the existence (and smoothness) of solutions to the Navier–Stokes equations is one of the seven Millennium Prize problems in mathematics.
The solutions in a neighborhood of a known solution can sometimes be studied by linearizing the PDE around that solution. This corresponds to studying the tangent space of a point of the moduli space of all solutions.
It is often possible to write down some special solutions explicitly in terms of elementary functions (though it is rarely possible to describe all solutions like this). One way of finding such explicit solutions is to reduce the equations to equations of lower dimension, preferably ordinary differential equations, which can often be solved exactly.
Numerical solution on a computer is almost the only method that can be used for getting information about arbitrary systems of PDEs. A lot of work has been done, but a lot of work still remains on solving certain systems numerically, especially for the Navier–Stokes and other equations related to weather prediction.
Burgers equation: a second-order non-linear PDE

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$$

x: spatial coordinate; t: temporal coordinate; u(x, t): speed of the fluid at (x, t); ν: viscosity
Applications of the Burgers equation: shock wave formation, turbulence, the weather problem, traffic flow, and acoustic transmission
[figure: solution domain and boundary conditions]
How to solve analytically: https://www.youtube.com/watch?v=5ZrwxQr6aV4
How do we inject the physics into the network? Via a modified loss function that includes the residuals of the prediction and the residual of the PDE.
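A sketch in TensorFlow of how such a modified loss could be assembled (everything here is an assumption for illustration: the `model` mapping (x, t) → u, the variable names, and the value of `nu`):

```python
import tensorflow as tf

nu = 0.01   # illustrative viscosity

def pinn_loss(model, x_d, t_d, u_d, x_c, t_c):
    """x_d, t_d, u_d: observed/boundary data; x_c, t_c: collocation points."""
    # data residual: prediction vs. observed values of u
    u_pred = model(tf.stack([x_d, t_d], axis=1))[:, 0]
    mse_data = tf.reduce_mean(tf.square(u_pred - u_d))

    # PDE residual at the collocation points: u_t + u*u_x - nu*u_xx
    with tf.GradientTape() as t2:
        t2.watch(x_c)
        with tf.GradientTape(persistent=True) as t1:
            t1.watch([x_c, t_c])
            u = model(tf.stack([x_c, t_c], axis=1))[:, 0]
        u_x = t1.gradient(u, x_c)   # first derivatives via autodiff
        u_t = t1.gradient(u, t_c)
    u_xx = t2.gradient(u_x, x_c)    # second derivative in x
    mse_pde = tf.reduce_mean(tf.square(u_t + u * u_x - nu * u_xx))

    return mse_data + mse_pde       # minimize both residuals together
```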