ML for physical and natural scientists 2023 9

dr. federica bianco | fbb.space | fedhere

 Deep Learning 2 - Convolutional NNs

  • Machine Learning basic concepts
    • interpretability
    • parameters vs hyperparameters
    • supervised/unsupervised


  • CART methods
  • Clustering methods
  • Neural Networks
    • the brain connection
    • perceptron
    • learning
    • activation functions
    • shallow nets
    • deep nets architecture
    • back-propagation
    • preprocessing and whitening (minibatch)

 

neural networks

recap

 

0

Perceptrons are linear classifiers: a perceptron makes its predictions based on a linear predictor function combining a set of weights (= parameters) with the feature vector.

y = \sum_i w_i x_i + b
y = \vec{w}\cdot\vec{x} + b
y = f\left(\sum_i w_i x_i + b\right)
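A minimal sketch of this forward pass in numpy; the weights, bias, input, and step activation below are made-up illustrative values, not from the slides:

```python
import numpy as np

# hypothetical weights, bias, and input for a 3-feature perceptron
w = np.array([0.5, -1.0, 2.0])   # weights w_i, one per input feature
b = 0.1                          # bias
x = np.array([1.0, 0.5, 0.2])    # one feature vector

z = np.dot(w, x) + b             # linear predictor: sum_i w_i x_i + b
y = 1 if z > 0 else 0            # activation f: here a simple step function
print(z, y)
```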

[Diagram: a perceptron. Inputs x_1, x_2, ..., x_N are multiplied by weights w_1, w_2, ..., w_N, summed together with a bias b, and passed through an activation function f to produce the output.]

recap

 

0

perceptrons

multilayer perceptron

1970: multilayer perceptron architecture

Fully connected: all nodes go to all nodes of the next layer.

[Diagram: inputs x_1, x_2, x_3 (input layer) feed a hidden layer, which feeds the output layer.]

recap

 

3

multilayer perceptron

layer of perceptrons

[Diagram: inputs x_1, x_2, x_3 connected to a layer of four perceptrons with biases b_1, b_2, b_3, b_4 and weights w_{21}, w_{22}, w_{23}, w_{24}, feeding the output.]

recap

 

multilayer perceptron

Fully connected: all nodes go to all nodes of the next layer.

layer of perceptrons:

w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1
w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2
w_{31}x_1 + w_{32}x_2 + w_{33}x_3 + b_3
w_{41}x_1 + w_{42}x_2 + w_{43}x_3 + b_4

w: weight, sets the sensitivity of a neuron
b: bias, up-down weights a neuron

these are the learned parameters

recap

 

multilayer perceptron

what we are doing is exactly a series of matrix multiplications.

Fully connected: all nodes go to all nodes of the next layer.

layer of perceptrons:

f(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)
f(w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2)
f(w_{31}x_1 + w_{32}x_2 + w_{33}x_3 + b_3)
f(w_{41}x_1 + w_{42}x_2 + w_{43}x_3 + b_4)

w: weight, sets the sensitivity of a neuron
b: bias, up-down weights a neuron
f: activation function, turns neurons on-off

recap

 

activation functions

recap

 

CNN

1

Convolutional Neural Nets

@akumadog

Brain Programming and the Random Search in Object Categorization

 

The visual cortex learns hierarchically: it first detects simple features, then more complex features and ensembles of features.

CNN

1a

Convolution

convolution is a mathematical operator on two functions, f and g, that produces a third function, f * g, expressing how the shape of one is modified by the other.

Convolution Theorem

f * g = \mathcal{F}^{-1}\big\{\mathcal{F}\{f\}\cdot\mathcal{F}\{g\}\big\}

where \mathcal{F} is the Fourier transform:

F(\nu) = \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i x\cdot\nu}\,dx, \qquad G(\nu) = \int_{\mathbb{R}^n} g(x)\, e^{-2\pi i x\cdot\nu}\,dx
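A quick numerical check of the convolution theorem with numpy; the two short 1-D signals are arbitrary, and the comparison uses circular convolution, which is what the discrete Fourier transform implements:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 0.25, 0.0])

# direct circular convolution
direct = np.array([sum(f[m] * g[(n - m) % len(f)] for m in range(len(f)))
                   for n in range(len(f))])

# convolution via the Fourier transform: F^{-1}{ F{f} . F{g} }
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(direct, via_fft))   # True
```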

two images.

the example input: a 5x5 image of an "X" (1 on the diagonals, -1 elsewhere)

-1 -1 -1 -1 -1
-1  1 -1  1 -1
-1 -1  1 -1 -1
-1  1 -1  1 -1
-1 -1 -1 -1 -1

and two 3x3 diagonal filters

 1 -1 -1        -1 -1  1
-1  1 -1        -1  1 -1
-1 -1  1         1 -1 -1

convolving the image with each filter produces a feature map.

convolution

convolution: slide the 3x3 filter over the image; at each position multiply the overlapping entries element-wise and sum.

top-left position:

(-1\cdot 1) + (-1\cdot -1) + (-1\cdot -1) + (-1\cdot -1) + (1\cdot 1) + (-1\cdot -1) + (-1\cdot -1) + (-1\cdot -1) + (1\cdot 1) = 7

one position to the right:

(-1\cdot 1) + (-1\cdot -1) + (-1\cdot -1) + (1\cdot -1) + (-1\cdot 1) + (1\cdot -1) + (-1\cdot -1) + (1\cdot -1) + (-1\cdot 1) = -3

feature map so far:

7 -3 .

sliding the filter across all nine 3x3 positions fills in the rest of the feature map.
the full feature map:

 7 -3  3
-3  5 -3
 3 -3  7

input layer → convolution layer → feature map

the feature map is "richer": we went from binary to R

and it is reminiscent of the original layer: the diagonal filter produces large values (7, 5, 7) along the diagonal of the feature map.

Convolve with a different feature (filter) to get a different feature map: each neuron is 1 feature.
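A sketch that reproduces the feature map above in numpy; this computes the "valid" sliding-window product sum (cross-correlation), which is what CNN convolutional layers actually do:

```python
import numpy as np

image = np.array([[-1, -1, -1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1,  1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1, -1, -1, -1]])

filt = np.array([[ 1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1,  1]])

# slide the 3x3 filter over the 5x5 image (stride 1, no padding)
out = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * filt)

print(out)
# [[ 7 -3  3]
#  [-3  5 -3]
#  [ 3 -3  7]]
```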

CNN

1b

ReLU

ReLU: replaces negative values with 0's

 7 -3  3        7  0  3
-3  5 -3   →    0  5  0
 3 -3  7        3  0  7

1c

Max-Pool

CNN

MaxPooling: reduce image size, generalize the result

2x2 Max Pool: slide a 2x2 window over the feature map and keep only the maximum in each window

7  0  3
0  5  0    →    7  5
3  0  7         5  7

By reducing the size and picking the maximum of a sub-region we make the network less sensitive to specific details.
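A sketch of ReLU followed by the 2x2 max pool in numpy, matching the 3x3 → 2x2 example above (stride 1):

```python
import numpy as np

fmap = np.array([[ 7, -3,  3],
                 [-3,  5, -3],
                 [ 3, -3,  7]])

relu = np.maximum(fmap, 0)       # ReLU: negative values -> 0

# 2x2 max pooling, stride 1: keep the maximum of each 2x2 window
pooled = np.array([[relu[i:i+2, j:j+2].max() for j in range(2)]
                   for i in range(2)])

print(relu)     # [[7 0 3] [0 5 0] [3 0 7]]
print(pooled)   # [[7 5] [5 7]]
```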

CNN

final layer:

the final layer is fully connected

[Diagram: last hidden layer fully connected to the output layer]

Stack multiple convolution layers.
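A minimal sketch of this stack in keras; the input shape, filter counts, and number of output classes are made-up, only the structure (convolution + ReLU, max-pool, repeated, then a fully connected final layer) follows the slides:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),               # e.g. a small grayscale image
    layers.Conv2D(16, (3, 3), activation="relu"),  # convolution layer + ReLU
    layers.MaxPooling2D((2, 2)),                   # max-pool layer
    layers.Conv2D(32, (3, 3), activation="relu"),  # stack another convolution layer
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),        # final layer: fully connected
])
model.summary()
```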

deep dreams


what is happening in DeepDream?

Deep Dream (DD) is a Google software package, a pre-trained NN (originally created on the Caffe architecture, now ported to many other platforms including tensorflow).

The high level idea relies on training a convolutional NN to recognize common objects, e.g. dogs, cats, cars, in images. As the network learns to recognize those objects, it develops its convolutional layers to pick out "features" of the images, like lines at certain orientations, circles, etc.

Each neuron is a filter: e.g. an edge finder.

The DeepDream software runs this NN on an image you give it, and it loops on some hidden layers, thus "manifesting" the things it knows how to recognize in the image. The output of an inner layer (input of the next inner layer) is called a "feature map". We are taking a peek into the feature maps of a deep neural network trained to recognize common objects.

 

 

 

back propagation

2

Deep Learning


 back-propagation

First, compute the linear function for the state of each neuron:

\vec{x} = \vec{y}\,W

 back-propagation

First, compute the linear function for the state of each neuron:

x_{j} = \sum_i y_{i}w_{ji}

L_2 = \sum_j (y_j - d_j)^2

minimize L_2 by changing the weights w iteratively

 back-propagation

Then, calculate the output of that layer by using a non-linear activation function (here a sigmoid), to perform classification:

x_{j} = \sum_i y_{i}w_{ji} \qquad y_j = \frac{1}{1+e^{-x_j}}

 back-propagation

Then, calculate the output of that layer by using the non-linear function:

x_{j} = \sum_i y_{i}w_{ji} \qquad y_j = \frac{1}{1+e^{-x_j}}

[Diagram: a perceptron with inputs x_1 ... x_N, weights w_1 ... w_N, bias b, sigmoid activation f, and output y]

Any linear model:

\vec{y} = \vec{x}W + b

y: prediction
y_true: target

Error: e.g.

L_2 = (y - y_\mathrm{true})^2

[Plot: the L2 loss surface as a function of slope and intercept]

Find the best parameters by finding the minimum of the L2 hyperplane:

at every step look around and choose the best direction (gradient descent)
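A sketch of that loop for the simple linear model y = wx + b with an L2 loss; the data, learning rate, and number of steps are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y_true = 2.0 * x + 0.5 + rng.normal(0, 0.1, 100)   # hypothetical data: slope 2, intercept 0.5

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    y = w * x + b
    # gradient of L2 = mean (y - y_true)^2 with respect to w and b
    dw = np.mean(2 * (y - y_true) * x)
    db = np.mean(2 * (y - y_true))
    w -= lr * dw   # step downhill along the gradient
    b -= lr * db
print(w, b)        # approaches the true slope and intercept
```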

 back-propagation

how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize?

x_{j} = \sum_i y_{i}w_{ji} \qquad y_j = \frac{1}{1+e^{-x_j}}

\vec{y} = f_N(\dots f_1(\vec{x}\,W_1 + b_1)\dots W_N + b_N)

f: activation function, turns neurons on-off
w: weight, sets the sensitivity of a neuron
b: bias, up-down weights a neuron

In a CNN these layers would not be fully connected, except the last one.

 

perceptron or shallow NN:

\vec{y} = f(\vec{x}W + b)

deep NN (input layer, hidden layers, output layer):

\vec{y} = f_N(\dots f_1(\vec{x}\,W_1 + b_1)\dots W_N + b_N)

Training models with this many parameters requires a lot of care:

. defining the metric
. optimization schemes
. training/validation/testing sets

But just like in our simple linear regression case, the fact that small changes in the parameters lead to small changes in the output (for the right activation functions) is what makes training possible.

define a cost function, e.g.

C = \frac{1}{2}\|y - a^L\|^2 = \frac{1}{2}\sum_j(y_j - a^L_j)^2

Training a DNN

feed data forward through the network and calculate the cost metric

for each layer, calculate the effect of small changes on the next layer

\vec{y} = f_N(\dots f_1(\vec{x}\,W_1 + b_1)\dots W_N + b_N)

 back-propagation

how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize?

think of applying the gradient to a function of a function of a function... use:

1) partial derivatives, 2) the chain rule

define a cost function, e.g.

C = \frac{1}{2}\|y - a^L\|^2 = \frac{1}{2}\sum_j(y_j - a^L_j)^2
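A tiny numerical illustration of the chain rule through a single sigmoid neuron, with the derivatives written out by hand; deep-learning libraries automate exactly this bookkeeping (automatic differentiation) for every weight in the network:

```python
import numpy as np

x, w, b, y_true = 1.5, 0.8, -0.2, 1.0   # made-up single-sample values

# forward pass
z = w * x + b                # linear state of the neuron
a = 1 / (1 + np.exp(-z))     # sigmoid activation
C = 0.5 * (a - y_true)**2    # cost

# backward pass: chain rule, dC/dw = dC/da * da/dz * dz/dw
dC_da = a - y_true
da_dz = a * (1 - a)          # derivative of the sigmoid
dC_dw = dC_da * da_dz * x
dC_db = dC_da * da_dz * 1.0

print(C, dC_dw, dC_db)       # the gradients used to update w and b
```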

Training a DNN

overfitting

2

Minibatch

&

Dropout

  • If one updates the model parameters only after processing the whole training set (i.e., once per epoch), each update takes too long and the entire training set may not fit in memory.
  • If one updates the model parameters after processing every single instance (i.e., stochastic gradient descent), the updates are too noisy and the process is not computationally efficient.
  • Therefore, minibatch gradient descent is introduced as a trade-off: frequent, reasonably accurate model updates that are both memory- and computationally efficient.

Split your training set into many smaller subsets (minibatches) and train on each small set separately.
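A sketch of splitting the training set into minibatches in plain numpy (in keras this is just the batch_size argument of model.fit); the data and batch size are placeholders:

```python
import numpy as np

X = np.random.rand(1000, 8)          # hypothetical training set: 1000 samples, 8 features
y = np.random.rand(1000)

batch_size = 32
idx = np.random.permutation(len(X))  # shuffle once per epoch
for start in range(0, len(X), batch_size):
    batch = idx[start:start + batch_size]
    X_batch, y_batch = X[batch], y[batch]
    # ... compute the loss on this minibatch only and update the weights ...
```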

overfitting

Dropout

Artificially remove (set to zero) a random subset of neurons for each minibatch to avoid overfitting.

[Diagram: network with some neurons dropped out, output]
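In keras, dropout is itself a layer; a sketch (the 20% drop rate and the layer sizes are arbitrary choices):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),     # randomly zero 20% of the neurons at each training step
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1),
])
# model.fit(X, y, epochs=10, batch_size=32)   # minibatches via batch_size
```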

key concepts

 

 

Architecture components: neurons, activation function

  • basically each neuron is a multivariate regression with an activation function that turns the output into a probability
  • changing the weights and biases in the linear regression gives different results

Single layer NN: perceptrons

  • perceptrons were developed in the 1950s, but it took a long time before people figured out how to build complex layered architectures and, especially, how to train them

Deep NN:

  • DNNs are multi-layer architectures of neurons. They can be fully connected (each neuron goes to each neuron of the next layer) or not (a neuron goes only to some neurons in the next layer)
  • DNNs have a lot of parameters (thousands!), which makes interpretability and feature extraction difficult.

 

key concepts

 

 

Convolutional NN

  • convolutional NNs are DNNs with three types of layers: 
    • convolutional layers: run filters through an image to detect features like edges or colors
    • maxpool layers: decrease the size of the previous layer's output and remove some details 
    • ReLU (rectified linear units): sets the negative outputs of conv layers to 0 so that the output is all positive
  • CNNs are great for the study of structure in large datasets (images are large datasets)

Training an NN:

  • most ML methods are trained by gradient descent: change weights and biases based on the derivative of the loss (or cost) function 
  • DNNs are difficult to train because of the layer structure
  • backpropagation propagates changes to the weights through the entire NN
  • Minibatch: split the training set into many (100s!) subsets and use these to train the NN
  • Dropout: set some neurons to zero to avoid overfitting

Lots of parameters and lots of hyperparameters! What to choose?

cheatsheet

 
  1. architecture - wide networks tend to overfit, deep networks are hard to train
  2. number of epochs - the sweet spot is when learning slows down, but before you start overfitting... it may take DAYS! jumps may indicate bad initial choices
  3. loss function - needs to be appropriate to the task, e.g. classification vs regression
  4. activation functions - need to be consistent with the loss function
  5. optimization scheme - needs to be appropriate to the task and data
  6. learning rate in optimization - balance speed and accuracy
  7. batch size - smaller batch size is faster but leads to overtraining

An article that compares various DNNs:

[Figure: accuracy comparison across architectures]

[Figure: batch size comparison]


What should I choose for the loss function and how does that relate to the activation function and optimization? 


 

always check your loss function! it should go down smoothly and flatten out at the end of the training.

not flat? you are still learning!

too flat? you are overfitting...

loss (gallery of horrors)

jumps are not unlikely (and not necessarily a problem) if your activations are discontinuous (e.g. ReLU)

regularizations (e.g. dropout) are applied during training but not during validation, so the validation loss can be smaller than the training loss

loss and learning rate (note that the appropriate learning rate depends on the chosen optimization scheme!)

Building a DNN

with keras and tensorflow

autoencoder for image reconstruction
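A minimal sketch of such an autoencoder in keras; flattened 28x28 images and a 32-dimensional bottleneck are assumptions made only for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))                           # a flattened 28x28 image
encoded = layers.Dense(32, activation="relu")(inputs)        # encoder: compress
decoded = layers.Dense(784, activation="sigmoid")(encoded)   # decoder: reconstruct

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mean_squared_error")
# autoencoder.fit(X_train, X_train, epochs=20, batch_size=128)   # target = input
```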

What should I choose for the loss function and how does that relate to the activation function and optimization? 

loss | good for | activation of last layer | size of last layer
mean_squared_error | regression | linear | one node
mean_absolute_error | regression | linear | one node
mean_squared_logarithmic_error | regression | linear | one node
binary_crossentropy | binary classification | sigmoid | one node
categorical_crossentropy | multiclass classification | sigmoid | N nodes
Kullback_Divergence | multiclass classification, probabilistic interpretation | sigmoid | N nodes
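How two of those rows look in keras compile calls, a sketch (layer sizes and input shape are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

# regression: linear final layer, one node, mean squared error
model_reg = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="linear"),
])
model_reg.compile(optimizer="adam", loss="mean_squared_error")

# binary classification: sigmoid final layer, one node, binary crossentropy
model_clf = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model_clf.compile(optimizer="adam", loss="binary_crossentropy")
```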

On the interpretability of DNNs

YOLO and R-CNN

Object detection

Naive model: we took different regions of the image and measured the probability of presence of the object in each region.

Problem: we had to search the whole image, which is time consuming, and we could only find 1 kind of object at 1 scale.

What if you do not know what is in the image?

The final Dense layer has undefined size (one node per kind of object in the region).

Objects can have different scales or axis ratios: how many regions can you search before the problem blows up computationally?

R-CNN

Girshick et al. 2013

Extract 2000 regions from the image ("region proposals").

A feature-extraction CNN produces a 4096-dimensional feature vector in an output dense layer.

An SVM classifies the presence of the object within each candidate region proposal.

1. Generate an initial sub-segmentation: generate many candidate regions
2. Use a greedy algorithm to recursively combine similar regions into larger ones 
3. Use the generated regions to produce the final candidate region proposals 

TOO SLOW (47 sec to test 1 image)

Fast R-CNN

Girshick et al. 2015

Use a CNN to generate convolutional feature maps.

Use the Selective Search algorithm to identify the region proposals (RPs) and warp them into squares.

Use an RoI pooling layer to reshape them into a fixed size so that they can be fed into a fully connected layer, which predicts the box offset.

A softmax layer predicts the class of the proposed region.

 


Faster R-CNN

Ren et al. 2015

Use a CNN to generate convolutional feature maps.

Use a CNN (a region proposal network) to predict the RPs and warp them into squares.

Use an RoI pooling layer to reshape them into a fixed size so that they can be fed into a fully connected layer, which predicts the box offset.

A softmax layer predicts the class of the proposed region.

 


Yolo

Redmon et al 2016

What if you looked at the whole image instead of RoIs in the image?

Split the image into an SxS grid.

For each grid cell, predict B bounding boxes, a confidence for each of those boxes, and C class probabilities.

The CNN outputs the probability that a bounding box contains an object (plus the box offsets).

High-probability bounding boxes are then classified.
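The size of the output tensor follows directly from that description; a sketch of the arithmetic (S=7, B=2, C=20 are the values used in the original YOLO paper, quoted here only to make the numbers concrete):

```python
S, B, C = 7, 2, 20        # grid size, boxes per cell, number of classes
values_per_box = 5        # x, y, w, h, confidence
output_shape = (S, S, B * values_per_box + C)
print(output_shape)       # (7, 7, 30)
```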

resources

 


Physics Informed NN

PiNN


Application regime:

PiNN

-infinity - 1950s: theory driven: little data, mostly theory, falsifiability and all that...

1980s - today: data driven: lots of data, drop theory and use associations, black-box models

now: lots of data, yet not enough for entirely automated decision making, and complex theory that cannot be solved analytically

combine the data with some theory

PiNN

General conservation law:

\partial_t u + \frac{\partial\mathcal{f}(u)}{\partial x} = 0

e.g. linear flux function: \mathcal{f}(u) = a\cdot u

Burgers equation (non-linear): \mathcal{f}(u) = \frac{1}{2}u^2

More generally, for u:[0,T] \times D \to \mathbb{R}:

\partial_t u(t,x) + \mathcal{N}[u](t,x) = 0, \quad u(0,x) = u_0(x), \quad (t,x) \in (0,T] \times D

where \mathcal{N}[\cdot] is a nonlinear differential operator.

PiNN

  • Raissi et al. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv 1711.10561
  • Raissi et al. Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations. arXiv 1711.10566
  • Raissi et al. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comp. Phys. 378 pp. 686-707 DOI: 10.1016/j.jcp.2018.10.045

Non Linear PDEs are hard to solve!

Existence and uniqueness of solutions

A fundamental question for any PDE is the existence and uniqueness of a solution for given boundary conditions. The open problem of the existence (and smoothness) of solutions to the Navier–Stokes equations is one of the seven Millennium Prize Problems in mathematics.

 


Linear approximation

The solutions in a neighborhood of a known solution can sometimes be studied by linearizing the PDE around the solution. This corresponds to studying the tangent space of a point of the moduli space of all solutions.

 

 


Exact solutions

It is often possible to write down some special solutions explicitly in terms of elementary functions (though it is rarely possible to describe all solutions like this). One way of finding such explicit solutions is to reduce the equations to equations of lower dimension, preferably ordinary differential equations, which can often be solved exactly. 


Numerical solutions

Numerical solution on a computer is almost the only method that can be used for getting information about arbitrary systems of PDEs. There has been a lot of work done, but a lot of work still remains on solving certain systems numerically, especially for the Navier–Stokes and other equations related to weather prediction.

 

PiNN

Burgers equation:

second order non-linear PDE

u:[0,T] \times D \to \mathbb{R}, \quad u(0,x) = u_0(x), \quad (t,x) \in (0,T] \times D

\partial_t u + u \, \partial_x u - \nu \, \partial_{xx} u = 0

x: spatial coordinate
t: temporal coordinate
u(x,t): speed of the fluid at (x,t)
\nu: viscosity

Applications of Burgers equation:

shock wave formation, turbulence, the weather problem, traffic flow and acoustic transmission

PiNN

\partial_t u + u \, \partial_x u - (0.01/\pi) \, \partial_{xx} u = 0

Domain: (t,x) \in (0,1] \times (-1,1), \quad x \in [-1,1], \quad t \in (0,1]

Boundary conditions: u(0,x) = -\sin(\pi x), \quad u(t,-1) = u(t,1) = 0

How to solve analytically

 

https://www.youtube.com/watch?v=5ZrwxQr6aV4

Burgers equation:

second order non-linear PDE


PiNN

  • Provide training points at the boundary with the calculated solution (trivial because we have the boundary conditions)

  • Provide the physical constraint: make sure the solution satisfies the PDE

via a modified loss function that includes the residuals of the prediction and the residual of the PDE:

\mathrm{loss} = L2 + PDE = \sum(u_\theta - u)^2 + (\partial_t u_\theta + u_\theta \, \partial_x u_\theta - (0.01/\pi) \, \partial_{xx} u_\theta)^2
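A sketch of that loss in tensorflow, using automatic differentiation to evaluate the PDE residual; u_net, the data points, and the collocation points are placeholders, so this is an illustration of the idea rather than the authors' implementation:

```python
import math
import tensorflow as tf

def pinn_loss(u_net, t_data, x_data, u_data, t_col, x_col, nu=0.01 / math.pi):
    # data term: residuals of the prediction where u is known (initial/boundary points)
    u_pred = u_net(tf.stack([t_data, x_data], axis=1))[:, 0]
    data_term = tf.reduce_sum((u_pred - u_data) ** 2)

    # PDE term: residual of Burgers' equation at collocation points
    with tf.GradientTape() as tape2:
        tape2.watch(x_col)
        with tf.GradientTape() as tape1:
            tape1.watch([t_col, x_col])
            u = u_net(tf.stack([t_col, x_col], axis=1))[:, 0]
        u_t, u_x = tape1.gradient(u, [t_col, x_col])
    u_xx = tape2.gradient(u_x, x_col)
    residual = u_t + u * u_x - nu * u_xx

    return data_term + tf.reduce_sum(residual ** 2)

# u_net can be any keras model mapping (t, x) -> u; minimize pinn_loss with a standard optimizer.
```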


machine learning for natural and physical scientists 2023 | 9 | by federica bianco