D. Vilsmeier (GSI)
M. Sapinski (GSI)
R. Singh (GSI)
3rd IPM Workshop
J-PARC (Tokai, Japan)
18/09/2018
Field of study that gives computers the ability to learn without being explicitly programmed.
- Arthur Samuel (1959)
"Classical"
approach:
+
=
Input
Algorithm
Output
Machine
Learning:
+
=
Input
Algorithm
Output
Supervised Learning: learn a mapping from inputs to known target outputs (labeled data).
Unsupervised Learning: find structure in data without labeled targets (e.g. clustering).
Reinforcement Learning: learn actions by interacting with an environment and maximizing a reward.
Ideal case: particles move on straight lines towards the detector.
Real case: trajectories are influenced by initial momenta and by the interaction with the beam field.
Increase of electric field:
Results in smaller extraction times and hence smaller displacements; the limit is quickly reached.
Additional magnetic field:
Constrains the maximal displacement to the gyroradius of the resulting motion; usually an effective measure.
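For reference (my addition, standard formula): the gyroradius that bounds the displacement of an electron with transverse velocity \(v_\perp\) in a magnetic field \(B\) is

\[ r_g = \frac{m_e v_\perp}{e B}. \]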
+ other approaches, including non-Gaussian beam shapes via iterative procedures
Space-charge region vs. detector region
→ The final motion is determined by effects in the "space-charge region".
Electron motion in the field of the p-bunch: E×B drift, polarization drift, capturing, "pure" gyro-motion.
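For reference (my addition, standard formula): the E×B drift velocity is

\[ \vec{v}_{E\times B} = \frac{\vec{E} \times \vec{B}}{B^2}. \]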
Ideally a one-dimensional projection of the transverse beam profile is measured, but...
N-turn B-fields
Without space charge, electrons at the bunch center perform exactly N turns for specific magnetic field strengths.
Due to the space-charge interaction, however, only large field strengths are effective.
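For orientation (my addition, standard relation): one electron gyro-turn in a magnetic field \(B\) takes the cyclotron period

\[ T_c = \frac{2\pi m_e}{e B}, \]

so the "N turns" condition singles out field strengths for which an integer number of such periods fits into the relevant time window (my reading of the slide).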
| Parameter | Range | Step size |
| --- | --- | --- |
| Bunch population [1e11 ppb] | 1.1 -- 2.1 | 0.1 |
| Bunch width (1σ) | 270 -- 370 µm | 5 µm |
| Bunch height (1σ) | 360 -- 600 µm | 20 µm |
| Bunch length (4σ) | 0.9 -- 1.2 ns | 0.05 ns |
Simulation conditions: protons, 6.5 TeV, 4 kV / 85 mm, 0.2 T.
Training: used to fit the model; split size ~60%.
Validation: check generalization to unseen data; split size ~20%.
Testing: evaluate final model performance; split size ~20%.
21,021 different cases were considered.
→ Evaluated on grid data and randomly sampled data
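A minimal sketch of such a 60/20/20 split (my illustration with placeholder data; the actual data handling is not shown in the slides):

import numpy as np
from sklearn.model_selection import train_test_split

# X: simulated profiles (49 predictors), y: true beam sigma (target).
# Placeholder data with the shapes quoted in the slides.
X = np.random.rand(21021, 49)
y = np.random.rand(21021)

# First split off 20% for testing, then 20% of the total for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
# Resulting split: ~60% training, ~20% validation, ~20% testing.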
Perceptron: input layer, weights, bias; apply a non-linearity, e.g. ReLU, Tanh, Sigmoid.
Multi-Layer Perceptron: inspired by the human brain, many "neurons" linked together; non-linear relationships are captured through the non-linear activation functions.
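In formulas (standard definition, added for clarity): a single perceptron with inputs \(x_i\), weights \(w_i\), bias \(b\) and activation function \(\varphi\) computes

\[ y = \varphi\Bigl(\sum_i w_i x_i + b\Bigr), \qquad \varphi \in \{\mathrm{ReLU},\ \tanh,\ \mathrm{sigmoid},\ \dots\}. \]

A multi-layer perceptron chains several such layers, which makes the overall mapping non-linear.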
from functools import partial
from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import VarianceScaling
from keras.optimizers import Adam

IDense = partial(Dense, kernel_initializer=VarianceScaling())

# Create feed-forward network.
model = Sequential()
# Since this is the first hidden layer we also need to specify
# the shape of the input data (49 predictors).
model.add(IDense(200, activation='relu', input_shape=(49,)))
model.add(IDense(170, activation='relu'))
model.add(IDense(140, activation='relu'))
model.add(IDense(110, activation='relu'))
# The network's output (beam sigma). This uses linear activation.
model.add(IDense(1))
model.compile(
    optimizer=Adam(lr=0.001),
    loss='mean_squared_error'
)
model.fit(
    x_train, y_train,
    batch_size=8, epochs=100, shuffle=True,
    validation_data=(x_val, y_val)
)
Fully-connected feed-forward network with ReLU activation function
D. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", arXiv:1412.6980, 2014
After each epoch, the loss on the validation data is computed in order to prevent "overfitting" (see the sketch below).
Batch learning:
→ Iterate through the training set multiple times (= epochs)
→ Weight updates are performed in batches (of training samples)
Implemented with Keras.
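One common way to act on the validation loss is early stopping; a hedged sketch using Keras' standard callback (my addition for recent Keras versions, not necessarily what was used here):

from keras.callbacks import EarlyStopping

# Stop once the validation loss has not improved for 10 epochs and
# restore the best weights seen so far.
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(
    x_train, y_train,
    batch_size=8, epochs=100, shuffle=True,
    validation_data=(x_val, y_val),
    callbacks=[early_stopping]
)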
Universal approximation theorem
Every continuous "target" function can be approximated with arbitrarily small error by a feed-forward network with a single hidden layer.
[Cybenko 1989; Hornik 1991]
\( F(x) = \sum_{i=1}^{n} v_i \, \varphi(w_i^\top x + b_i) \), with \( x \) from a \( d \)-dimensional domain, \( n \) hidden units, activation function \( \varphi \), and parameters \( v_i, w_i, b_i \) to be "optimized".
Proof of existence only, i.e. no universal optimization algorithm exists → "no free lunch" theorem.
Works on compact subsets of \(\mathbb{R}^d\).
Also tested other machine learning algorithms:
- Linear regression (LR)
- Kernel ridge regression (KRR)
- Support vector regression (SVR)
- Multi-layer perceptron (= ANN)
Very good results on simulation data → errors below 1%.
Results are without consideration of noise on the profile data.
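A hedged sketch of how such a comparison could be set up with scikit-learn (my illustration, reusing the split from the sketch above; the hyperparameters are placeholders, not the values used in the study):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

# Candidate regressors; hyperparameters are illustrative placeholders.
models = {
    'LR': LinearRegression(),
    'KRR': KernelRidge(kernel='rbf', alpha=1e-3),
    'SVR': SVR(kernel='rbf', C=10.0),
    'MLP': MLPRegressor(hidden_layer_sizes=(200, 170, 140, 110), max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # training split from before
    rel_err = np.mean(np.abs(model.predict(X_val) - y_val) / np.abs(y_val))
    print(f'{name}: {100 * rel_err:.2f}% mean relative error on validation data')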
Linear regression model (trained with no noise vs. similar noise on the training data):
→ Linear regression amplifies noise in its predictions if not explicitly trained on noisy data.
Multi-layer perceptron:
→ The MLP also amplifies noise; bounded activation functions could help, as well as duplicating the data before "noising" it.
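A minimal sketch of the "duplicate, then add noise" augmentation (my illustration; the noise model and level are placeholders):

import numpy as np

def augment_with_noise(X, y, copies=5, noise_level=0.01, seed=0):
    """Duplicate each training profile `copies` times and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    X_rep = np.repeat(X, copies, axis=0)
    y_rep = np.repeat(y, copies, axis=0)
    noise = rng.normal(0.0, noise_level * np.abs(X_rep).max(), size=X_rep.shape)
    return X_rep + noise, y_rep

# Train on the augmented set so the model sees realistic, noisy profiles.
X_train_noisy, y_train_noisy = augment_with_noise(X_train, y_train)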
So far: Machine Learning Model → compute beam RMS.
Instead: Machine Learning Model → compute beam profile.
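In terms of the network above, mainly the output layer changes; a hedged sketch (layer sizes and the number of profile bins are placeholders; only the tanh activation is taken from the following slide):

from functools import partial
from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import VarianceScaling
from keras.optimizers import Adam

IDense = partial(Dense, kernel_initializer=VarianceScaling())
n_bins = 49  # placeholder: number of bins of the measured profile

model = Sequential()
model.add(IDense(200, activation='tanh', input_shape=(n_bins,)))
model.add(IDense(170, activation='tanh'))
model.add(IDense(140, activation='tanh'))
# Output one value per profile bin instead of a single beam sigma.
model.add(IDense(n_bins, activation='linear'))
model.compile(optimizer=Adam(lr=0.001), loss='mean_squared_error')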
MLP architecture: tanh activation function.
Performance measure: comparison of prediction and target.
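The measure itself is defined in a figure not reproduced here; a typical choice (purely an assumption on my part) would be the mean squared deviation between predicted and target profile bins,

\[ L = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i^{\mathrm{prediction}} - y_i^{\mathrm{target}} \bigr)^2 . \]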
Gen-Gauss (generalized Gaussian) bunch shapes used for testing, while training (fitting) was performed with a Gaussian bunch shape.
Smaller distortion in this case.
ANN model generalizes to different beam shapes.
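For reference (my addition; the exact parametrization used in the study is not shown here), a common form of the generalized Gaussian is

\[ p(x) \propto \exp\!\left[ -\left( \frac{|x - \mu|}{\alpha} \right)^{\beta} \right], \]

which reduces to an ordinary Gaussian for \(\beta = 2\).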
Q-Gauss (q-Gaussian) bunch shapes used for testing, while training (fitting) was performed with a Gaussian bunch shape.
No distortion for this case → nothing to correct for; the MLP preserves the state.
ANN model generalizes to different beam shapes.
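Similarly for reference (my addition, standard definition): the q-Gaussian is built from the q-exponential,

\[ p(x) \propto e_q\!\left(-\beta x^2\right), \qquad e_q(u) = \bigl[ 1 + (1-q)\,u \bigr]_+^{1/(1-q)}, \]

and recovers the ordinary Gaussian in the limit \(q \to 1\).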
Figure panels: inactive vs. active.
The MLP shows a very small standard deviation in its predictions → the fitting converged well, small model uncertainty.
Preliminary results.
Icons by icons8.