Space-charge distortion of transverse profiles measured by electron-based Ionization Profile Monitors and correction methods based on machine learning
D. Vilsmeier
R. Singh
M. Sapinski
[Figure: simplified IPM working principle; side view (beam moves left → right) and front view (beam moves into page), showing the extraction of ionization products, annotated with particle mass and momentum transfer obtained from ionization; courtesy of T. Giacomini and P. Forck (GSI)]
Measuring ions has its own benefits; electrons, on the other hand, allow for bunch-by-bunch measurement due to their shorter extraction times
Problem: ions/electrons interact with the electromagnetic fields of the beam and receive additional momentum, and hence displacement
Solution: apply an additional magnetic field to confine the motion of ionization products to their gyro-radius
However: the magnet needs space, potentially corrector magnets, and comes at high cost
Electrons are quickly removed, i.e. they see less exposure to the beam fields
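For orientation, the confinement scale is set by the gyro-radius (standard formula; the numeric example is illustrative):

$$ r_g = \frac{m v_\perp}{|q| B} $$

e.g. an electron with ~10 eV of transverse kinetic energy in a 0.2 T field (the value used below) has r_g ≈ 50 μm.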
[Figure: electron trajectory through the space-charge region and the detector region]
More than 90% of the electrons increase their gyro-radius, by up to two orders of magnitude (LHC beam)
[Figure: electron motion in the fields of a p-bunch, showing E×B drift, polarization drift, capturing, and "pure" gyro-motion]
The displacement of measured electrons with respect to their initial position can be split into three distinct contributions (see the figure above).
Ideally, a one-dimensional projection of the transverse beam profile is measured, but...
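For reference, the ideal measurement corresponds to the projection of the transverse charge distribution ρ(x, y) onto the measurement axis (notation assumed):

$$ p(x) = \int \rho(x, y)\,\mathrm{d}y $$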
N-turn B-fields (LHC)
Without space charge, electrons at the bunch center perform exactly N turns for specific magnetic field strengths. Due to the space-charge interaction, however, only large field strengths are effective.
Parameter | Range | Step size
---|---|---
Bunch population [10^11 ppb] | 1.1 -- 2.1 | 0.1
Bunch width (1σ) [μm] | 270 -- 370 | 5
Bunch height (1σ) [μm] | 360 -- 600 | 20
Bunch length (4σ) [ns] | 0.9 -- 1.2 | 0.05
Fixed settings: protons at 6.5 TeV; extraction field 4 kV over 85 mm; magnetic field 0.2 T.
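A minimal sketch of how such a scan grid can be enumerated (values taken from the table above; variable names are illustrative):

```python
import itertools
import numpy as np

# Parameter grid from the table above.
populations = np.arange(1.1, 2.15, 0.1)   # 11 values [1e11 ppb]
widths = np.arange(270, 375, 5)           # 21 values [um]
heights = np.arange(360, 620, 20)         # 13 values [um]
lengths = np.arange(0.9, 1.225, 0.05)     # 7 values [ns]

# Cartesian product of all parameter values.
cases = list(itertools.product(populations, widths, heights, lengths))
print(len(cases))  # 21021
```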
• Training: used to fit the model; split size ~ 60%.
• Validation: check generalization to unseen data; split size ~ 20%.
• Testing: evaluate final model performance; split size ~ 20%.
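One way to realize this split, as a sketch (scikit-learn assumed; any split utility works the same way):

```python
from sklearn.model_selection import train_test_split

# First set aside 40% of the data, then split that part half/half
# into validation and testing (60% / 20% / 20% overall).
x_train, x_rest, y_train, y_rest = train_test_split(
    x, y, test_size=0.4, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=0)
```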
Scanning the full grid gives 11 × 21 × 13 × 7 = 21,021 different cases
🠖 Evaluated on grid data and randomly sampled data
The Virtual-IPM simulation tool (1) was used to generate the data
21,021 cases at 5 h per case amounts to about 12 years of compute time. Good thing we have computing clusters :-)
(1) Virtual-IPM is not limited to IPM simulations; it supports a wide range of applications including BIF, gas jets, etc.
Perceptron
[Figure: input layer, weights, and bias feeding into a non-linearity, e.g. ReLU, Tanh, Sigmoid]
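In formulas (standard textbook definition): a perceptron computes

$$ y = \varphi\Big( \sum_i w_i x_i + b \Big) $$

with weights w_i, bias b, and activation function φ.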
Multi-Layer Perceptron
Inspired by the human brain: many "neurons" linked together
Non-linearities are captured through non-linear activation functions
from functools import partial

from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import VarianceScaling
from keras.optimizers import Adam

IDense = partial(Dense, kernel_initializer=VarianceScaling())

# Create feed-forward network.
model = Sequential()
# Since this is the first hidden layer we also need to specify
# the shape of the input data (49 predictors).
model.add(IDense(200, activation='relu', input_shape=(49,)))
model.add(IDense(170, activation='relu'))
model.add(IDense(140, activation='relu'))
model.add(IDense(110, activation='relu'))
# The network's output (beam sigma). This uses linear activation.
model.add(IDense(1))
model.compile(
    optimizer=Adam(lr=0.001),
    loss='mean_squared_error',
)
model.fit(
    x_train, y_train,
    batch_size=8, epochs=100, shuffle=True,
    validation_data=(x_val, y_val),
)
Fully-connected feed-forward network with ReLU activation function
After each epoch, compute the loss on the validation data in order to detect and prevent "overfitting"
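A common way to act on the validation loss is early stopping; a Keras sketch (this specific callback setup is an assumption, not from the slides):

```python
from keras.callbacks import EarlyStopping

# Stop training once the validation loss stops improving and
# restore the weights from the best epoch.
early_stopping = EarlyStopping(monitor='val_loss', patience=10,
                               restore_best_weights=True)
model.fit(
    x_train, y_train,
    batch_size=8, epochs=100, shuffle=True,
    validation_data=(x_val, y_val),
    callbacks=[early_stopping],
)
```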
Batch learning
🠖 Iterate through training set multiple times (= epochs)
🠖 Weight updates are performed in batches (of training samples)
Other machine learning algorithms were also tested:
• Linear regression (LR)
• Kernel ridge regression (KRR)
• Support vector regression (SVR)
• Multi-layer perceptron (= ANN)
Very good results on simulation data 🠖 errors below 1%
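A sketch of how such a comparison could be set up with scikit-learn (hyperparameters are placeholders, not the values used in the study):

```python
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

models = {
    'LR': LinearRegression(),
    'KRR': KernelRidge(kernel='rbf', alpha=1.0),
    'SVR': SVR(kernel='rbf', C=1.0),
}
for name, reg in models.items():
    reg.fit(x_train, y_train)             # y_train: beam sigma values
    print(name, reg.score(x_val, y_val))  # R^2 on validation data
```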
These results do not take noise on the profile data into account
Linear regression model
[Figure: predictions without noise on the training data vs. with similar noise on the training data]
Linear regression amplifies noise in its predictions if not explicitly trained on noisy data
Multi-layer perceptron
The MLP also amplifies noise; bounded activation functions could help, as could duplicating the data before adding noise
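A sketch of the duplicate-then-noise idea (function name and noise model are assumptions):

```python
import numpy as np

def augment_with_noise(x, y, copies=5, noise_level=0.01, seed=0):
    """Duplicate each profile several times and add independent
    Gaussian noise, so the model sees many noisy variants of the
    same underlying profile during training."""
    rng = np.random.default_rng(seed)
    x_rep = np.repeat(x, copies, axis=0)
    y_rep = np.repeat(y, copies, axis=0)
    return x_rep + rng.normal(0.0, noise_level, size=x_rep.shape), y_rep
```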
So far: measured profile → Machine Learning Model → beam RMS
Instead: measured profile → Machine Learning Model → full beam profile
The transformation matrix represents the probability that an electron which was created at position j is collected at position i
→ it depends on the beam parameters / beam distribution
Assumption: no electrons are lost in the process
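Written out (notation assumed), with p the original profile and m the measured one:

$$ m_i = \sum_j T_{ij}\, p_j \,, \qquad \sum_i T_{ij} = 1 \quad \forall j $$

where the second condition expresses the assumption that no electrons are lost.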
Provided that the moment of detection is random, the gyromotion of electrons gives rise to probabilities of being detected at a specific position.
Taking the limit results in a divergence at the "edges" of the motion, so the expression is only valid near the center. For real profiles, however, we can work with a discretized version of the above relation.
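For a pure gyro-oscillation x(t) = x_0 + r cos(ωt) observed at a uniformly random time, this is the standard arcsine density (consistent with the divergence at the turning points mentioned above):

$$ P(x) = \frac{1}{\pi \sqrt{r^2 - (x - x_0)^2}} \,, \qquad |x - x_0| < r $$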
Gyro-radius increase is non-uniform along the original beam profile
The main task of the ANN is to generate the inverse of the transformation matrix.
Multiplied with the measured profile, the reconstructed profile is obtained.
The final multiplication must be part of the ANN in order for backpropagation to work; i.e. this is no simple feed-forward architecture
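A minimal sketch of such a matrix-generator architecture with the Keras functional API (layer sizes and names are illustrative, not the actual model):

```python
from keras import backend as K
from keras.layers import Input, Dense, Lambda, Reshape
from keras.models import Model

N = 49  # number of profile bins

measured = Input(shape=(N,))
h = Dense(200, activation='relu')(measured)
h = Dense(200, activation='relu')(h)
# The network emits the entries of the (inverse) transformation matrix ...
matrix = Reshape((N, N))(Dense(N * N)(h))
# ... which is multiplied with the measured profile inside the graph,
# so gradients flow through the matrix-vector product.
reconstructed = Lambda(
    lambda t: K.batch_dot(t[0], t[1], axes=(2, 1)))([matrix, measured])

model = Model(inputs=measured, outputs=reconstructed)
model.compile(optimizer='adam', loss='mean_squared_error')
```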
A "classical" feed-forward architecture was also attempted; however, the results depended strongly on the data transformation applied to the profiles. The matrix-generator network from the previous slide works with per-sample normalization.
Dataset normalization:
• row-wise, per-sample normalization
• column-wise, per-feature normalization
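A numpy sketch of the two options (standard-score scaling assumed; the study may have used a different scaler):

```python
import numpy as np

# x: dataset with one profile per row (n_samples x n_bins).
# Row-wise, per-sample: each profile is scaled by its own statistics.
x_row = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

# Column-wise, per-feature: each bin is scaled across the whole dataset.
x_col = (x - x.mean(axis=0)) / x.std(axis=0)
```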
Performance measure: mean squared error between prediction and target,

$$ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big( \hat{y}_i - y_i \big)^2 $$

with ŷ the prediction and y the target.
Mean squared error is effectively reduced, profile shape is restored as well
Generalized Gaussian bunch shapes were used for testing, while training (fitting) was performed with a Gaussian bunch shape
Smaller distortion in this case
ANN model generalizes to different beam shapes
Max. MSE examples
q-Gaussian bunch shapes were used for testing, while training (fitting) was performed with a Gaussian bunch shape
These profiles are significantly wider than the ones used during training
ANN model generalizes to different beam shapes
The ANN shows very small standard deviation in its predictions 🠖 the fitting converged well, small model uncertainty
Other possibility: Snapshot Ensembles
(preliminary)