Using Seismic Data To Forecast Volcano Eruptions With Machine Learning
30506 王大宇
Table Of Contents
- How humans (tried) to forecast eruptions
- Events in seismic data before eruptions
- The forecast of Mount Pinatubo, 1991
- Why are current methods unreliable?
- Introduction to INGV - Volcanic Eruption Prediction Contest
- My approach to the contest
- Data analyzing / preprocessing
- Trying out a regression approach
- Things to improve
- Summary
- References
How humans (tried) to forecast eruptions
Many indicators can be used to forecast eruptions (SO2, groundwater levels, ...)
For the sake of the contest, we will focus on seismic data here!
Events in seismic data before eruptions
Volcano-tectonic earthquakes (VT)
- Caused by the movement of magma and fluids within the Earth's crust
- Weeks to months before the eruption
- Usually have frequencies in the range of 1 - 30 Hz
Low-frequency earthquakes (LFEs)
- Long-lasting earthquakes caused by the movement of fluids and the Earth's crust
- Hours to days before the eruption
- Usually have frequencies below 1 Hz
Long-Period earthquakes (LP)
- Low-frequency earthquakes caused by the sudden release of gas and magma
- Days to weeks before the eruption
- Usually have frequencies below 1 Hz
Hybrid earthquakes (HYB)
- Have characteristics of both VT and LFEs
- Indicate the ascent of magma
- Usually have frequencies around 1-10 Hz
- Can occur at any time, usually accompanied by other types of events
The forecast of Mount Pinatubo, 1991
April ~ Early May: Hybrid earthquakes
Early June: Volcanic tremors generated by the movement of magma, accompanied by an increase in seismic energy release (alerts and evacuations started)
June 7: Shift from VT to LP events, indicating increasing gas pressure
June 12: Sharp increase in the number and size of LP events, indicating the release of gas and steam
June 15: The eruption took place
Why are current methods unreliable?
- Hard to generalize across different volcanoes
- Current methods give short warning times
- Most volcanoes are NOT LINEAR
-> they don't always behave the way we think they should (e.g., a decrease in earthquake activity)
"Our capability to forecast eruptions is still limited, with ~20% of eruptions accurately forecasted."
Introduction to INGV - Volcanic Eruption Prediction Contest
Goal: Predict time_to_eruption for a volcano from 10 minutes of seismic data recorded by ten sensors
Scoring method:
Mean Absolute Error (MAE)
\( \frac{\sum_{i=1}^{n}|y_i-f(x_i)|}{n}\)
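As a minimal sketch of the scoring formula above, the MAE can be computed in a few lines of plain Python. The function name and the toy numbers are made up for illustration and are not contest data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - f(x_i)| over all n samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)


# Toy targets and predictions (made-up numbers, not contest data).
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]
print(mean_absolute_error(y_true, y_pred))
```

Every error counts in proportion to its size, so one very bad prediction hurts less than it would under a squared-error metric.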
My approach to the contest
Data Analyzing / Preprocessing
time_to_eruption range: 0.07 ~ 567 days
mean: 264 days
Distribution of time_to_eruption: a fairly even spread, with counts only declining after ~4e7
Time series data is hard to deal with :(
Idea: aggregate each sensor's signal into summary statistics and work with those, so we don't have to deal with the raw time series!
Selected aggregated features (computed for each sensor):
- sum
- min
- max
- mean
- standard deviation (std)
- median
- skewness (skew)
-> how asymmetrical a set of data is
- kurtosis
-> checks for the "peakedness" of the data
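The aggregation above could be sketched roughly like this with NumPy. The function names, the synthetic segment, and its shape are assumptions for illustration. Skewness and kurtosis here are the plain population-moment versions (excess kurtosis, so a normal distribution gives about 0), which may differ slightly from what a library such as pandas returns.

```python
import numpy as np


def aggregate_sensor(signal):
    """Collapse one sensor's time series into eight summary statistics."""
    x = np.asarray(signal, dtype=float)
    mean = x.mean()
    std = x.std()  # population standard deviation (ddof=0)
    centered = x - mean
    # Standardized 3rd and 4th moments; set to 0 for a constant signal.
    skew = (centered ** 3).mean() / std ** 3 if std > 0 else 0.0
    kurt = (centered ** 4).mean() / std ** 4 - 3.0 if std > 0 else 0.0
    return {
        "sum": x.sum(), "min": x.min(), "max": x.max(), "mean": mean,
        "std": std, "median": np.median(x), "skew": skew, "kurtosis": kurt,
    }


def aggregate_segment(segment):
    """segment: array of shape (n_samples, n_sensors) -> flat feature dict."""
    segment = np.asarray(segment, dtype=float)
    features = {}
    for i in range(segment.shape[1]):
        for name, value in aggregate_sensor(segment[:, i]).items():
            features[f"sensor_{i + 1}_{name}"] = float(value)
    return features


# Synthetic stand-in for one segment (shape is an assumption, not real data).
rng = np.random.default_rng(0)
fake_segment = rng.normal(size=(6000, 10))
features = aggregate_segment(fake_segment)
print(len(features))  # number of sensors x 8 statistics
```

Each segment becomes one fixed-length row of numbers, which is what lets ordinary regression models be used instead of time series models.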
Ready to train!
Trying out a regression approach
Linear Regression
\(f(x) = w_0 + w_1x_1 + w_2x_2 + \cdots + w_px_p\)
Linear Regression tries to find the set of coefficients \(w_0, w_1, w_2, \ldots, w_p\) that minimizes the Sum of Squared Errors (SSE): \(\sum_{i=1}^{n}(y_i - f(x_i))^2\)
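To make the idea concrete, here is a small sketch that fits the coefficients by least squares with NumPy on synthetic data. The data, the "true" coefficients, and the variable names are all made up for illustration; this is not the code used in the contest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features, known coefficients plus small noise.
X = rng.normal(size=(200, 3))
true_w = np.array([5.0, 2.0, -1.0, 0.5])  # w_0 (intercept), w_1, w_2, w_3
y = true_w[0] + X @ true_w[1:] + rng.normal(scale=0.01, size=200)

# Prepend a column of ones so w_0 is learned like any other coefficient.
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares: the w that minimizes SSE = sum((y - X_design @ w) ** 2).
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

sse = float(np.sum((y - X_design @ w) ** 2))
print(np.round(w, 2), sse)
```

With clean synthetic data the recovered coefficients land very close to the ones used to generate it.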
Trained Coefficients
('sensor_1_mean', 59030887170688.34)
('sensor_1_skew', 4048663.024784088)
('sensor_10_skew', 2595411.6050720215)
('sensor_3_skew', 1270106.8636779785)
('sensor_8_skew', 978927.2803401947)
('sensor_7_skew', -165900.9322490692)
('sensor_9_skew', -1518843.438412428)
('sensor_6_skew', -2284296.6836452484)
('sensor_4_skew', -3355988.1515922546)
('sensor_1_sum', -983831722.500612)
Skew is generally helpful, while only certain means and sums are useful
median, mean (most of them), and sum (most of them) aren't very impactful
Private: 12002016 (~139 days)
Public: 11483340 (~133 days)
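The day figures next to the scores are consistent with dividing the raw MAE by 86400, i.e. treating time_to_eruption as seconds. That unit is an assumption here, and the helper name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def score_to_days(mae, units_per_day=SECONDS_PER_DAY):
    """Convert a raw MAE score to days (assumes the target is in seconds)."""
    return mae / units_per_day


for label, score in [("Private", 12002016), ("Public", 11483340)]:
    print(f"{label}: {score} -> {score_to_days(score):.1f} days")
```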
LightGBM Regressor
LightGBM (Light Gradient Boosting Machine): a lightweight gradient boosting framework proposed by Microsoft in 2017.
Often used in Kaggle competitions because it is lightweight and efficient.
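To show what "gradient boosting" means, here is a toy sketch in NumPy: each round fits a one-split tree (a stump) to the current residuals and adds a small fraction of it to the prediction. This is not LightGBM and not the contest code. LightGBM adds much more on top (histogram-based splits, leaf-wise tree growth), and all names and data below are made up for illustration.

```python
import numpy as np


def fit_stump(x, residual):
    """Best single-split stump on a 1-D feature, by squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # keeps both sides non-empty
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        fitted = np.where(left, left_value, right_value)
        error = np.sum((residual - fitted) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Return training predictions after n_rounds of boosting."""
    prediction = np.full(len(y), y.mean())  # start from the mean
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        stump = fit_stump(x, y - prediction)
        prediction = prediction + learning_rate * predict_stump(stump, x)
    return prediction


# Toy 1-D problem: learn a sine wave.
x = np.linspace(0, 10, 200)
y = np.sin(x)
baseline_mse = float(np.mean((y - y.mean()) ** 2))
boosted_mse = float(np.mean((y - gradient_boost(x, y)) ** 2))
print(baseline_mse, boosted_mse)
```

Many weak learners added together can fit a shape that no single stump (or a straight line) could.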
Feature importances (too many to plot)
std and kurtosis generally helped a lot, while median and mean didn't have much importance
Private: 7192681 (~83 days)
Public: 7334062 (~85 days)
Things to improve
More Features
- 10th, 25th, and 90th quantile data
- Get frequency data with FFT
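The extra features listed above could look something like this sketch. The function name, the specific spectral statistics, and the 100 Hz sampling rate are assumptions for illustration, not the features actually used.

```python
import numpy as np


def extra_features(signal, sampling_rate_hz=100.0):
    """Quantile and FFT-based features for one sensor's time series."""
    x = np.asarray(signal, dtype=float)
    q10, q25, q90 = np.quantile(x, [0.10, 0.25, 0.90])

    # Magnitude spectrum of the real-valued signal (mean removed so the
    # zero-frequency bin does not dominate).
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate_hz)

    return {
        "q10": float(q10), "q25": float(q25), "q90": float(q90),
        "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),
        "spectral_mean": float(spectrum.mean()),
        "spectral_std": float(spectrum.std()),
    }


# Synthetic check: a pure 5 Hz sine sampled at 100 Hz for 10 seconds.
t = np.arange(1000) / 100.0
features = extra_features(np.sin(2 * np.pi * 5.0 * t))
print(features["dominant_freq_hz"])
```

Frequency features matter here because the different earthquake types are told apart mainly by their frequency ranges.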
Different Models
- XGBoost
- CatBoost
Model Ensembling
Things to improve
773 features vs 8 features!
Private: 4999330 (~58 days)
Public: 4989282 (~58 days)
Things to improve
Private: 3438191 (~40 days)
Public : 3641873 (~42 days)
A mel spectrogram (the input of the 1st place solution) holds time and frequency data at the same time!
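Assuming this slide refers to the spectrogram input of the 1st place solution listed in the references, here is a rough sketch of how a signal becomes a time-frequency image. It is a plain short-time FFT, not a mel spectrogram and not the winner's code. The window size, hop, and 100 Hz sampling rate are made-up values for illustration.

```python
import numpy as np


def spectrogram(signal, window_size=256, hop=128):
    """Short-time Fourier transform magnitudes, shape (n_freqs, n_frames)."""
    x = np.asarray(signal, dtype=float)
    if len(x) < window_size:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(window_size)
    n_frames = 1 + (len(x) - window_size) // hop
    frames = np.stack([
        x[i * hop: i * hop + window_size] * window for i in range(n_frames)
    ])
    # FFT of each windowed frame, transposed so rows are frequencies.
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic signal: 5 Hz in the first half, 20 Hz in the second half,
# sampled at an assumed 100 Hz.
fs = 100.0
t = np.arange(4096) / fs
x = np.where(np.arange(4096) < 2048,
             np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))
spec = spectrogram(x)
freqs = np.fft.rfftfreq(256, d=1.0 / fs)
first_peak = freqs[spec[:, 0].argmax()]
last_peak = freqs[spec[:, -1].argmax()]
print(spec.shape, first_peak, last_peak)
```

Each column is the frequency content of one short time window, so a change in frequency over time shows up as a change between columns. An image model such as a ResNet can then be trained on the result.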
Summary
Forecasting volcanic eruptions from seismic data alone doesn't seem feasible yet: even the 1st place solution still has an average error of about 40 days. But if we can gather other valuable data (SO2, temperature, etc.), we might be able to get results accurate and reliable enough to be used in real life.
References
Power, J. A. (2001, December). Seismicity and Forecasting of the 1991 Eruption of Mount Pinatubo: A Ten-Year Retrospective. NASA/ADS. https://ui.adsabs.harvard.edu/abs/2001AGUFM.U31A..03P/abstract
Volcanic Earthquakes. (n.d.). PNSN. https://pnsn.org/outreach/earthquakesources/volcanic
Williams, R. G. (2021, June 1). A Burp or a Blast? Seismic Signals Reveal the Volcanic Eruption to Come. Quanta Magazine. https://www.quantamagazine.org/seismic-data-helps-scientists-forecast-volcanic-explosions-20210601/
Volcano-Tectonic Event. (2002, October). NOVA. https://www.pbs.org/wgbh/nova/volcano/seis_vte.html
Long Period Event. (2002, October). NOVA. https://www.pbs.org/wgbh/nova/volcano/seis_lpe.html
How Do Scientists Forecast Eruptions? (n.d.). Smithsonian Institution National Museum of Natural History Global Volcanism Program. https://volcano.si.edu/faq/index.cfm?question=eruptionforecast
Jesper, D. S. (2021). Introduction to Volcanology, Seismograms and LGBM. Kaggle. https://www.kaggle.com/code/jesperdramsch/introduction-to-volcanology-seismograms-and-lgbm
Alexander, L. (2021). Ingv_catboost_baseline+tsfresh. Kaggle. https://www.kaggle.com/code/carpediemamigo/ingv-catboost-baseline-tsfresh
Jei, F. (2021). [1st Place] Mel Spectrogram + Blended ResNets. Kaggle. https://www.kaggle.com/competitions/predict-volcanic-eruptions-ingv-oe/discussion/211315
ChatGPT, personal communication, February 11, 2023
Earth Science
By yeedrag