30506 王大宇
How humans (have tried to) forecast eruptions
Events in seismic data before eruptions
The forecast of Mount Pinatubo, 1991
Why are current methods unreliable?
Introduction to INGV - Volcanic Eruption Prediction Contest
My approach to the contest
Data analysis / preprocessing
Many indicators can be used to analyze eruptions (SO2, groundwater levels, ...)
For the sake of the contest, we will focus on seismic data here!
April ~ Early May : Hybrid earthquakes
Early June : Volcanic tremors generated by the movement of magma, accompanied by an increase in seismic energy release (alerts and evacuations started)
June 7 : Shift from volcano-tectonic (VT) to long-period (LP) events, indicating increasing gas pressure
June 12 : Sharp increase in the number and size of LP events, indicating the release of gas and steam
June 15 : Eruption took place
"Our capability to forecast eruptions is still limited, with ~20% of eruptions accurately forecasted. "
Data Analysis / Preprocessing
time_to_eruption range: 0.07 ~ 567 days
mean : 264 days
The time_to_eruption values are spread fairly evenly and only decline after ~4e7
(of each sensor)
Ready to train!
Trying out a regression approach
Linear regression tries to find the set of coefficients \(w_0, w_1, w_2, \ldots, w_p\) that minimizes the sum of squared errors (SSE)
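As a minimal sketch of what minimizing the SSE means, the NumPy snippet below fits the coefficients by ordinary least squares on synthetic data. The data, the feature count, and the helper names `fit_linear_regression` and `sse` are illustrative assumptions, not the contest pipeline.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return (w0, w) minimizing the sum of squared errors of w0 + X @ w."""
    # Prepend a column of ones so the intercept w0 is learned with the weights.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[0], coef[1:]

def sse(X, y, w0, w):
    """Sum of squared errors for the given coefficients."""
    residual = y - (w0 + X @ w)
    return float(residual @ residual)

# Synthetic example (assumed data): 200 samples, 3 features, known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_w + rng.normal(scale=0.1, size=200)

w0, w = fit_linear_regression(X, y)
best = sse(X, y, w0, w)
# Shifting the fitted coefficients away from the optimum can only raise the SSE.
worse = sse(X, y, w0, w + 0.05)
print(w0, w, best, worse)
```

On this toy data the fitted coefficients land close to the ones used to generate `y`, and the perturbed coefficients give a larger SSE.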
Linear regression coefficients (five largest and five smallest):
('sensor_1_mean', 59030887170688.34)
('sensor_1_skew', 4048663.024784088)
('sensor_10_skew', 2595411.6050720215)
('sensor_3_skew', 1270106.8636779785)
('sensor_8_skew', 978927.2803401947)
('sensor_7_skew', -165900.9322490692)
('sensor_9_skew', -1518843.438412428)
('sensor_6_skew', -2284296.6836452484)
('sensor_4_skew', -3355988.1515922546)
('sensor_1_sum', -983831722.500612)
Skew is generally helpful, while only certain means and sums are useful
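Features with names like `sensor_1_mean` or `sensor_1_skew` can be produced by summarizing each sensor's readings with a few statistics. The sketch below is one way to do that with NumPy. The function name `sensor_features`, the dict-based input, and the exact statistic set (sum, mean, median, std, skew, kurtosis) are assumptions, and the skew and kurtosis use the plain population formulas, which differ slightly from the bias-corrected estimators in some libraries.

```python
import numpy as np

def sensor_features(segment):
    """segment: dict mapping sensor name -> 1-D sequence of readings."""
    features = {}
    for name, values in segment.items():
        x = np.asarray(values, dtype=float)
        mean = x.mean()
        std = x.std()  # population standard deviation (ddof=0)
        centered = x - mean
        features[f"{name}_sum"] = float(x.sum())
        features[f"{name}_mean"] = float(mean)
        features[f"{name}_median"] = float(np.median(x))
        features[f"{name}_std"] = float(std)
        # Standardized third and fourth moments; a constant signal yields 0.
        features[f"{name}_skew"] = float((centered**3).mean() / std**3) if std > 0 else 0.0
        features[f"{name}_kurtosis"] = float((centered**4).mean() / std**4 - 3.0) if std > 0 else 0.0
    return features

# Toy segment (assumed data): one sensor with a single large outlier.
features = sensor_features({"sensor_1": [1, 2, 3, 4, 10]})
symmetric = sensor_features({"sensor_2": [1, 2, 3]})
print(features)
```

The outlier at 10 pulls the skew positive, while the symmetric toy signal has zero skew.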
Median, most mean, and most sum features have little impact
Private: 12002016 (~139 days)
Public: 11483340 (~132 days)
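The day figures in parentheses are consistent with dividing the raw score by the number of seconds in a day. The short sketch below reproduces that arithmetic. It assumes the score is an error measured in seconds, which is inferred from the conversions on the slide rather than stated there, and `score_to_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def score_to_days(score_seconds):
    """Convert an error expressed in seconds to days (assumed unit: seconds)."""
    return score_seconds / SECONDS_PER_DAY

private_days = score_to_days(12002016)
public_days = score_to_days(11483340)
for label, days in [("Private", private_days), ("Public", public_days)]:
    print(f"{label}: {days:.1f} days")
```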
LightGBM: a lightweight gradient boosting machine proposed by Microsoft in 2017.
Often used in Kaggle competitions because it is lightweight and efficient.
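To show the idea behind gradient boosting without depending on any particular library, here is a toy version written from scratch with NumPy. It is not LightGBM: it uses one-split trees (stumps) and exhaustive threshold search, whereas LightGBM adds histogram-based split finding and leaf-wise tree growth among other optimizations. All function names, the synthetic data, and the hyperparameters are assumptions for illustration.

```python
import numpy as np

def fit_stump(X, residual):
    """Find the single-feature threshold split that best fits the residual."""
    best = None
    for j in range(X.shape[1]):
        # Skip the largest value so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            left_value = residual[left].mean()
            right_value = residual[~left].mean()
            pred = np.where(left, left_value, right_value)
            err = ((residual - pred) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, left_value, right_value)
    return best[1:]

def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] <= t, left_value, right_value)

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Boosting for squared error: each stump is fitted to the current residual."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def predict_boosting(model, X):
    pred = np.full(len(X), model["base"])
    for stump in model["stumps"]:
        pred = pred + model["learning_rate"] * predict_stump(stump, X)
    return pred

# Synthetic, non-linear target (assumed data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

model = fit_boosting(X, y)
mse_before = float(((y - y.mean()) ** 2).mean())
mse_after = float(((y - predict_boosting(model, X)) ** 2).mean())
print(mse_before, mse_after)
```

Each round fits a small tree to what the current ensemble still gets wrong, so the training error shrinks step by step.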
Feature importances (Can't plot because too big)
std and kurtosis generally helped a lot, while median and mean didn't have much importance
Private: 7192681 (~83 days)
Public: 7334062 (~85 days)
Things to improve
Add 10%, 25%, and 90% quantiles as features
Get frequency data with FFT
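Both ideas above can be sketched with NumPy. The block below computes the three quantiles and a few FFT-based features for one sensor signal. The 100 Hz sample rate, the number of frequency bands, the feature names, and the function names are all assumptions for illustration, not the feature set actually used in the contest.

```python
import numpy as np

def quantile_features(x, name, percentiles=(10, 25, 90)):
    """Percentile features for one sensor signal."""
    x = np.asarray(x, dtype=float)
    return {f"{name}_q{p}": float(np.percentile(x, p)) for p in percentiles}

def fft_features(x, name, sample_rate_hz=100.0, n_bands=4):
    """Peak frequency and mean spectral magnitude in equal-width bands."""
    x = np.asarray(x, dtype=float)
    # Remove the mean so the zero-frequency bin does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    features = {f"{name}_fft_peak_hz": float(freqs[np.argmax(spectrum)])}
    for i, band in enumerate(np.array_split(spectrum, n_bands)):
        features[f"{name}_fft_band{i}_mean"] = float(band.mean())
    return features

# Toy signal (assumed data): a 5 Hz sine sampled at 100 Hz for 10 seconds.
t = np.arange(1000) / 100.0
signal = np.sin(2 * np.pi * 5.0 * t)

q = quantile_features(np.arange(101), "ramp")
f = fft_features(signal, "sensor_1")
print(q)
print(f["sensor_1_fft_peak_hz"])
```

For the toy sine wave, the peak-frequency feature recovers the 5 Hz tone, and the lowest band carries most of the spectral magnitude.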
773 features vs 8 features!
Private : 4999330 (~57 days)
Public : 4989282 (~57 days)
Private : 3438191 (~39 days)
Public : 3641873 (~42 days)
Holds time and frequency data at the same time!
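One common representation that holds time and frequency information together is a spectrogram, built by taking the FFT of short overlapping windows of the signal. The slide does not name the representation, so treating it as a spectrogram is an assumption. The sketch below is a plain short-time Fourier transform in NumPy, and the window size, hop length, and 100 Hz sample rate are illustrative choices.

```python
import numpy as np

def spectrogram(x, window_size=256, hop=128):
    """Return magnitudes with shape (n_frames, window_size // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(window_size)  # taper each frame to reduce leakage
    starts = range(0, len(x) - window_size + 1, hop)
    frames = np.stack([x[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy signal (assumed data): 5 Hz for the first 10 s, then 20 Hz, at 100 Hz.
sample_rate_hz = 100.0
t = np.arange(2000) / sample_rate_hz
signal = np.where(t < 10.0, np.sin(2 * np.pi * 5.0 * t), np.sin(2 * np.pi * 20.0 * t))

spec = spectrogram(signal)
freqs = np.fft.rfftfreq(256, d=1.0 / sample_rate_hz)
first_peak_hz = float(freqs[np.argmax(spec[0])])
last_peak_hz = float(freqs[np.argmax(spec[-1])])
print(spec.shape, first_peak_hz, last_peak_hz)
```

Each row is one time window and each column one frequency bin, so the change from 5 Hz to 20 Hz shows up as the peak moving between the first and last rows. An image-style model can then be trained on this 2-D array.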
Forecasting volcanic eruptions from seismic data alone doesn't seem feasible currently: even the 1st-place solution still has an average error of about 40 days. But if we can gather other valuable data (SO2, temperature, etc.), we might get results accurate and reliable enough to be used in real life.