Using Seismic Data To Forecast Volcano Eruptions With Machine Learning

30506 王大宇

Table Of Contents

  • How humans (tried) to forecast eruptions

    • Events in seismic data before eruptions

    • The forecast of Mount Pinatubo, 1991

    • Why are current methods unreliable?

  • Introduction to INGV - Volcanic Eruption Prediction Contest

  • My approach to the contest

    • Data analysis / preprocessing

    • Trying out a regression approach
    • Things to improve
  • Summary
  • References

How humans (tried) to forecast eruptions

There are many indicators for analyzing eruptions (SO2, groundwater levels, ...)

For the sake of the contest, we will focus on seismic data here!

Events in seismic data before eruptions

Volcano-tectonic earthquakes (VT)

  • Caused by rock fracturing as magma and fluids move within the Earth's crust
  • Occur weeks to months before the eruption
  • Usually have frequencies in the range of 1-30 Hz

Low-frequency earthquakes (LFEs)

  • Long-lasting earthquakes caused by the movement of fluids and the Earth's crust
  • Occur hours to days before the eruption
  • Usually have frequencies below 1 Hz

Long-Period earthquakes (LP)

  • Low-frequency earthquakes caused by the sudden release of gas and magma
  • Occur days to weeks before the eruption
  • Usually have frequencies below 1 Hz

                                                       

Hybrid earthquakes (HYB)

  • Have characteristics of both VT earthquakes and LFEs
  • Indicate the ascent of magma
  • Usually have frequencies around 1-10 Hz
  • Can occur at any time, usually accompanied by other types of events

                                                       

The forecast of Mount Pinatubo, 1991

April to early May: Hybrid earthquakes

Early June: Volcanic tremors generated by the movement of magma, accompanied by an increase in seismic energy release (alerts and evacuations began)

June 7: Shift from VT to LP events, indicating increasing gas pressure

June 12: Sharp increase in the number and size of LP events, indicating the release of gas and steam

June 15: Eruption took place

Why are current methods unreliable?

  • Hard to generalize across different volcanoes
  • Current methods give short warning times
  • Most volcanoes are NOT LINEAR
    -> they don't behave the way we expect them to (e.g., a decrease in earthquake activity)

"Our capability to forecast eruptions is still limited, with ~20% of eruptions accurately forecasted. "

Introduction to INGV - Volcanic Eruption Prediction Contest

Goal: Predict time_to_eruption for a volcano from 10 minutes of seismic data

Scoring method:

Mean Absolute Error (MAE)

 

\( \frac{\sum_{i=1}^{n}|y_i-f(x_i)|}{n}\)
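
The MAE formula above can be computed in a few lines. This is a minimal sketch: the function name and the sample values are made up for illustration and are not contest data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - f(x_i)| over all n samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, in the same units as time_to_eruption.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
mae = mean_absolute_error(actual, predicted)  # (10 + 10 + 30) / 3
```

Because every error counts in proportion to its size, one very bad prediction hurts less than it would under a squared-error score.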

My approach to the contest

Data Analysis / Preprocessing

time_to_eruption range: 0.07 ~ 567 days

mean : 264 days

The distribution of time_to_eruption is fairly even, only declining after ~4e7

Idea: aggregate the data into summary statistics and work with those, to avoid dealing with the time series directly!

Time series data is hard to deal with :(

Selected aggregated features:

  1. sum 
  2. min
  3. max
  4. mean 
  5. standard deviation (std)
  6. median
  7. skewness (skew)
    -> how asymmetrical a set of data is
  8. kurtosis
    -> measures the "peakedness" of the data

(computed for each sensor)
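
The aggregation above could be sketched with pandas as follows. This is not necessarily how the features in this deck were produced. The column names `sensor_1` to `sensor_10`, the helper name `aggregate_segment`, the `_kurt` suffix, and the random input are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# The eight statistics from the list, as pandas Series method names.
STATS = ["sum", "min", "max", "mean", "std", "median", "skew", "kurt"]


def aggregate_segment(segment: pd.DataFrame) -> pd.Series:
    """Collapse one segment (rows = time steps, columns = sensors) into
    a flat vector with one value per (sensor, statistic) pair."""
    features = {}
    for sensor in segment.columns:
        signal = segment[sensor]
        for stat in STATS:
            # Calls e.g. signal.mean(), signal.skew(), signal.kurt().
            # Note: pandas' kurt() returns excess kurtosis (normal = 0).
            features[f"{sensor}_{stat}"] = getattr(signal, stat)()
    return pd.Series(features)


# Synthetic stand-in for one 10-sensor segment (not real contest data).
rng = np.random.default_rng(0)
segment = pd.DataFrame(
    rng.normal(size=(1000, 10)),
    columns=[f"sensor_{i}" for i in range(1, 11)],
)
features = aggregate_segment(segment)
```

Each segment becomes a single row of 80 numbers (10 sensors x 8 statistics), so an ordinary tabular regressor can be trained on it.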

Ready to train!

Trying out a regression approach

Linear Regression 

\(f(x) = w_0 + w_1x_1 + w_2x_2 + \dots + w_px_p\)

Linear Regression tries to find the set of coefficients \(w_0, w_1, w_2, \dots, w_p\) that minimizes the Sum of Squared Errors (SSE), \(\sum_{i=1}^{n}(y_i - f(x_i))^2\)
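
To make the SSE-minimizing fit concrete, here is a minimal sketch that uses NumPy's least-squares solver on synthetic data. The data, the weights in `true_w`, and the variable names are assumptions for illustration; this is not the contest setup or the code behind the results in this deck.

```python
import numpy as np

# Synthetic data generated from known weights (not contest data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # 200 samples, p = 3 features
true_w = np.array([5.0, 2.0, -1.0, 0.5])      # w_0 (intercept), w_1..w_3
y = true_w[0] + X @ true_w[1:] + rng.normal(scale=0.01, size=200)

# Prepend a column of ones so that w_0 is learned as the intercept.
X1 = np.column_stack([np.ones(len(X)), X])

# lstsq returns the w that minimizes ||X1 @ w - y||^2, i.e. the SSE.
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

sse = float(np.sum((X1 @ w - y) ** 2))
```

With little noise the recovered `w` lands close to `true_w`. On real features with very different scales, or with nearly collinear columns such as a sum and a mean of the same signal, the individual coefficients can become very large and harder to interpret.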

('sensor_1_mean', 59030887170688.34)
('sensor_1_skew', 4048663.024784088)
('sensor_10_skew', 2595411.6050720215)
('sensor_3_skew', 1270106.8636779785)
('sensor_8_skew', 978927.2803401947)
('sensor_7_skew', -165900.9322490692)
('sensor_9_skew', -1518843.438412428)
('sensor_6_skew', -2284296.6836452484)
('sensor_4_skew', -3355988.1515922546)
('sensor_1_sum', -983831722.500612)

Trained Coefficients

Skew is generally helpful, while only certain means and sums are useful

Median, most means, and most sums aren't very impactful

Private: 12002016 (~139 days)

Public: 11483340 (~133 days)
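
The day figures in parentheses appear to come from dividing the score by the number of seconds in a day. That the score is measured in seconds is an assumption inferred from those figures, and the helper name below is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def score_to_days(score: float) -> float:
    """Convert an MAE score to days, assuming the score is in seconds."""
    return score / SECONDS_PER_DAY


private_days = score_to_days(12002016)
public_days = score_to_days(11483340)
```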

LightGBM Regressor 

A lightweight gradient boosting machine proposed by Microsoft in 2017.

Often used in Kaggle competitions because it is lightweight and efficient.

Feature importances (too many to plot)

std and kurtosis generally helped a lot, while median and mean didn't have much importance

Private: 7192681 (~83 days)
Public: 7334062 (~85 days)

Things to improve

More Features

  • 10th, 25th, and 90th percentile values

  • Get frequency data with FFT

Different Models

  • XGBoost

  • CatBoost 

Model Ensembling
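
The extra features listed under "More Features" (percentiles and FFT-based frequency data) could be sketched with NumPy as below. The function name, the returned keys, the 100 Hz sample rate, and the synthetic sine input are all assumptions for illustration.

```python
import numpy as np


def extra_features(signal: np.ndarray, sample_rate_hz: float) -> dict:
    """Percentile and frequency-domain features for one sensor's signal."""
    q10, q25, q90 = np.percentile(signal, [10, 25, 90])

    # Magnitude spectrum of the real-valued signal (mean removed so the
    # 0 Hz bin does not dominate).
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)

    return {
        "q10": float(q10),
        "q25": float(q25),
        "q90": float(q90),
        "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),
        "spectral_energy": float(np.sum(spectrum ** 2)),
    }


# Synthetic 5 Hz sine, 1000 samples at an assumed 100 Hz sample rate.
sample_rate_hz = 100.0
t = np.arange(1000) / sample_rate_hz
features = extra_features(np.sin(2 * np.pi * 5.0 * t), sample_rate_hz)
```

The dominant frequency ties back to the event types described earlier, which are distinguished mainly by their frequency ranges.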

773 features vs 8 features!

Private: 4999330 (~58 days)

Public: 4989282 (~58 days)

1st place solution (mel spectrogram + blended ResNets):

Private: 3438191 (~40 days)

Public: 3641873 (~42 days)

A spectrogram holds time and frequency data at the same time!
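
One representation that holds time and frequency information together is a spectrogram (the 1st-place write-up in the references is titled "Mel Spectrogram + Blended ResNets"). Below is a minimal sketch of a plain, non-mel spectrogram using a short-time Fourier transform in NumPy. The frame length, hop size, 100 Hz sample rate, and test signal are assumptions, and this is not the winning solution's code.

```python
import numpy as np


def spectrogram(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (n_frames, frame_len // 2 + 1):
    rows index time (frames), columns index frequency bins.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic signal: 5 Hz for the first 10 s, then 20 Hz for the next 10 s.
sample_rate_hz = 100.0
t = np.arange(2000) / sample_rate_hz
signal = np.where(t < 10, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))

spec = spectrogram(signal, frame_len=200, hop=100)
freqs = np.fft.rfftfreq(200, d=1.0 / sample_rate_hz)
peak_freq_per_frame = freqs[spec.argmax(axis=1)]
```

Unlike the aggregated statistics, the 2-D array keeps track of when each frequency was present, which is why it can be fed to image models such as ResNets.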

Summary

Forecasting volcanic eruptions with seismic data alone doesn't seem feasible currently, with the 1st-place solution still having an average error of about 40 days. However, if we can gather other valuable data (SO2, temperature, etc.), we might be able to get results accurate and reliable enough to be used in real life.

References

Power, J. A. (2001, December). Seismicity and Forecasting of the 1991 Eruption of Mount Pinatubo: A Ten-Year Retrospective. NASA/ADS. https://ui.adsabs.harvard.edu/abs/2001AGUFM.U31A..03P/abstract

Volcanic Earthquakes. (n.d.). PNSN. https://pnsn.org/outreach/earthquakesources/volcanic

Williams, R. G. (2021, June 1). A Burp or a Blast? Seismic Signals Reveal the Volcanic Eruption to Come. Quantamagazine. https://www.quantamagazine.org/seismic-data-helps-scientists-forecast-volcanic-explosions-20210601/

Volcano-Tectonic Event. (2002, October). NOVA. https://www.pbs.org/wgbh/nova/volcano/seis_vte.html

 

Long Period Event. (2002, October). NOVA. https://www.pbs.org/wgbh/nova/volcano/seis_lpe.html

How Do Scientists Forecast Eruptions? (n.d.). Smithsonian Institution National Museum of Natural History Global Volcanism Program. https://volcano.si.edu/faq/index.cfm?question=eruptionforecast

Jesper, D. S. (2021). Introduction to Volcanology, Seismograms and LGBM. Kaggle. https://www.kaggle.com/code/jesperdramsch/introduction-to-volcanology-seismograms-and-lgbm

Alexander, L. (2021). Ingv_catboost_baseline+tsfresh. Kaggle. https://www.kaggle.com/code/carpediemamigo/ingv-catboost-baseline-tsfresh

Jei, F. (2021). [1st Place] Mel Spectrogram + Blended ResNets. Kaggle. https://www.kaggle.com/competitions/predict-volcanic-eruptions-ingv-oe/discussion/211315

ChatGPT, personal communication, February 11, 2023

 

Earth Science

By yeedrag
