Presented by: BSc. Sebastian David Ariza Coll
Advisor: Elias David Niño Ruiz, PhD.
Hurricane Katrina wind speed forecasts made by NOAA
Statistical modeling techniques
To implement an efficient formulation of data-driven models for weather forecasting, and to improve the forecast via sequential data assimilation.
Source: (Geron, 2019; Goodfellow et al., 2016), https://developer.ibm.com/articles/cc-models-machine-learning/
Prediction models have been developed using autoencoders based on Convolutional Neural Network (CNN) architectures. This type of machine learning model is trained in an unsupervised manner.
ANN Architecture
The term CNNs suggests that the network employs a mathematical operation called Convolution. Convolution is a specialized form of a linear operation. Convolutional networks are essentially neural networks that utilize convolution instead of general matrix multiplication in at least one of their layers (Geron, 2019; Goodfellow et al., 2016).
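The operation above can be illustrated with a minimal NumPy sketch (illustrative only; as in most deep learning libraries, this computes cross-correlation, the variant of convolution used in CNNs, with a "valid" output size):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the 'convolution' used in CNNs)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output entry is a weighted sum over a sliding window.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # horizontal difference kernel
print(conv2d(image, edge))       # every horizontal step in the ramp is -1
```

Replacing the explicit loops with general matrix multiplication over all pixel pairs is exactly what convolutional layers avoid: the small kernel is reused at every spatial location.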
Source: (Geron, 2019)
Source: (Geron, 2019)
Two fundamental pooling operations in neural networks are max-pooling and average-pooling. The choice between them depends on the specific problem under consideration. In the context of vision-related problems, max-pooling is often the preferred option. These operations could be seen as follows:
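The two pooling operations can be sketched in NumPy over non-overlapping 2×2 windows (a simplified sketch; it assumes the input shape is divisible by the window size):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows."""
    h, w = x.shape
    # Split the image into a grid of size x size windows.
    windows = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))    # max-pooling
    return windows.mean(axis=(1, 3))       # average-pooling

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 8., 3., 2.],
              [7., 6., 1., 0.]])
print(pool2d(x, mode="max"))   # [[4. 8.] [9. 3.]]
print(pool2d(x, mode="avg"))   # [[2.5 6.5] [7.5 1.5]]
```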
Graphs of the main activation functions
Autoencoders are neural networks aiming to learn concise data representations. They comprise an encoder mapping input to a hidden representation, and a decoder reconstructing the output. Training aims to minimize reconstruction error, often with regularization. They excel in dimensionality reduction, feature learning, and now in generative modeling and unsupervised tasks (Geron, 2019; Goodfellow et al., 2016; Ng et al., 2011).
Convolutional autoencoder
ANN Autoencoder
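The encoder/decoder idea can be sketched with a tiny linear autoencoder trained by gradient descent in plain NumPy (illustrative only: the thesis models are convolutional, and the data, layer sizes, and learning rate here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8-D that actually live on a 2-D subspace.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 8))

# Linear autoencoder: encoder W_e (8 -> 2), decoder W_d (2 -> 8).
W_e = rng.normal(scale=0.1, size=(8, 2))
W_d = rng.normal(scale=0.1, size=(2, 8))
lr = 0.01

def recon_error(X):
    """Mean squared reconstruction error of the current autoencoder."""
    return np.mean((X @ W_e @ W_d - X) ** 2)

err0 = recon_error(X)
for _ in range(500):
    H = X @ W_e                      # hidden (latent) representation
    R = H @ W_d - X                  # reconstruction residual
    gW_d = H.T @ R / len(X)          # gradient w.r.t. decoder
    gW_e = X.T @ (R @ W_d.T) / len(X)  # gradient w.r.t. encoder
    W_d -= lr * gW_d
    W_e -= lr * gW_e

print(err0, recon_error(X))          # reconstruction error decreases
```

Minimizing the reconstruction error forces the 2-D latent code to capture the directions along which the data actually varies, which is the dimensionality-reduction behavior described above.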
Model component
Observation space
Analysis
Data assimilation (DA) is a methodological approach aiming to enhance model predictions by integrating observational data. It addresses the need for accurate forecasts, crucially dependent on robust models. Without periodic calibration against real-world observations, models can degrade, diminishing their utility. Thus, optimizing model states to match observations is vital before analysis or prediction, a common scenario in inverse problems. DA essentially approximates a physical system's true state by merging distributed observations with a dynamic model (Asch et al., 2016; Lahoz et al., 2010; McLaughlin, 2014; Vetra-Carvalho et al., 2018).
In Statistical Data Assimilation, the observations are assimilated into the model forecast (background) by inference (Hannart et al., 2016; Nearing et al., 2018).
$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$
Where:
$P(y \mid x)$ quantifies the distribution of the observation errors; it is the likelihood.
$P(x)$ is the prior knowledge about the system state.
$P(y)$ is the normalization constant.
$P(x \mid y)$ is the updated estimation of the true state.
Bayesian inference - One dimension case
Based on Bayes' theorem, it is completely correct to say that
This means that the model state is updated as soon as observations become available, and the estimate is then propagated forward. Under the Gaussian assumptions made above, the inverse problem solved to compute the analysis is the following optimization problem:
Where $\mathbf{B}$ is the background error covariance matrix and $\mathbf{R}$ is the observation error covariance matrix; in general, both are taken to be time independent.
It is possible to say that:
Note that
Which means that the optimization problem can be written as
The optimal value can be obtained from the stationary point
Solving the optimization problem for this quadratic function, also known as the 3D-Var cost function, is equivalent to solving the optimal interpolation problem in one dimension. This DA method can also be seen as a two-step algorithm in which:
Forecast based on observations
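In one dimension, the analysis that minimizes the 3D-Var cost function has a closed form; the following sketch uses illustrative numbers (background mean/variance and observation value/variance are arbitrary assumptions):

```python
def analysis_1d(xb, sb2, y, so2):
    """Minimizer of J(x) = (x - xb)^2 / (2*sb2) + (y - x)^2 / (2*so2).

    xb, sb2: background value and its error variance.
    y, so2: observation and its error variance.
    """
    w = sb2 / (sb2 + so2)      # weight given to the innovation (y - xb)
    return xb + w * (y - xb)

# Background is uncertain (variance 4), observation is precise (variance 1):
xa = analysis_1d(xb=10.0, sb2=4.0, y=12.0, so2=1.0)
print(xa)  # 11.6 -- pulled most of the way toward the more precise observation
```

The weight on the innovation grows with the background uncertainty, which is exactly the behavior the two-step (forecast, then correct) interpretation describes.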
Based on (Golub and Van Loan, 2013; Woodbury, 1950), who state the Woodbury matrix identity
$$\left(\mathbf{A} + \mathbf{U}\mathbf{C}\mathbf{V}\right)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}\left(\mathbf{C}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U}\right)^{-1}\mathbf{V}\mathbf{A}^{-1}$$
It is possible to rewrite the equation
This generates two additional formulations of the Kalman filter that are completely equivalent to the equation mentioned above, as follows.
And
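The equivalence of the observation-space and model-space forms of the Kalman gain (the standard consequence of the Woodbury identity) can be checked numerically; the matrix sizes below are arbitrary, and both covariances are built to be symmetric positive definite:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
H = rng.normal(size=(m, n))                                 # observation operator
A = rng.normal(size=(n, n)); B = A @ A.T + n * np.eye(n)    # SPD background covariance
C = rng.normal(size=(m, m)); R = C @ C.T + m * np.eye(m)    # SPD observation covariance

# Kalman gain, "observation space" form: B H^T (H B H^T + R)^{-1}
K1 = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# Equivalent "model space" form via Woodbury: (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}
K2 = np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H) @ H.T @ np.linalg.inv(R)

print(np.allclose(K1, K2))  # True
```

The first form inverts an m x m matrix, the second an n x n matrix, so the cheaper one depends on whether observations or model components dominate.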
According to Nino-Ruiz et al. (2017a), within the EnKF framework, an ensemble comprising N model realizations
is utilized in order to estimate the moments of the background error distribution
via the empirical moments of the ensemble, where its empirical mean is
And
When an observation is available, the assimilation process can be performed as follows:
Where $\mathbf{K}$ is known as the Kalman gain matrix (Evensen, 2003) and $\mathbf{D}$ is the innovation matrix (Bishop et al., 2001),
which contains the innovations on the synthetic observations and reads:
Note that the columns of the perturbation matrix are samples from a zero-mean normal distribution with data-error covariance matrix $\mathbf{R}$.
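The stochastic (perturbed-observations) EnKF analysis step can be sketched end to end in NumPy; the dimensions, error levels, and deliberately biased background below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, N = 40, 10, 25          # state dim, observation dim, ensemble size

def enkf_analysis(Xb, H, R, y):
    """Stochastic EnKF analysis: Xb is the n x N background ensemble."""
    N = Xb.shape[1]
    xb = Xb.mean(axis=1, keepdims=True)
    DX = Xb - xb                                  # ensemble anomalies
    Pb = DX @ DX.T / (N - 1)                      # empirical background covariance
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)   # Kalman gain
    # Synthetic observations: real observation plus zero-mean noise with covariance R.
    D = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return Xb + K @ (D - H @ Xb)                  # analysis ensemble

truth = np.linspace(0.0, 1.0, n)
H = np.eye(m, n)                                  # observe the first m components
R = 0.01 * np.eye(m)
y = H @ truth + rng.multivariate_normal(np.zeros(m), R)
Xb = truth[:, None] + 1.0 + rng.normal(scale=0.5, size=(n, N))   # biased background

Xa = enkf_analysis(Xb, H, R, y)
err_b = np.abs(Xb.mean(axis=1) - truth)[:m].mean()
err_a = np.abs(Xa.mean(axis=1) - truth)[:m].mean()
print(err_b, err_a)   # the analysis mean is closer to the truth on observed components
```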
Alternative formulations based on the two other forms of the Kalman filter are:
And
The analysis covariance matrix A reads,
The observations have the form:
Where
is the real observation and all the others are perturbed observations.
Note that an efficient way to calculate those updates is given by Nino-Ruiz et al. (2017a) as follows:
where the required term can be obtained by solving the following linear system of equations:
According to (Nino-Ruiz, 2021; Nino-Ruiz, Sandu, and Deng, 2018), in the context of covariance matrix localization, a decorrelation matrix is typically used in order to dissipate spurious correlations between spatially distant model components,
where $\circ$ denotes the Schur (element-wise) product, the result is a localized covariance matrix, and the components of the localization matrix, for instance, read
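The effect of localization can be sketched on a 1-D grid (the Gaussian-decay decorrelation weights, grid size, ensemble size, and radius below are illustrative assumptions, not the thesis configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, r = 30, 10, 3.0          # grid points, ensemble size, localization radius

# Sample covariance of a small random ensemble: full of spurious
# long-range correlations purely due to sampling error.
X = rng.normal(size=(n, N))
DX = X - X.mean(axis=1, keepdims=True)
Pb = DX @ DX.T / (N - 1)

# Example decorrelation matrix: Gaussian decay with grid distance d(i, j) = |i - j|.
i = np.arange(n)
d = np.abs(i[:, None] - i[None, :])
L = np.exp(-d ** 2 / (2 * r ** 2))

P_loc = L * Pb                 # Schur (element-wise) product

# Diagonal variances are untouched, while correlations between distant
# components are driven toward zero.
print(Pb[0, -1], P_loc[0, -1])
```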
The achievement of the mentioned objectives will be reflected in the construction of the following framework. It is worth mentioning that Deep Learning models must undergo a training process, during which the parameters used for fitting impact their learning and, therefore, their performance.
Scheme of Weather Forecasting based on First order Markovian model
NetCDF Structure
Weather variables include Air Temperature, U-Wind Component, and V-Wind Component, analyzed at the 1000 hPa pressure level from 01-01-2020 00:00 to 31-12-2020 18:00. A global 2.5° latitude × 2.5° longitude grid (144×73) spanning 90°N to 90°S and 0°E to 357.5°E is used.
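As a quick sanity check, the grid described above can be reconstructed from the values in the text:

```python
import numpy as np

# 2.5-degree global grid: 90N down to 90S, 0E to 357.5E.
lats = np.arange(90.0, -90.0 - 2.5, -2.5)   # 73 latitudes
lons = np.arange(0.0, 360.0, 2.5)           # 144 longitudes

print(len(lats), len(lons), len(lats) * len(lons))  # 73 144 10512
```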
Air Temperature
U-Wind
V-Wind
The architecture of the data-driven model is based on a convolutional autoencoder, as proposed by Weyn et al. (2020), with some modifications, such as the activation function and a cropping layer. The cropping layer is necessary because the framework used increases the dimensions for even sizes by default. Additionally, a generic architecture for the trained transition models can be observed as follows:
Data - Driven Architecture
During the training phase, one model M was trained for each study variable at the aforementioned pressure level. This was done with the aim of generating the following transition scheme based on the predictions provided by each model
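The per-variable, first-order Markovian transition scheme can be sketched as follows (hypothetical stand-ins: each M_v here is a toy linear map, whereas in the thesis it is a trained convolutional autoencoder):

```python
import numpy as np

def make_transition_model(factor):
    """Toy stand-in for a trained transition model M_v."""
    return lambda x: factor * x

models = {                      # one transition model per study variable
    "air_temperature": make_transition_model(0.99),
    "u_wind": make_transition_model(0.98),
    "v_wind": make_transition_model(0.98),
}

def forecast(state, steps):
    """First-order Markovian scheme: x_{k+1} = M_v(x_k), per variable."""
    for _ in range(steps):
        state = {v: models[v](x) for v, x in state.items()}
    return state

state0 = {v: np.ones((73, 144)) for v in models}   # fields on the 2.5-degree grid
state3 = forecast(state0, steps=3)
print(state3["air_temperature"][0, 0])  # 0.99 applied three times
```

Because each step depends only on the previous state, observations can be assimilated at any step k and the corrected state simply replaces x_k before the next model application.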
Optuna is an open-source hyperparameter optimization framework to automate hyperparameter search (Akiba et al., 2019).
As mentioned before, the assimilation process, for instance, can be stochastically performed by:
The decorrelation matrix can be used to reduce the effects of sampling errors present in the ensemble, recalling the equation:
Therefore, it is possible to formulate a new way to perform the assimilation process as follows:
In this context, the assimilation process involves utilizing data from Reanalysis 2, which will be presented in a 2-dimensional format. Consequently, the distance function is defined as the Euclidean distance:
The testing period spans from November 1, 2023, to November 15, 2023, encompassing both the date of the initial ensemble creation and the subsequent trials.
Note that the number of model components is n = 73 · 144 = 10512 and the maximum number of ensemble members is N = 100, which means that N ≪ n: the model space is approximately 105 times larger than the number of ensemble members.
The initial ensemble has the following structure
The precision of each analysis performed at time k is measured by the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE).
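The two metrics can be written as a short, self-contained sketch (the vectors below are toy values for illustration, not thesis results):

```python
import numpy as np

def mae(xa, xt):
    """Mean Absolute Error between analysis xa and reference xt."""
    return np.mean(np.abs(xa - xt))

def rmse(xa, xt):
    """Root Mean Squared Error between analysis xa and reference xt."""
    return np.sqrt(np.mean((xa - xt) ** 2))

xt = np.array([1.0, 2.0, 3.0, 4.0])   # reference (e.g., Reanalysis 2 field)
xa = np.array([1.5, 2.0, 2.0, 4.5])   # analysis
print(mae(xa, xt))   # 0.5
print(rmse(xa, xt))  # ~0.612
```

RMSE penalizes large pointwise errors more heavily than MAE, which is why both are reported together.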
Results: Air Temperature, V-Wind Component, and U-Wind Component
This master's thesis focuses on enhancing weather forecasting by implementing a model based on convolutional neural networks (CNNs) augmented with a data assimilation scheme.
The utilization of the Optuna framework has been instrumental in optimizing the hyperparameters of the transition model, thereby expanding the solution space effectively without imposing unnecessary constraints.
Our approach has demonstrated effectiveness in improving forecast accuracy, as evidenced by error metrics such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).
Incorporating the decorrelation matrix into the formulation of the Local Ensemble Kalman Filter (LEnKF) has led to the reduction of spurious correlations, resulting in a more stable filter.
The stability of the filter has been validated through error evaluation metrics, thereby consolidating the effectiveness of the proposed methodology.
Exploration of additional data assimilation techniques is recommended, such as the application of the modified Cholesky method for estimating the inverse of the covariance matrix.
Consideration is suggested for other data-driven modeling techniques like Graph Neural Networks (GNNs) and Transformers, which could provide valuable insights into complex relationships in climatic patterns and enhance the modeling of long-term dependencies and scalability, respectively.