Social and Political Data Science: Introduction

Knowledge Mining

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

keras

• The keras package is an R interface to the Keras deep learning library, which is a high-level neural networks API written in Python.
• It provides a convenient and user-friendly way to define and train deep learning models in R using TensorFlow as the backend.
• The package supports various types of neural networks, such as feedforward, convolutional, and recurrent networks.
• It also offers functionalities for pre-processing, data augmentation, and real-time visualization of training progress.

What is keras?

Cohen's Kappa adjusts for the possibility of agreement occurring by chance. It ranges from -1 to 1:

• A Kappa of 1 indicates perfect agreement between the model's predictions and the actual labels.
• A Kappa of 0 signifies that the agreement is no better than what would be expected by chance alone.
• A Kappa of -1 indicates complete disagreement between the predictions and actual labels.

Kappa

Kappa is  one of the metrics to evaluate the model's performance. The Kappa value, along with other metrics like Accuracy, Sensitivity, and Specificity, can help you better understand the strengths and weaknesses of your model.

• It is important to note that Kappa should be interpreted in the context of the specific problem and dataset you are working with.
• A Kappa value closer to 1 is generally considered good, but it depends on the domain and the problem you are trying to solve.
• In some applications, a Kappa value above 0.6 might be considered satisfactory, while in others, a value above 0.8 might be required.

Mean Absolute Error (MAE)

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

Mean Absolute Error is a metric evaluating the performance of regression models. It measures the average of the absolute differences between the predicted values and the actual (ground truth) values.  It is a clear, interpretable value representing the average magnitude of the prediction errors, without considering the direction of the error.

where:

• $$n$$ is the number of observations in the dataset
• $$y_i$$ is the actual value for observation $$i$$
• $$\hat{y}_i$$ is the predicted value for observation i
• $$|y_i - \hat{y}_i|$$ is the absolute difference between the actual and predicted values

Mean Absolute Error (MAE)

• Lower MAE values indicate better model performance, as the average error between the predicted and actual values is smaller. However, it's essential to interpret the MAE value in the context of the specific problem, as the range of acceptable values may vary depending on the domain and the scale of the target variable.
• When comparing models, the one with the lowest MAE is generally considered to have better predictive performance, assuming the same dataset and target variable.
• However, it's a good idea to consider multiple evaluation metrics, such as RMSE (Root Mean Squared Error) and R-squared, to get a comprehensive understanding of a model's performance.