Social and Political Data Science: Introduction

### Knowledge Mining

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

# Introduction

Ackoff, R.L., 1989. From data to wisdom. Journal of applied systems analysis, 16(1), pp.3-9.

## Statistical Modeling: The Two Cultures

### Leo Breiman 2001: Statistical Science

One culture assumes that the data are generated by a given stochastic data model.
The other uses algorithmic models and treats the data mechanism as unknown.

| Data Model | Algorithmic Model |
| --- | --- |
| Small data | Complex, big data |

## Theory: Data Generation Process

Data are generated in many ways. Picture this: an independent variable x goes in one side of a box (call it nature for now), and a dependent variable y comes out the other side.

### Data Model

The analysis in this culture starts by assuming a stochastic data model for the inside of the black box. For example, a common data model is that data are generated by independent draws from:

Response variable = f(predictor variables, random noise, parameters)

That is, the response variable is a function of a set of predictor (independent) variables, random noise (typically assumed to be normally distributed errors), and parameters.
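As a minimal sketch of such a data model, the Python snippet below draws independent observations from a linear form of f with normally distributed noise. The linear form, the function name `simulate_linear_data`, and the parameter values are illustrative assumptions, not part of the original slides.

```python
import random


def simulate_linear_data(n, intercept, slope, noise_sd, seed=0):
    """Draw n independent (x, y) pairs from y = intercept + slope * x + noise.

    Assumption for illustration: f is linear and the noise is Gaussian.
    """
    rng = random.Random(seed)  # seeded so the draws are reproducible
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)         # predictor variable
        noise = rng.gauss(0.0, noise_sd)   # random noise (normal errors)
        y = intercept + slope * x + noise  # response variable
        data.append((x, y))
    return data


sample = simulate_linear_data(n=200, intercept=1.0, slope=2.0, noise_sd=0.5)
print(sample[:3])
```

Here the intercept, slope, and noise standard deviation play the role of the parameters in the data model.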

The values of the parameters are estimated from the data, and the model is then used for information and/or prediction.
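To illustrate estimation under a data model, the sketch below fits a simple linear regression by ordinary least squares and then uses the fitted model for prediction. The simulated data, the "true" values 1.0 and 2.0, and the helper name `fit_simple_ols` are assumptions made only for this example.

```python
import random


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: assumed true intercept 1.0, slope 2.0, noise sd 0.5.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(500)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

intercept_hat, slope_hat = fit_simple_ols(xs, ys)
print("estimated intercept:", intercept_hat)
print("estimated slope:", slope_hat)

# Prediction: use the estimated parameters for a new x value.
prediction_at_4 = intercept_hat + slope_hat * 4.0
print("predicted y at x = 4:", prediction_at_4)
```

The estimates serve both purposes named above: the slope carries information about how y relates to x, and the fitted line yields predictions for new x values.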

### Algorithmic Modeling

This approach treats the inside of the box as complex and unknown. It seeks a function f(x), an algorithm that operates on x to predict the responses y.

The goal is to find an algorithm that accurately predicts y.
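One way to make this concrete is a k-nearest-neighbor predictor, which assumes no functional form for the box and is judged only by its accuracy on held-out data. The sketch below is one possible illustration. The choice of k-nearest neighbors, the hidden function inside `nature`, and all names are assumptions rather than anything prescribed by the slides.

```python
import math
import random


def knn_predict(train, x_new, k=5):
    """Predict y at x_new as the mean response of the k nearest training x's."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


rng = random.Random(7)


def nature(x):
    """Hypothetical hidden mechanism; the algorithm never inspects it."""
    return math.sin(x) + rng.gauss(0.0, 0.1)


points = []
for _ in range(400):
    x = rng.uniform(0.0, 6.0)
    points.append((x, nature(x)))

train, test = points[:300], points[300:]

# Predictive accuracy on held-out data is the yardstick.
mse = sum((knn_predict(train, x) - y) ** 2 for x, y in test) / len(test)

# Baseline for comparison: always predict the training mean of y.
mean_y = sum(y for _, y in train) / len(train)
baseline_mse = sum((mean_y - y) ** 2 for _, y in test) / len(test)

print("k-NN test MSE:", mse)
print("mean-only baseline MSE:", baseline_mse)
```

No parameters of a stochastic model are estimated here; the algorithm is evaluated purely by how well it predicts responses it has not seen.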

Supervised Learning vs. Unsupervised Learning

Source: https://www.mathworks.com
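The contrast can be sketched in a few lines of Python: a supervised method learns from labeled examples, while an unsupervised method must find structure without labels. The toy data, the nearest-class-mean classifier, and the two-cluster k-means routine below are illustrative assumptions chosen for brevity.

```python
import random

rng = random.Random(3)

# Hypothetical one-dimensional data: two groups centered at 0 and 5.
values = [rng.gauss(0.0, 0.5) for _ in range(50)]
values += [rng.gauss(5.0, 0.5) for _ in range(50)]
labels = ["a"] * 50 + ["b"] * 50


def mean(xs):
    return sum(xs) / len(xs)


# Supervised: the labels are used to learn one mean per class.
class_means = {
    lab: mean([v for v, l in zip(values, labels) if l == lab])
    for lab in ("a", "b")
}


def classify(v):
    """Assign v to the class whose learned mean is closest."""
    return min(class_means, key=lambda lab: abs(v - class_means[lab]))


# Unsupervised: the labels are ignored; two cluster centers are found
# by alternating assignment and averaging (k-means with k = 2).
def two_means(data, iterations=20):
    centers = [min(data), max(data)]  # simple deterministic starting points
    for _ in range(iterations):
        clusters = [[], []]
        for v in data:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        centers = [mean(c) for c in clusters]
    return centers


centers = two_means(values)
print("supervised class means:", class_means)
print("unsupervised cluster centers:", centers)
print("label predicted for 4.8:", classify(4.8))
```

On well-separated data like this, both routes recover similar group centers, but only the supervised one can attach the names "a" and "b" to them.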