Learning From Data

Business Analytics

Figure: All U.S. Houses, Price (Y) against Size (X), with \mathbb{E}[Y \vert X]

Sample Size

Features

Neighbors


Sample Space

SET A

SET B

(Countable)

(Uncountable)

Figure: All U.S. Houses, Price decomposed as Y = \mathbb{E}[Y \vert X] + \varepsilon
Y_i = \mathbb{E}[Y \vert X=X_i] + \varepsilon_i

Price of House

Average Price Given Its Size

Error Term
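
The decomposition above can be illustrated with a small simulation. This is only a sketch: the function `cef`, its coefficients, the size range, and the noise scale are made-up assumptions, not values from the slides. Each price is built as its conditional mean plus an error term, and the error is then recovered as the price minus the conditional mean.

```python
import random

random.seed(0)

def cef(size):
    # Hypothetical conditional expectation of price given size (assumption).
    return 50_000 + 120 * size

n = 10_000
sizes = [random.uniform(500, 3500) for _ in range(n)]
errors = [random.gauss(0, 20_000) for _ in range(n)]

# Y_i = E[Y | X = X_i] + epsilon_i
prices = [cef(x) + e for x, e in zip(sizes, errors)]

# The error term is whatever is left after subtracting the conditional mean.
recovered = [y - cef(x) for x, y in zip(sizes, prices)]
mean_error = sum(recovered) / n
print(f"average error term: {mean_error:.1f}")
```

In this simulated setting the average error term is small relative to the price scale, which is what the decomposition implies when the errors are centered at zero.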

Y_i = \beta_0 + \beta_1\text{Size}_i + \eta_i

Price of House

Slope

Error Term

Y-intercept

\{(Y_i, X_i)\}_{i=1}^n

Observed

f(\theta, X)

Build & Fit

\mathbb{E}[Y\vert X]

Interest

Setup

Learning From Data

(1) Data

(2) Function Space

Parameter Space

(3) Objective Function

(4) Solver

(Build & Fit)
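
The four components listed above can be sketched in a few lines. The data values, the choice of a straight line as the function space, and gradient descent as the solver are all assumptions made for illustration; the slides do not prescribe a particular solver.

```python
# (1) Data: hypothetical (x, y) pairs in rescaled units, generated without noise.
xs = [i / 10 for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]

# (2) Function space: straight lines, indexed by theta = (intercept, slope).
def f(theta, x):
    return theta[0] + theta[1] * x

# (3) Objective function: mean squared error over the data.
def mse(theta):
    return sum((y - f(theta, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# (4) Solver: plain gradient descent on the MSE.
def fit(steps=5000, lr=0.1):
    theta = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        residuals = [y - f(theta, x) for x, y in zip(xs, ys)]
        grad0 = -2 * sum(residuals) / n
        grad1 = -2 * sum(r * x for r, x in zip(residuals, xs)) / n
        theta = [theta[0] - lr * grad0, theta[1] - lr * grad1]
    return theta

theta_hat = fit()
print(theta_hat, mse(theta_hat))
```

Because the example data lie exactly on a line, the fitted parameters approach the intercept and slope used to generate them.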

\varepsilon_i
\eta_i
\hat{\eta}_i
\frac{\frac{1}{n}\sum_{i=1}^n (\hat{Y}_i - \bar{\hat{Y}})^2}{\frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})^2}

Data

Model
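
The ratio above divides the variance of the fitted values by the variance of the observed outcomes. A minimal sketch follows; the function name `explained_variance_ratio` and the example numbers are hypothetical.

```python
def explained_variance_ratio(y, y_hat):
    """Variance of the fitted values divided by the variance of the data."""
    n = len(y)
    y_bar = sum(y) / n
    y_hat_bar = sum(y_hat) / n
    var_model = sum((v - y_hat_bar) ** 2 for v in y_hat) / n
    var_data = sum((v - y_bar) ** 2 for v in y) / n
    return var_model / var_data

prices = [100.0, 200.0, 300.0]

# Fitted values that match the data reproduce all of its variance.
perfect = explained_variance_ratio(prices, [100.0, 200.0, 300.0])

# A constant prediction has no variance of its own.
constant = explained_variance_ratio(prices, [200.0, 200.0, 200.0])
print(perfect, constant)
```

For least-squares fits that include an intercept, this ratio coincides with the usual R-squared.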

1
\text{Var}(\bar{X}) = \frac{\text{Var}(X)}{n}
2
\frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2 \approx \text{Var}(X)
3
\frac{1}{n(n-1)}\sum_{i=1}^n(X_i - \bar{X})^2 \approx \frac{\text{Var}(X)}{n}
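
The three steps above combine into a single estimate of the variance of the sample mean. The sketch below uses a made-up four-point sample and a hypothetical function name; it also checks the result against `statistics.variance`, which uses the same n-1 divisor as step 2.

```python
import statistics

def variance_of_mean_estimate(xs):
    # Step 3: sum of squared deviations divided by n(n-1).
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n * (n - 1))

sample = [1.0, 2.0, 3.0, 4.0]
estimate = variance_of_mean_estimate(sample)

# Steps 1 and 2 combined: sample variance divided by n.
check = statistics.variance(sample) / len(sample)

# The square root is the standard error of the sample mean.
standard_error = estimate ** 0.5
print(estimate, check, standard_error)
```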
Set of Possible Data Sets: \big( \Omega_n, \mathcal{F}_n, \mathbb{P}_n\big)

\mathcal{A}_n maps the Set of Possible Data Sets to the Set of Parameters

Set of Parameters: \big( \mathcal{R}^d, \mathcal{B}(\mathcal{R}^d), \mathbb{P}_n \circ \mathcal{A}_n^{-1}\big)

f_x maps the Set of Parameters to the Set of Outcomes

Set of Outcomes: \big( \mathcal{R}, \mathcal{B}(\mathcal{R}), \mathbb{P}_n \circ \mathcal{A}_n^{-1} \circ f_x^{-1}\big)
\mathbb{E}\big[\big(Y - f(\hat{\beta}, X_{:k})\big)^2\big]

Set of Parameters

\hat{\beta}

Set of Possible Data Sets

\mathcal{R}
\underset{f, \beta, k}{\text{argmin}} \ \mathbb{E}_{\mathbb{P}_n}\Big[ \mathbb{E}_{\mathbb{P}}\big[ \big(Y - f(\beta, X_{:k}) \big)^2\big] \Big]
Set of Possible Data Sets: \big( \Omega_n, \mathcal{F}_n, \mathbb{P}_n\big)

\mathcal{A}_n maps the Set of Possible Data Sets to the Set of Parameters

Set of Parameters: \big( \mathcal{R}^d, \mathcal{B}(\mathcal{R}^d), \mathbb{P}_n \circ \mathcal{A}_n^{-1}\big)

\text{MSE} maps the Set of Parameters to the Set of Outcomes

Set of Outcomes: \big( \mathcal{R}, \mathcal{B}(\mathcal{R}), \mathbb{P}_n \circ \mathcal{A}_n^{-1} \circ \text{MSE}^{-1}\big)

Set of All Linear Models With Price As the Dependent Variable

\text{Price}_i = \beta_0 + \beta_1\text{Size}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Beacon}_i + \beta_2\text{Rooms}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Beacon}_i + \beta_2\text{Rooms}_i + \beta_3\text{Size}_i + \varepsilon_i

Best Parameter Values

Set of All Single Variable Linear Models With Price As the Dependent Variable

\text{Price}_i = \beta_0 + \beta_1\text{Size}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Beacon}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Garage}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{BuildingStyle}_i + \varepsilon_i

Best Parameter Values
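
Finding the best parameter values for each single-variable model and then comparing the models can be sketched as follows. The six data points, the feature values, and the helper names are hypothetical, and the closed-form least-squares fit is one possible solver, not necessarily the one used in the course.

```python
def fit_simple_ols(x, y):
    # Closed-form least squares for Price_i = beta_0 + beta_1 * x_i + error_i.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def mse(beta, x, y):
    return sum((b - (beta[0] + beta[1] * a)) ** 2 for a, b in zip(x, y)) / len(x)

# Hypothetical data set with two candidate features.
features = {
    "Size": [1000, 1500, 2000, 2500, 3000, 3500],
    "Garage": [0, 1, 0, 1, 0, 1],
}
price = [100 * s + 5000 * g for s, g in zip(features["Size"], features["Garage"])]

# Best parameter values and the resulting MSE for each single-variable model.
results = {}
for name, x in features.items():
    beta = fit_simple_ols(x, price)
    results[name] = (beta, mse(beta, x, price))

best = min(results, key=lambda name: results[name][1])
print(best, results[best])
```

In this made-up data set price is driven mostly by size, so the size model has the lower MSE.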

Set of All Multiple Linear Regression Models with Two Independent Variables Where Price is the Dependent Variable and Size is the First Independent Variable 

\text{Price}_i = \beta_0 + \beta_1\text{Size}_i + \beta_2\text{BaseFloor}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Size}_i + \beta_2\text{Beacon}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Size}_i + \beta_2\text{Garage}_i + \varepsilon_i

Best Parameter Values

\text{Price}_i = \beta_0 + \beta_1\text{Size}_i + \beta_2\text{BuildingStyle}_i + \varepsilon_i

Best Parameter Values

Original Data Set

Train Data Set (70%): For a given linear equation, find the parameters with the lowest MSE on the training data set.

Test Data Set (30%): Evaluate the performance of the estimated parameters on the test set.
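
A 70/30 split with fitting on the training set and evaluation on the test set might look like the sketch below. The simulated prices, the coefficients, the noise scale, and the helper names are assumptions for illustration only.

```python
import random

random.seed(0)

# Hypothetical data: price generated from size plus noise.
n = 200
sizes = [random.uniform(500, 3500) for _ in range(n)]
prices = [50_000 + 100 * s + random.gauss(0, 10_000) for s in sizes]

# Shuffle the row indices, then take 70% for training and 30% for testing.
indices = list(range(n))
random.shuffle(indices)
cut = (7 * n) // 10
train_idx, test_idx = indices[:cut], indices[cut:]

def fit_simple_ols(x, y):
    # Closed-form least squares: the parameters with the lowest MSE on (x, y).
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    beta1 = sxy / sxx
    return y_bar - beta1 * x_bar, beta1

def mse(beta, x, y):
    return sum((b - (beta[0] + beta[1] * a)) ** 2 for a, b in zip(x, y)) / len(x)

x_train = [sizes[i] for i in train_idx]
y_train = [prices[i] for i in train_idx]
x_test = [sizes[i] for i in test_idx]
y_test = [prices[i] for i in test_idx]

# Fit on the training set only, then evaluate on the held-out test set.
beta_hat = fit_simple_ols(x_train, y_train)
train_mse = mse(beta_hat, x_train, y_train)
test_mse = mse(beta_hat, x_test, y_test)
print(beta_hat, train_mse, test_mse)
```

The test rows play no part in estimating the parameters, so the test MSE reflects performance on data the fit has not seen.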

Sources of Prediction Error

Inherent Noise: Y_i - \mathbb{E}[Y_i \vert X=X_i]

Model Misspecification: \mathbb{E}[Y_i \vert X=X_i] - (\beta_0 + \beta_1X_i)

From Sampling: (\beta_0 + \beta_1X_i) - (\hat{\beta}_0 + \hat{\beta}_1X_i)
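
The three differences above telescope: added together, the intermediate terms cancel and what remains is the prediction error of the fitted line. The worked identity below assumes the labels pair with the terms as Inherent Noise, Model Misspecification, and From Sampling, in that order.

```latex
\begin{aligned}
Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)
  &= \underbrace{Y_i - \mathbb{E}[Y_i \vert X=X_i]}_{\text{Inherent Noise}} \\
  &\quad + \underbrace{\mathbb{E}[Y_i \vert X=X_i] - (\beta_0 + \beta_1 X_i)}_{\text{Model Misspecification}} \\
  &\quad + \underbrace{(\beta_0 + \beta_1 X_i) - (\hat{\beta}_0 + \hat{\beta}_1 X_i)}_{\text{From Sampling}}
\end{aligned}
```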

Set of Functions linear in X

Model Misspecification Error

Conditional Expectation Function

Set of Functions of X

Set of Functions Linear in Parameters

Data Set: Y, X_1, X_2, X_3, X_4, X_5

Business Analytics - Learning From Data

By Patrick Power
