Learning From Data

Business Analytics

Figure: All U.S. Houses. Axes: Y = Price, X = Size. Shown: \mathbb{E}[Y \vert X]

Figure: axes Y and X. Labels: Sample Size, Features, Neighbors

Sample Space

SET A (Countable)

SET B (Uncountable)

Figure: All U.S. Houses. Price shown as the decomposition Y = \mathbb{E}[Y \vert X] + \varepsilon
Y_i = \mathbb{E}[Y \vert X=X_i] + \varepsilon_i

Y_i: Price of House
\mathbb{E}[Y \vert X=X_i]: Average Price Given Its Size
\varepsilon_i: Error Term

Y_i = \beta_0 + \beta_1\text{Size}_i + \eta_i

Y_i: Price of House
\beta_0: Y-intercept
\beta_1: Slope
\eta_i: Error Term
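
As an illustration of the linear model above, here is a minimal sketch that estimates \beta_0 and \beta_1 on synthetic data. Least squares is assumed as the fitting method (one common choice), and the generating values, the noise level, and the names fit_line, sizes, and prices are hypothetical, chosen only for this example.

```python
import random

def fit_line(sizes, prices):
    """Least-squares estimates of the intercept and slope (assumed fitting method)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    beta1 = sxy / sxx                # slope
    beta0 = mean_y - beta1 * mean_x  # y-intercept
    return beta0, beta1

# Hypothetical data-generating process: Price = 50 + 0.2 * Size + noise.
rng = random.Random(0)
sizes = [rng.uniform(500, 3500) for _ in range(2000)]
prices = [50.0 + 0.2 * s + rng.gauss(0, 20) for s in sizes]

beta0_hat, beta1_hat = fit_line(sizes, prices)

# Residuals play the role of the estimated error terms.
residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(sizes, prices)]
print(beta0_hat, beta1_hat)
```

With an intercept in the model, the least-squares residuals average to zero up to floating-point error, which is a quick sanity check on the fit.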

Setup

\{(Y_i, X_i)\}_{i=1}^n: Observed
f(\theta, X): Build & Fit
\mathbb{E}[Y\vert X]: Interest

Learning From Data

(1) Data

(2) Function Space

Parameter Space

(3) Objective Function

(4) Solver

(Build & Fit)
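
The four components listed above can be sketched in a few lines. This is only one possible instance, under stated assumptions: the data are synthetic, the function space is the set of lines, the objective is mean squared error, and the solver is plain gradient descent with a hand-picked step size. None of these choices is prescribed by the material above.

```python
import random

# (1) Data: synthetic pairs (x_i, y_i); the generating values are hypothetical.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]

# (2) Function space: lines f(theta, x) = theta[0] + theta[1] * x,
#     so the parameter space is R^2.
def f(theta, x):
    return theta[0] + theta[1] * x

# (3) Objective function: mean squared error over the data.
def mse(theta):
    return sum((y - f(theta, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# (4) Solver: gradient descent on the objective.
def gradient(theta):
    n = len(xs)
    g0 = sum(-2 * (y - f(theta, x)) for x, y in zip(xs, ys)) / n
    g1 = sum(-2 * (y - f(theta, x)) * x for x, y in zip(xs, ys)) / n
    return g0, g1

theta = (0.0, 0.0)
initial_loss = mse(theta)
for _ in range(500):
    g0, g1 = gradient(theta)
    theta = (theta[0] - 0.1 * g0, theta[1] - 0.1 * g1)
final_loss = mse(theta)
print(theta, initial_loss, final_loss)
```

Swapping any one component (a different function space, objective, or solver) leaves the other three unchanged, which is the point of separating them.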

Figure: All U.S. Houses. Axes: Y = Price, X = Size. Shown: \mathbb{E}[Y \vert X], \varepsilon_i, \eta_i, \hat{\eta}_i
\frac{\frac{1}{n}\sum_{i=1}^n (\hat{Y}_i - \bar{\hat{Y}})^2}{\frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})^2}

Data

Model
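
The variance ratio above (variance of the fitted values over variance of the observed values) can be computed directly. The sketch below assumes a least-squares line with an intercept, in which case the ratio coincides with 1 minus the residual sum of squares over the total sum of squares. The data and the name variance_ratio are hypothetical.

```python
import random

def variance_ratio(y, y_hat):
    """Variance of fitted values divided by variance of observed values."""
    n = len(y)
    y_bar = sum(y) / n
    y_hat_bar = sum(y_hat) / n
    explained = sum((v - y_hat_bar) ** 2 for v in y_hat) / n
    total = sum((v - y_bar) ** 2 for v in y) / n
    return explained / total

# Hypothetical data: Price = 50 + 0.2 * Size + noise.
rng = random.Random(0)
x = [rng.uniform(500, 3500) for _ in range(500)]
y = [50.0 + 0.2 * s + rng.gauss(0, 100) for s in x]

# Least-squares line with intercept (assumed model).
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * a for a in x]

ratio = variance_ratio(y, y_hat)

# Cross-check: 1 - (residual sum of squares) / (total sum of squares).
ss_res = sum((b - h) ** 2 for b, h in zip(y, y_hat))
ss_tot = sum((b - y_bar) ** 2 for b in y)
one_minus_ratio = 1.0 - ss_res / ss_tot
print(ratio, one_minus_ratio)
```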

(1) \text{Var}(\bar{X}) = \frac{\text{Var}(X)}{n}
(2) \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2 \approx \text{Var}(X)
(3) \frac{1}{n(n-1)}\sum_{i=1}^n(X_i - \bar{X})^2 \approx \frac{\text{Var}(X)}{n}
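
The three relations above can be checked by simulation. The sketch below assumes X is uniform on (0, 1), so that \text{Var}(X) = 1/12 is known exactly. The sample size, the number of repetitions, and all names are hypothetical choices for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations divided by n - 1, as in relation (2)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(1)
n = 25               # size of each sample
num_datasets = 5000  # number of repeated samples
true_var = 1.0 / 12.0  # Var(X) for X ~ Uniform(0, 1)

sample_means = []
var_estimates = []
se2_estimates = []
for _ in range(num_datasets):
    sample = [rng.random() for _ in range(n)]
    sample_means.append(mean(sample))
    s2 = sample_variance(sample)
    var_estimates.append(s2)       # relation (2): estimates Var(X)
    se2_estimates.append(s2 / n)   # relation (3): estimates Var(X) / n

# Relation (1): variance of the sample mean across repeated samples.
grand_mean = mean(sample_means)
var_of_mean = sum((m - grand_mean) ** 2 for m in sample_means) / num_datasets

avg_s2 = mean(var_estimates)
avg_se2 = mean(se2_estimates)
print(var_of_mean, true_var / n, avg_s2, true_var, avg_se2)
```

Relation (3) is what makes (1) usable in practice: it estimates the variance of the sample mean from a single sample, without repeating the experiment.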
Set of Possible Data Sets: \big( \Omega_n, \mathcal{F}_n, \mathbb{P}_n\big)
\mathcal{A}_n: maps the Set of Possible Data Sets to the Set of Parameters
Set of Parameters: \big( \mathcal{R}^d, \mathcal{B}(\mathcal{R}^d), \mathbb{P}_n \circ \mathcal{A}_n^{-1}\big)
f_x: maps the Set of Parameters to the Set of Outcomes
Set of Outcomes: \big( \mathcal{R}, \mathcal{B}(\mathcal{R}), \mathbb{P}_n \circ \mathcal{A}_n^{-1} \circ f_x^{-1}\big)

Set of Possible Data Sets: \big( \Omega_n, \mathcal{F}_n, \mathbb{P}_n\big)
\mathcal{A}_n: maps the Set of Possible Data Sets to the Set of Parameters
Set of Parameters: \big( \mathcal{R}^d, \mathcal{B}(\mathcal{R}^d), \mathbb{P}_n \circ \mathcal{A}_n^{-1}\big)
\text{MSE}: maps the Set of Parameters to the Set of Outcomes
Set of Outcomes: \big( \mathcal{R}, \mathcal{B}(\mathcal{R}), \mathbb{P}_n \circ \mathcal{A}_n^{-1} \circ \text{MSE}^{-1}\big)
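
The chain from data sets to parameters to outcomes can be approximated by simulation: draw many data sets, apply the algorithm to each, and evaluate the resulting parameters. The sketch below is a rough illustration under assumptions not in the material above: the data generator is synthetic, the algorithm is a least-squares line, and MSE is evaluated on one fixed held-out set so that it is a function of the parameters alone.

```python
import random

def draw_dataset(rng, n):
    """One element of the set of possible data sets (hypothetical generator)."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

def algorithm(xs, ys):
    """Maps a data set to parameters; least squares is assumed here."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1

rng = random.Random(0)
eval_xs, eval_ys = draw_dataset(rng, 500)  # fixed held-out set

def mse(theta):
    """Maps parameters to a real-valued outcome."""
    b0, b1 = theta
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(eval_xs, eval_ys)) / len(eval_xs)

# Empirical version of the induced distribution on outcomes:
# one MSE value per simulated data set.
outcomes = []
for _ in range(1000):
    xs, ys = draw_dataset(rng, 50)
    theta_hat = algorithm(xs, ys)
    outcomes.append(mse(theta_hat))

mean_outcome = sum(outcomes) / len(outcomes)
print(min(outcomes), mean_outcome, max(outcomes))
```

The spread of the values in outcomes reflects only the randomness of the data set, since the algorithm and the evaluation set are held fixed.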