foundations of data science for everyone

dr.federica bianco | fbb.space |    fedhere |    fedhere

IV: Machine Learning & Linear Regression

what is machine learning?

what is a model?

objective function:

what you want to optimize for

Fit model parameters <==> minimize the objective function

yi: i-th observation

xi: i-th measurement "location"

NOTE: Sum of residual squared (least square fit method)

SAE = \sum{|y_{i,observed} - y_{i,predict} |}

SAE = \sum{|y_{i,observed} - y_{i,predict} |}

SSE = \sum{(y_{i, observed} - y_{i, predicted} )^2}

SSE = \sum{(y_{i, observed} - y_{i, predicted} )^2}

objective function:

what you want to optimize for

SAE = \sum{|y_{i,observed} - y_{i,predict} |}

SAE = \sum{|y_{i,observed} - y_{i,predict} |}

SSE = \sum{(y_{i, observed} - y_{i, predicted} )^2}

SSE = \sum{(y_{i, observed} - y_{i, predicted} )^2}

Same data, same model, different solution!

WHY???

minimize something:

why square?

- So that the errors do not cancel themselves out

- To add more weight to predictions that are worse

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

Ordinary least square

minimize something:

 def sumsqerror(y, yp):
  ''' objective function squared error
  y: vector of observations
  yp: vector of predictions
  return: sum squared difference
  '''
  return ((y - yp) ** 2).sum()
  
minnow = 1e7
for s in np.arange(0, 3, 0.01):
  for i in np.arange(0, 2.5, 0.01):
    prediction = df['population'] * s + i 
    sse = sumsqerror(df.wspeed, prediction)
    if sse < minnow:
      minnow = sse
      slope_manual, inrercept_manual = s, i
 
slope_manual, inrercept_manual

we can minimize manually...

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

Ordinary least square

We are trying to find the "best" line that goes through the data... but "best" is a judgment call

minimize something:

but its a lot easier to use

built in functions

 # seaborn (just plots) 
sns.regplot(df['population'], df['wspeed'])
 
# numpy
slope, intercept = np.polyfit(df['population'], df['wspeed'], 1)
 
# statsmodels formula
smf.ols(formula='wspeed ~ population', data=df)
 
# statsmodels OLS works for any degree polynomial 
polynomial_features = PolynomialFeatures(degree=1)
xp1 = polynomial_features.fit_transform(x)
model = sm.OLS(y[:10], xp1[:10]).fit()

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

We are trying to find the "best" line that goes through the data... but "best" is a judgment call

minimize something:

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

\mathrm{Sum~of~quared~errors:} ~SSE = \sum_{i=0}^N (y_i - y_{i,predicted})^2

but its a lot easier to use

built in functions

 # seaborn (just plots) 
import seaborn as sns
sns.regplot(df['population'], df['wspeed'])
 
# numpy
import numpy as np
slope, intercept = np.polyfit(df['population'], df['wspeed'], 1)
 
# statsmodels formula
import statsmodels.formula.api as smf
smf.ols(formula='wspeed ~ population', data=df)
 
# statsmodels OLS works for any degree polynomial 
from sklearn.preprocessing import PolynomialFeatures
polynomial_features = PolynomialFeatures(degree=1)
xp1 = polynomial_features.fit_transform(x)
model = sm.OLS(y[:10], xp1[:10]).fit()
 
# sklearn
from sklearn.linear_model import LinearRegression
lm = LinearRegression().fit(X_train, y_train, sample_weight=None)

It can be shown that OLS minimization of the sum of the square errors (SSE) is equivalent to calculating the slope and intercept as:

s=\frac{\sum_{i=1}^N(x_i−\bar{X})(y_i- \bar{Y})}{\sum_{i=1}{N}(x_i−\bar{X})^2}\\ \\ b=\bar{Y}−\bar{X}

s=\frac{\sum_{i=1}^N(x_i−\bar{X})(y_i- \bar{Y})}{\sum_{i=1}{N}(x_i−\bar{X})^2}\\ \\ b=\bar{Y}−\bar{X}

Normal equation

the algorithm: ~~Stochastic~~ Gradient Descent

assume a simpler line model y = ax

(b = 0) so we only need to find the "best" parameter a

1. choose initial value for a

2. calculate the SSE

3. take the gradient of the SSE and step in proportion

: the gradient is the slope of a line tangential to a point on a curve

\nabla x

\nabla x

\mathrm{Gradient~descent}\\ l.r. = \eta * \nabla x_{x=a_0}

\mathrm{Gradient~descent}\\ l.r. = \eta * \nabla x_{x=a_0}

the algorithm: ~~Stochastic~~ Gradient Descent

Things to consider:

- local vs. global minima

- initialization: choosing starting spot?

- learning rate: how far to step?

Adaptive learning rate: fast early on, slow later. Very common with Neural Networks

\mathrm{Gradient~descent}\\ l.r. = \eta * \nabla x_{x=a_0}

\mathrm{Gradient~descent}\\ l.r. = \eta * \nabla x_{x=a_0}

foundations of data science for everyone dr.federica bianco | fbb.space | fedhere | fedhere IV: Machine Learning & Linear Regression

Foundations of Data Science for Everyone

By federica bianco

Foundations of Data Science for Everyone

Machine Learning and linear regression

a year ago
536

federica bianco PRO

astro | data science | data for good

foundations of data science for everyone

this slide deck:

what is machine learning?

what is machine learning?

what is machine learning?

General ML points

what is machine learning?

objective function:

objective function:

objective function:

objective function:

objective function:

objective function:

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Okham's razor

Cross validation

Cross validation

Cross validation

Cross validation

Cross validation

key concepts

references

reading

Foundations of Data Science for Everyone

Foundations of Data Science for Everyone

federica bianco PRO

	def sumsqerror(y, yp):
	''' objective function squared error
	y: vector of observations
	yp: vector of predictions
	return: sum squared difference
	'''
	return ((y - yp) ** 2).sum()

	minnow = 1e7
	for s in np.arange(0, 3, 0.01):
	for i in np.arange(0, 2.5, 0.01):
	prediction = df['population'] * s + i
	sse = sumsqerror(df.wspeed, prediction)
	if sse < minnow:
	minnow = sse
	slope_manual, inrercept_manual = s, i

	slope_manual, inrercept_manual

	# seaborn (just plots)
	sns.regplot(df['population'], df['wspeed'])

	# numpy
	slope, intercept = np.polyfit(df['population'], df['wspeed'], 1)

	# statsmodels formula
	smf.ols(formula='wspeed ~ population', data=df)

	# statsmodels OLS works for any degree polynomial
	polynomial_features = PolynomialFeatures(degree=1)
	xp1 = polynomial_features.fit_transform(x)
	model = sm.OLS(y[:10], xp1[:10]).fit()

	# seaborn (just plots)
	import seaborn as sns
	sns.regplot(df['population'], df['wspeed'])

	# numpy
	import numpy as np
	slope, intercept = np.polyfit(df['population'], df['wspeed'], 1)

	# statsmodels formula
	import statsmodels.formula.api as smf
	smf.ols(formula='wspeed ~ population', data=df)

	# statsmodels OLS works for any degree polynomial
	from sklearn.preprocessing import PolynomialFeatures
	polynomial_features = PolynomialFeatures(degree=1)
	xp1 = polynomial_features.fit_transform(x)
	model = sm.OLS(y[:10], xp1[:10]).fit()

	# sklearn
	from sklearn.linear_model import LinearRegression
	lm = LinearRegression().fit(X_train, y_train, sample_weight=None)

foundations of data science for everyone

Foundations of Data Science for Everyone

More from federica bianco