Coding data and making inferences in regression
PSY 716
Single-predictor linear regression

$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$

where:
- \(Y_i\) is the dependent variable
- \(X_i\) is the independent variable (group membership or continuous covariate)
- \(\beta_0\) is the intercept
- \(\beta_1\) is the regression coefficient
- \(\epsilon_i\) is the error term
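A minimal sketch of fitting this model by ordinary least squares, in Python with NumPy. The data are simulated here (the course dataset isn't shown), and the true coefficient values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(716)            # arbitrary seed for reproducibility
n = 100
x = rng.normal(size=n)                      # simulated predictor X_i
y = 2.0 + 0.5 * x + rng.normal(size=n)      # simulated outcome: beta0 = 2.0, beta1 = 0.5

X = np.column_stack([np.ones(n), x])        # design matrix: intercept column + predictor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates of (beta0, beta1)
print("intercept (beta0):", round(beta_hat[0], 3))
print("slope (beta1):    ", round(beta_hat[1], 3))
```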

Multiple linear regression with five predictors

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \epsilon_i $$

where:
- \(y_i\) is the dependent variable
- \(x_{i1} ... x_{i5}\) are independent variables
- \(\beta_0\) is the intercept
- \(\beta_j\) is the regression coefficient for variable \(j\)
- \(\epsilon_i\) is the error term

The linear regression model predicted a significant portion of the variance in the outcome, \(R^2 = .709, F(5, 94) = 45.8, p < .001\). The predictor \(x_1\) was significantly and positively associated with the outcome, \(\beta_1 = 0.459, t(94) = 5.12, p < .001\). And so on...
As a reminder:
- Relevant df:
  - Model/regression df = k, where k is the number of predictors
  - Error/residual df = N - k - 1
- Relevant test statistics:
  - F(Model df, error df)
  - t(error df)
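To make the \(R^2\), F, and degrees-of-freedom bookkeeping concrete, here is a sketch in Python (NumPy + SciPy) using simulated data; the printed values will not match the results reported above, which come from the course dataset, and the coefficient values in the simulation are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(716)                       # simulated data, illustrative only
N, k = 100, 5
X = rng.normal(size=(N, k))                            # five continuous predictors
true_beta = np.array([1.0, 0.5, -0.3, 0.2, 0.0, 0.4])  # [beta0, beta1..beta5], made up
y = true_beta[0] + X @ true_beta[1:] + rng.normal(size=N)

D = np.column_stack([np.ones(N), X])                   # design matrix with intercept
b, *_ = np.linalg.lstsq(D, y, rcond=None)              # OLS estimates
y_hat = D @ b
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
df_model, df_error = k, N - k - 1                      # model df = k, error df = N - k - 1
F = (r2 / df_model) / ((1 - r2) / df_error)            # overall model F test
p_F = stats.f.sf(F, df_model, df_error)

mse = ss_res / df_error                                # t test for a single coefficient (beta1)
se = np.sqrt(mse * np.linalg.inv(D.T @ D).diagonal())
t1 = b[1] / se[1]
p_t1 = 2 * stats.t.sf(abs(t1), df_error)

print(f"R^2 = {r2:.3f}, F({df_model}, {df_error}) = {F:.2f}, p = {p_F:.4g}")
print(f"beta1 = {b[1]:.3f}, t({df_error}) = {t1:.2f}, p = {p_t1:.4g}")
```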

Multiple linear regression with four predictors, which are dummy codes for a five-level independent variable

$$ y_i = \beta_0 + \beta_1 g_{i1} + \beta_2 g_{i2} + \beta_3 g_{i3} + \beta_4 g_{i4} + \epsilon_i $$

where:
- \(y_i\) is the dependent variable
- \(g_{i1} ... g_{i4}\) are dummy codes for a 5-level independent variable
- \(\beta_0\) is the intercept
- \(\beta_j\) is the regression coefficient for variable \(j\)
- \(\epsilon_i\) is the error term
What does each of these coefficients mean????
As a reminder:
- Relevant df:
  - Model/regression df = k, where k is the number of predictors
  - Error/residual df = N - k - 1
- Relevant test statistics:
  - F(Model df, error df)
  - t(error df)



Adding a dummy-code coefficient to the intercept gives that group's predicted mean: 18.75831 + (-0.1032) ≈ 18.65508


What if we chose a different reference category?
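Here's a sketch of dummy coding in Python/NumPy with simulated data (group labels, group means, and sample size are all made up). With only the four dummy codes in the model, the intercept is the reference group's mean and each coefficient is that group's mean minus the reference group's mean, mirroring the arithmetic above; refitting with a different reference category changes the coefficients but not the implied group means.

```python
import numpy as np

rng = np.random.default_rng(716)                       # simulated data, illustrative only
levels = ["A", "B", "C", "D", "E"]                     # hypothetical 5-level IV
true_means = {"A": 18.8, "B": 18.7, "C": 20.0, "D": 17.5, "E": 19.1}   # made-up group means
group = rng.choice(levels, size=500)
y = np.array([true_means[g] for g in group]) + rng.normal(scale=1.0, size=group.size)

def fit_with_reference(ref):
    """Regress y on k - 1 = 4 dummy codes, with `ref` as the omitted reference category."""
    coded = [g for g in levels if g != ref]
    dummies = np.column_stack([(group == g).astype(float) for g in coded])
    D = np.column_stack([np.ones(y.size), dummies])
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    print(f"reference = {ref}: intercept = {b[0]:.3f} (mean of group {ref})")
    for g, coef in zip(coded, b[1:]):
        print(f"  coefficient for {g}: {coef:+.3f}  ->  implied mean of {g} = {b[0] + coef:.3f}")

fit_with_reference("A")   # intercept = mean of A; each coefficient = that group's mean minus A's
fit_with_reference("B")   # different coefficients, same implied group means
```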



Multiple linear regression with five predictors, which are centered

$$ y_i = \beta_0 + \beta_1 \dot{x}_{i1} + \beta_2 \dot{x}_{i2} + \beta_3 \dot{x}_{i3} + \beta_4 \dot{x}_{i4} + \beta_5 \dot{x}_{i5} + \epsilon_i $$

where:
- \(y_i\) is the dependent variable
- \(\dot{x}_{i1} ... \dot{x}_{i5}\) are centered independent variables
- \(\beta_0\) is the intercept
- \(\beta_j\) is the regression coefficient for variable \(j\)
- \(\epsilon_i\) is the error term
N.B. Putting dots on top of predictors isn't necessary -- just want to denote them a different way here!
What does each of these coefficients mean????
As a reminder:
- Relevant df:
  - Model/regression df = k, where k is the number of predictors
  - Error/residual df = N - k - 1
- Relevant test statistics:
  - F(Model df, error df)
  - t(error df)



[Figure: regression results with uncentered vs. centered predictors]
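A sketch (Python/NumPy, simulated data with made-up values) of what centering changes and what it leaves alone: the slopes are identical, but the intercept becomes the predicted outcome when every predictor is at its mean (which, with all predictors centered, is just the mean of y).

```python
import numpy as np

rng = np.random.default_rng(716)                       # simulated data, illustrative only
N, k = 100, 5
X = rng.normal(loc=50, scale=10, size=(N, k))          # predictors far from zero, so centering matters
y = 3.0 + X @ np.array([0.5, -0.3, 0.2, 0.1, 0.4]) + rng.normal(size=N)

def ols(predictors, outcome):
    design = np.column_stack([np.ones(outcome.size), predictors])
    b, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return b

b_raw = ols(X, y)                      # uncentered predictors
b_ctr = ols(X - X.mean(axis=0), y)     # mean-centered predictors

print("slopes, uncentered:", np.round(b_raw[1:], 3))   # same slopes either way...
print("slopes, centered:  ", np.round(b_ctr[1:], 3))
print("intercept, uncentered:", round(b_raw[0], 3))    # ...but the intercept changes:
print("intercept, centered:  ", round(b_ctr[0], 3))    # predicted y at the predictor means
print("mean of y:            ", round(y.mean(), 3))    # matches the centered intercept
```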
You may be wondering: Why are we doing all of this?
The goal is to build up to arbitrarily complex stuff, like this:
Multiple linear regression with five predictors, four of which are dummy codes for a categorical predictor, and an interaction between the categorical predictor and the continuous predictor (yikes)

$$ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \beta_5 X_{i5} + \beta_6 X_{i1}X_{i2} + \beta_7 X_{i1}X_{i3} + \beta_8 X_{i1}X_{i4} + \beta_9 X_{i1}X_{i5} + \epsilon_i $$

where:
- \(Y_i\) is the dependent variable
- \(X_{i1}\) is a continuous independent variable
- \(X_{i2} ... X_{i5}\) are dummy codes for another independent variable
- \(\beta_0\) is the intercept
- \(\beta_j\) is the regression coefficient for the \(j^{th}\) term
- \(\epsilon_i\) is the error term
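A sketch of this model in Python/NumPy with simulated data (group labels, sample size, and coefficient values are all made up): \(\beta_1\) is the slope of the continuous predictor in the reference group, \(\beta_2\) through \(\beta_5\) shift the intercept for the other four groups, and \(\beta_6\) through \(\beta_9\) shift the slope of the continuous predictor for those groups.

```python
import numpy as np

rng = np.random.default_rng(716)                          # simulated data, illustrative only
N = 250
x1 = rng.normal(size=N)                                   # continuous predictor X_{i1}
group = rng.choice(["A", "B", "C", "D", "E"], size=N)     # hypothetical 5-level categorical IV

# Dummy codes X_{i2}..X_{i5}, with "A" as the (arbitrary) reference category
dummies = np.column_stack([(group == g).astype(float) for g in ["B", "C", "D", "E"]])
products = x1[:, None] * dummies                          # the four interaction terms

# Simulate an outcome with group-specific intercepts and slopes (values made up), then fit by OLS
y = (1.0 + 0.5 * x1
     + dummies @ np.array([0.3, -0.2, 0.1, 0.4])          # intercept shifts for B..E
     + products @ np.array([0.2, 0.0, -0.3, 0.1])         # slope shifts for B..E
     + rng.normal(size=N))

D = np.column_stack([np.ones(N), x1, dummies, products])  # 10 columns: b0..b9
b, *_ = np.linalg.lstsq(D, y, rcond=None)

print("b0 (intercept: predicted Y in group A at x1 = 0):", round(b[0], 3))
print("b1 (slope of x1 in reference group A):           ", round(b[1], 3))
print("b2-b5 (intercept differences, B-E vs. A):", np.round(b[2:6], 3))
print("b6-b9 (slope differences, B-E vs. A):    ", np.round(b[6:], 3))
```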
By Veronica Cole