PSY 716
As we've discussed ad nauseam at this point, ANOVA models can be expressed as regression models using indicator variables:
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
is equivalent to:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + ... + \epsilon_{ij}$$
Consider a one-way ANOVA with factor A having 3 levels:
ANOVA notation: $$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
Regression formulation:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \epsilon_{ij}$$
If we use dummy (treatment) coding with level 3 as the reference level, then:
$$\beta_0 = \mu + \alpha_3$$ (mean of the reference level)
$$\beta_1 = \alpha_1 - \alpha_3$$ (difference between level 1 and the reference level)
$$\beta_2 = \alpha_2 - \alpha_3$$ (difference between level 2 and the reference level)
This is why regression output reports tests of individual coefficients (here, comparisons with the reference level), while ANOVA output reports a single overall test of the factor.
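To make the correspondence concrete, here is a minimal sketch in Python using statsmodels on simulated data (the group labels, means, and sample sizes below are invented for illustration). With dummy coding and level a3 as the reference, the regression coefficients reproduce the reference-group mean and the two mean differences.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical one-way design: factor A with 3 levels, 30 observations per level.
rng = np.random.default_rng(1)
groups = np.repeat(["a1", "a2", "a3"], 30)
true_means = {"a1": 10.0, "a2": 12.0, "a3": 11.0}   # made-up group means
y = np.array([true_means[g] for g in groups]) + rng.normal(0, 2, size=groups.size)
df = pd.DataFrame({"y": y, "A": groups})

# Regression with dummy (treatment) coding, level a3 as the reference category.
fit = smf.ols("y ~ C(A, Treatment(reference='a3'))", data=df).fit()
print(fit.params)

# The coefficients match the group means directly:
m = df.groupby("A")["y"].mean()
print(m["a3"])            # beta_0: mean of the reference level
print(m["a1"] - m["a3"])  # beta_1: level 1 minus reference
print(m["a2"] - m["a3"])  # beta_2: level 2 minus reference
```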
The overall F-test in ANOVA (testing whether \( \alpha_i = 0 \) for all \( i \)) is equivalent to the test of \( R^2 \) in regression:
$$H_0: R^2 = 0$$
and, equivalently,
$$H_0: \beta_1 = \beta_2 = ... = \beta_{k-1} = 0$$
Both represent the test of whether the factor has any effect on the response.
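A quick numerical check of this equivalence, continuing with the hypothetical df simulated in the previous sketch: the regression F statistic (testing that all slopes, and hence \( R^2 \), are zero) is the same number that appears in the ANOVA table row for the factor.

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# df is the simulated one-way data frame from the sketch above.
fit = ols("y ~ C(A)", data=df).fit()

# Overall regression F test: H0: beta_1 = beta_2 = 0 (equivalently, R^2 = 0).
print(fit.fvalue, fit.f_pvalue)

# One-way ANOVA table for the same model: the row for C(A) has the identical F.
print(sm.stats.anova_lm(fit, typ=1))
```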
There are three different approaches to calculating sums of squares: Type I, Type II, and Type III.
Each leads to different hypothesis tests and interpretations! (A side-by-side computation follows the three definitions below.)
Type I, also called sequential SS:
\(SS(A|Intercept)\)
\(SS(B|Intercept, A)\)
\(SS(A \times B|Intercept, A, B)\)
Type II, also called hierarchical SS:
\( SS(A|Intercept, B) \)
\( SS(B|Intercept, A) \)
\(SS(A \times B|Intercept, A, B)\)
Type III, also called marginal or orthogonal SS:
\( SS(A|Intercept, B, A \times B) \)
\( SS(B|Intercept, A, A \times B) \)
\( SS(A \times B|Intercept, A, B) \)
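As a minimal sketch of how the three types differ in practice, here is a hypothetical unbalanced two-factor data set (the factor names, cell sizes, and effect sizes are invented; unequal cell sizes are what make the types disagree), analyzed with statsmodels' anova_lm, which computes all three.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical unbalanced 2x2 design: unequal cell sizes make A and B non-orthogonal.
rng = np.random.default_rng(2)
rows = []
for a in ["a1", "a2"]:
    for b in ["b1", "b2"]:
        n = int(rng.integers(8, 20))                                  # unequal cell sizes
        mu = {"a1": 0.0, "a2": 1.0}[a] + {"b1": 0.0, "b2": 0.5}[b]    # made-up effects
        rows += [(a, b, mu + rng.normal(0, 1)) for _ in range(n)]
df2 = pd.DataFrame(rows, columns=["A", "B", "y"])

# Sum-to-zero (effects) coding so the Type III tests are the usual ones.
model = ols("y ~ C(A, Sum) * C(B, Sum)", data=df2).fit()

print(sm.stats.anova_lm(model, typ=1))   # Type I: sequential, depends on term order
print(sm.stats.anova_lm(model, typ=2))   # Type II: each main effect adjusted for the other
print(sm.stats.anova_lm(model, typ=3))   # Type III: each term adjusted for everything else
```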
Each sum of squares corresponds to a specific model comparison: \( SS(A|Intercept, B) \), for example, is the reduction in the residual sum of squares when A is added to a model that already contains the intercept and B.
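A minimal sketch of that model comparison by hand, reusing the hypothetical df2 from the previous sketch: fit the model with and without A and difference the residual sums of squares; the same value appears in the C(A) row of the Type II table for the additive model.

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# df2 is the simulated unbalanced two-way data frame from the sketch above.
reduced = ols("y ~ C(B)", data=df2).fit()          # Intercept + B
full = ols("y ~ C(B) + C(A)", data=df2).fit()      # Intercept + B + A

ss_A_given_B = reduced.ssr - full.ssr              # drop in residual SS when A is added
print(ss_A_given_B)

# Same quantity from the ANOVA table of the additive model (Type II, row C(A)).
print(sm.stats.anova_lm(full, typ=2))
```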
The coding scheme used for categorical variables affects the interpretation of the individual coefficients. Different coding schemes also align with different SS types, but we aren't going to discuss this much here.
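As a brief illustration only, reusing the hypothetical df from the first sketch: switching between treatment (dummy) and sum (effects) coding changes the individual coefficients and what they mean, but not the fitted model, and therefore not the overall F test for the factor.

```python
from statsmodels.formula.api import ols

# df is the simulated one-way data frame from the first sketch.
dummy = ols("y ~ C(A, Treatment)", data=df).fit()   # beta_j: level j mean minus reference mean
effects = ols("y ~ C(A, Sum)", data=df).fit()       # beta_j: alpha_j, level j mean minus grand mean

print(dummy.params)
print(effects.params)
print(dummy.fvalue, effects.fvalue)   # identical overall F: same model, different parameterization
```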