Interactions, part 3
PSY 716
The Regression-ANOVA Connection
As we've discussed ad nauseam at this point, ANOVA models can be expressed as regression models using indicator variables:
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
is equivalent to:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + ... + \epsilon_{ij}$$
Consider a one-way ANOVA with factor A having 3 levels:
ANOVA notation: $$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
Regression formulation:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \epsilon_{ij}$$
- Where \( x_{1ij} = 1 \) if observation in level 1, 0 otherwise
- Where \( x_{2ij} = 1 \) if observation in level 2, 0 otherwise
- Level 3 is the reference level (when both \( x_1\) and \( x_2 \) are 0)
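With this dummy (treatment) coding, the coefficients map onto the ANOVA parameters as:
$$\beta_0 = \mu + \alpha_3$$ (mean of the reference level)
$$\beta_1 = \alpha_1 - \alpha_3$$ (difference between level 1 and reference)
$$\beta_2 = \alpha_2 - \alpha_3$$ (difference between level 2 and reference)
(Under effects coding, the ANOVA default, the mapping is instead \( \beta_0 = \mu \), \( \beta_1 = \alpha_1 \), \( \beta_2 = \alpha_2 \), with \( \alpha_3 = -\alpha_1 - \alpha_2 \).)
Each regression coefficient is therefore a comparison between specific levels. This is why regression output tests individual coefficients (specific level differences), while ANOVA tests the overall effect of the factor.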
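A minimal sketch in Python with NumPy (the data values in `y` and `group` are made up for illustration) that builds the two indicator columns defined above and checks that the least-squares coefficients equal the reference-level mean and the two differences from it:

```python
import numpy as np

# Hypothetical data: factor A with 3 levels and unequal group sizes
y = np.array([4.0, 5.0, 6.0, 7.0, 9.0, 2.0, 3.0, 4.0, 3.0])
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])

# Dummy (treatment) coding with level 3 as the reference
x1 = (group == 1).astype(float)  # 1 if level 1, else 0
x2 = (group == 2).astype(float)  # 1 if level 2, else 0
X = np.column_stack([np.ones_like(y), x1, x2])

# Ordinary least squares: beta = [beta_0, beta_1, beta_2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Level means, for comparison with the coefficients
m1, m2, m3 = (y[group == g].mean() for g in (1, 2, 3))
print("beta:", beta)
print("mean_3, mean_1 - mean_3, mean_2 - mean_3:", m3, m1 - m3, m2 - m3)
```

Here the level means are 5, 8, and 3, so the fitted coefficients come out (up to floating-point rounding) as 3, 2, and 5.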
F-Tests Can Be Applied in Both Frameworks
The overall F-test in ANOVA (testing whether \( \alpha_i = 0 \) for all \( i \)) is equivalent to the F-test of \( R^2 \) in regression:
$$H_0: R^2 = 0$$
And, equivalently, for a factor with \( k \) levels (coded with \( k - 1 \) indicator variables):
$$H_0: \beta_1 = \beta_2 = ... = \beta_{k-1} = 0$$
Both represent the test of whether the factor has any effect on the response.
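A quick numerical check of this equivalence, sketched in Python with NumPy on made-up data (the values are hypothetical): the F computed from between- and within-group sums of squares matches the F computed from \( R^2 \) as \( F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \) with \( k - 1 \) indicator predictors.

```python
import numpy as np

# Hypothetical one-way data: a factor with k = 3 levels
y = np.array([4.0, 5.0, 6.0, 7.0, 9.0, 2.0, 3.0, 4.0, 3.0])
group = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
levels = np.unique(group)
n, k = len(y), len(levels)
grand_mean = y.mean()

# ANOVA route: between- and within-group sums of squares
ss_between = sum((group == g).sum() * (y[group == g].mean() - grand_mean) ** 2 for g in levels)
ss_within = sum(((y[group == g] - y[group == g].mean()) ** 2).sum() for g in levels)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Regression route: intercept plus k - 1 dummy columns (last level is the reference)
X = np.column_stack([np.ones(n)] + [(group == g).astype(float) for g in levels[:-1]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - (resid @ resid) / ((y - grand_mean) @ (y - grand_mean))
f_regression = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print("ANOVA F:", f_anova, " regression F:", f_regression)
```

The two values agree because \( R^2 = SS_{between} / SS_{total} \) and \( 1 - R^2 = SS_{within} / SS_{total} \), so \( SS_{total} \) cancels.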
Sums of Squares
Three different approaches to calculating sums of squares:
- Type I (Sequential): Calculate in the order terms are specified
- Type II (Hierarchical): Test each term after all others, except those containing it
- Type III (Marginal): Test each term as if it were entered last
Each leads to different hypothesis tests and interpretations!
Type I Sums of Squares
Also called sequential SS:
- Calculated in the order specified in the model
- Each term is adjusted only for terms that precede it
- Changes if you reorder terms in the model
- Equivalent to comparing nested models sequentially:
  - \(SS(A|Intercept)\)
  - \(SS(B|Intercept, A)\)
  - \(SS(A \times B|Intercept, A, B)\)
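A minimal sketch in Python with NumPy, using hypothetical unbalanced 2 × 2 data (the values and the `rss` helper are made up for illustration; both factors are dummy coded). Each Type I SS is the drop in residual sum of squares when a term is added to the terms that precede it, so entering the main effects in a different order changes their SS:

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 2)
a = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # factor A (dummy coded)
b = np.array([0, 0, 0, 1, 0, 0, 1, 1], dtype=float)  # factor B (dummy coded)
y = np.array([2, 3, 4, 5, 4, 6, 9, 11], dtype=float)
ab = a * b  # interaction column

def rss(*cols):
    """Residual SS from an OLS fit with an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Order A, B, A x B: each term adjusted only for the terms before it
ss_a_first = rss() - rss(a)            # SS(A | Intercept)
ss_b_after_a = rss(a) - rss(a, b)      # SS(B | Intercept, A)
ss_ab = rss(a, b) - rss(a, b, ab)      # SS(A x B | Intercept, A, B)

# Order B, A, A x B
ss_b_first = rss() - rss(b)            # SS(B | Intercept)
ss_a_after_b = rss(b) - rss(b, a)      # SS(A | Intercept, B)

print("A first:", ss_a_first, " A after B:", ss_a_after_b)
print("B first:", ss_b_first, " B after A:", ss_b_after_a)
print("A x B (last in both orders):", ss_ab)
```

With these numbers, A entered first gets an SS of 32, but A entered after B gets only about 17.6. The interaction SS is identical in both orders because it is entered last either way.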
Type II Sums of Squares
Also called hierarchical SS:
- Each term adjusted for all other terms except those containing it
- Respects marginality principle
- Doesn't change with reordering
- Tests main effects adjusted for all other main effects:
  - \( SS(A|Intercept, B) \)
  - \( SS(B|Intercept, A) \)
  - \(SS(A \times B|Intercept, A, B)\)
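A minimal sketch of the Type II comparisons in Python with NumPy, on hypothetical unbalanced 2 × 2 data (values and the `rss` helper are illustrative assumptions). Each main effect is adjusted for the other main effect but not for the interaction that contains it:

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 2), both factors dummy coded
a = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
b = np.array([0, 0, 0, 1, 0, 0, 1, 1], dtype=float)
y = np.array([2, 3, 4, 5, 4, 6, 9, 11], dtype=float)
ab = a * b

def rss(*cols):
    """Residual SS from an OLS fit with an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Type II: main effects adjusted for each other, never for A x B (marginality)
ss2_a = rss(b) - rss(a, b)            # SS(A | Intercept, B)
ss2_b = rss(a) - rss(a, b)            # SS(B | Intercept, A)
ss2_ab = rss(a, b) - rss(a, b, ab)    # SS(A x B | Intercept, A, B)

print("Type II SS for A, B, A x B:", ss2_a, ss2_b, ss2_ab)
```

Because each comparison is defined by which terms are in the two models, not by the order they are listed in, reordering the columns leaves these values unchanged.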
Type III Sums of Squares
Also called marginal or orthogonal SS:
- Each term adjusted for all other terms including those containing it
- Tests each effect as if it were entered last in the model
- Most commonly reported in statistical software: the default in SAS and SPSS, though not in R's anova() command, which (curiously) reports Type I
- Equivalent to dropping each term from the full model:
  - \( SS(A|Intercept, B, A \times B) \)
  - \( SS(B|Intercept, A, A \times B) \)
  - \( SS(A \times B|Intercept, A, B) \)
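A minimal sketch of the Type III comparisons in Python with NumPy, on hypothetical unbalanced 2 × 2 data (values and the `rss` helper are illustrative). Effect (sum-to-zero, −1/+1) coding is assumed here, because Type III main-effect SS depend on the coding scheme:

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 2)
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 0, 0, 1, 1])
y = np.array([2, 3, 4, 5, 4, 6, 9, 11], dtype=float)

# Assumed effect (sum-to-zero) coding: -1 for the first level, +1 for the second
ae = 2.0 * a - 1.0
be = 2.0 * b - 1.0
abe = ae * be

def rss(*cols):
    """Residual SS from an OLS fit with an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

full = rss(ae, be, abe)
# Type III: drop one term from the full model, keeping all the others
ss3_a = rss(be, abe) - full    # SS(A | Intercept, B, A x B)
ss3_b = rss(ae, abe) - full    # SS(B | Intercept, A, A x B)
ss3_ab = rss(ae, be) - full    # SS(A x B | Intercept, A, B)

print("Type III SS for A, B, A x B:", ss3_a, ss3_b, ss3_ab)
```

The Type III SS for A comes out to 21 here. Under effect coding it tests the unweighted marginal means of A (the average of the two B cells at each level of A), ignoring the unequal cell sizes.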
Regression Model Comparisons
Sums of squares correspond to specific model comparisons:
Type I:
- Model with the term and all terms entered before it vs. model with only the terms entered before it
Type II:
- Model with the term and all other terms except higher-order terms containing it vs. the same model without the term
Type III:
- Full model vs. the full model with only that term removed (all other terms, including higher-order ones, retained)
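A minimal sketch in Python with NumPy that expresses all three SS types for factor A as nested-model comparisons and turns each into a 1-df F statistic (both factors have 2 levels). The data and the `compare` helper are hypothetical, effect coding is assumed, and the denominator is the error mean square of the full model, following the usual ANOVA-table convention:

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 2), effect coded (-1/+1)
ae = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
be = np.array([-1, -1, -1, 1, -1, -1, 1, 1], dtype=float)
abe = ae * be
y = np.array([2, 3, 4, 5, 4, 6, 9, 11], dtype=float)
one = np.ones_like(y)

def rss(*cols):
    """Residual SS from an OLS fit on the given columns (pass `one` for an intercept)."""
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Error mean square from the full model (4 parameters)
df_error = len(y) - 4
mse_full = rss(one, ae, be, abe) / df_error

def compare(smaller, larger):
    """SS and 1-df F statistic for a nested comparison of two models."""
    ss = rss(*smaller) - rss(*larger)
    return ss, ss / mse_full

# The three comparisons for the main effect of A
type1 = compare([one], [one, ae])                     # A entered first
type2 = compare([one, be], [one, ae, be])             # A after B, without A x B
type3 = compare([one, be, abe], [one, ae, be, abe])   # A dropped from the full model

for name, (ss, f) in [("Type I", type1), ("Type II", type2), ("Type III", type3)]:
    print(f"{name}: SS = {ss:.3f}, F(1, {df_error}) = {f:.3f}")
```

Only the pair of models being compared changes between the three rows; the error term stays the same.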
Coding Schemes
The coding scheme used for categorical variables affects interpretation:
- Treatment/Dummy Coding: One level as reference (regression default)
- Effect Coding: Sum of effects constrained to zero (ANOVA default)
Different coding schemes align with different SS types (for example, Type III tests of main effects are only meaningful under sum-to-zero/effect coding), but we aren't going to discuss this much here.
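A minimal sketch in Python with NumPy of why coding matters for Type III: on hypothetical unbalanced 2 × 2 data, dropping the A column from the model with A, B, and A × B gives a different SS under dummy coding than under effect coding. The data and the `ss_drop_a` helper are illustrative assumptions:

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 2)
a = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # dummy codes for A
b = np.array([0, 0, 0, 1, 0, 0, 1, 1], dtype=float)  # dummy codes for B
y = np.array([2, 3, 4, 5, 4, 6, 9, 11], dtype=float)

def rss(*cols):
    """Residual SS from an OLS fit with an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def ss_drop_a(a_col, b_col):
    """'Type III' SS for A: drop the A column from the model with A, B, and A x B."""
    ab_col = a_col * b_col
    return rss(b_col, ab_col) - rss(a_col, b_col, ab_col)

ss_dummy = ss_drop_a(a, b)                   # treatment coding (0/1)
ss_effect = ss_drop_a(2 * a - 1, 2 * b - 1)  # effect coding (-1/+1)
print("dummy:", ss_dummy, " effect:", ss_effect)
```

Under dummy coding, the A coefficient is the difference between the two A levels within the reference level of B only, so dropping it tests that simple effect (SS = 4.8 here) rather than the main effect of A (SS = 21 under effect coding).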
Common Pitfalls and Misconceptions
- Interpreting main effects in the presence of interactions
- Using Type III SS without understanding what's being tested
- Comparing regression coefficients across different coding schemes
- Using Type I SS without consideration of term ordering
- Assuming all software packages use the same defaults
Summary: The Unified Framework
- ANOVA and regression are the same model viewed differently
- Choice of sum of squares should be based on:
  - Experimental design
  - Balance of the data
  - Specific hypotheses of interest
- Understanding the connection helps you select an appropriate analysis
- Different software packages use different defaults:
  - R: Type I by default (base anova())
  - SAS, SPSS: Type III by default
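To see why balance matters when choosing among the SS types, here is a minimal sketch in Python with NumPy on hypothetical balanced 2 × 2 data (two observations per cell, effect-coded factors; values and the `rss` helper are illustrative). With equal cell sizes and effect coding, the Type I, II, and III SS for A coincide, so the choice matters mainly for unbalanced data:

```python
import numpy as np

# Hypothetical balanced 2 x 2 data: 2 observations per cell
a = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)  # effect-coded A
b = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)  # effect-coded B
y = np.array([2, 4, 5, 7, 4, 6, 9, 13], dtype=float)
ab = a * b

def rss(*cols):
    """Residual SS from an OLS fit with an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

type1_a = rss() - rss(a)               # SS(A | Intercept)
type2_a = rss(b) - rss(a, b)           # SS(A | Intercept, B)
type3_a = rss(b, ab) - rss(a, b, ab)   # SS(A | Intercept, B, A x B)
print(type1_a, type2_a, type3_a)
```

In a balanced design the effect-coded columns are mutually orthogonal, so adding or removing one column never changes the contribution of another.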
Interactions - part 3
By Veronica Cole