PSY 716
As we've discussed ad nauseam at this point, ANOVA models can be expressed as regression models using indicator variables:
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
is equivalent to:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + ... + \epsilon_{ij}$$
Consider a one-way ANOVA with factor A having 3 levels:
ANOVA notation: $$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
Regression formulation:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \epsilon_{ij}$$
If we use dummy (treatment) coding with level 3 as the reference level, then:
$$\beta_0 = \mu + \alpha_3$$ (mean of the reference level)
$$\beta_1 = \alpha_1 - \alpha_3$$ (difference between level 1 and the reference level)
$$\beta_2 = \alpha_2 - \alpha_3$$ (difference between level 2 and the reference level)
This is why regression output reports tests of individual coefficients (here, comparisons with the reference level), while ANOVA output reports a single overall test of the factor.
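To make the correspondence concrete, here is a minimal sketch in Python using statsmodels on simulated data (the group labels, means, and sample sizes below are invented for illustration). With dummy coding and level a3 as the reference, the regression coefficients reproduce the reference-group mean and the two mean differences.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical one-way design: factor A with 3 levels, 30 observations per level.
rng = np.random.default_rng(1)
groups = np.repeat(["a1", "a2", "a3"], 30)
true_means = {"a1": 10.0, "a2": 12.0, "a3": 11.0}   # made-up group means
y = np.array([true_means[g] for g in groups]) + rng.normal(0, 2, size=groups.size)
df = pd.DataFrame({"y": y, "A": groups})

# Regression with dummy (treatment) coding, level a3 as the reference category.
fit = smf.ols("y ~ C(A, Treatment(reference='a3'))", data=df).fit()
print(fit.params)

# The coefficients match the group means directly:
m = df.groupby("A")["y"].mean()
print(m["a3"])            # beta_0: mean of the reference level
print(m["a1"] - m["a3"])  # beta_1: level 1 minus reference
print(m["a2"] - m["a3"])  # beta_2: level 2 minus reference
```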
The overall F-test in ANOVA (testing whether \( \alpha_i = 0 \) for all \( i \)) is equivalent to the test of \( R^2 \) in regression:
$$H_0: R^2 = 0$$
and, equivalently,
$$H_0: \beta_1 = \beta_2 = ... = \beta_{k-1} = 0$$
Both represent the test of whether the factor has any effect on the response.
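A quick numerical check of this equivalence, continuing with the hypothetical df simulated in the previous sketch: the regression F statistic (testing that all slopes, and hence \( R^2 \), are zero) is the same number that appears in the ANOVA table row for the factor.

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# df is the simulated one-way data frame from the sketch above.
fit = ols("y ~ C(A)", data=df).fit()

# Overall regression F test: H0: beta_1 = beta_2 = 0 (equivalently, R^2 = 0).
print(fit.fvalue, fit.f_pvalue)

# One-way ANOVA table for the same model: the row for C(A) has the identical F.
print(sm.stats.anova_lm(fit, typ=1))
```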
There are three different approaches to calculating sums of squares: Type I, Type II, and Type III.
Each leads to different hypothesis tests and interpretations! (A side-by-side computation follows the three definitions below.)
Type I, also called sequential SS:
\(SS(A|Intercept)\)
\(SS(B|Intercept, A)\)
\(SS(A \times B|Intercept, A, B)\)
Type II, also called hierarchical SS:
\( SS(A|Intercept, B) \)
\( SS(B|Intercept, A) \)
\(SS(A \times B|Intercept, A, B)\)
Type III, also called marginal or orthogonal SS:
\( SS(A|Intercept, B, A \times B) \)
\( SS(B|Intercept, A, A \times B) \)
\( SS(A \times B|Intercept, A, B) \)
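As a minimal sketch of how the three types differ in practice, here is a hypothetical unbalanced two-factor data set (the factor names, cell sizes, and effect sizes are invented; unequal cell sizes are what make the types disagree), analyzed with statsmodels' anova_lm, which computes all three.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical unbalanced 2x2 design: unequal cell sizes make A and B non-orthogonal.
rng = np.random.default_rng(2)
rows = []
for a in ["a1", "a2"]:
    for b in ["b1", "b2"]:
        n = int(rng.integers(8, 20))                                  # unequal cell sizes
        mu = {"a1": 0.0, "a2": 1.0}[a] + {"b1": 0.0, "b2": 0.5}[b]    # made-up effects
        rows += [(a, b, mu + rng.normal(0, 1)) for _ in range(n)]
df2 = pd.DataFrame(rows, columns=["A", "B", "y"])

# Sum-to-zero (effects) coding so the Type III tests are the usual ones.
model = ols("y ~ C(A, Sum) * C(B, Sum)", data=df2).fit()

print(sm.stats.anova_lm(model, typ=1))   # Type I: sequential, depends on term order
print(sm.stats.anova_lm(model, typ=2))   # Type II: each main effect adjusted for the other
print(sm.stats.anova_lm(model, typ=3))   # Type III: each term adjusted for everything else
```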
Each sum of squares corresponds to a specific model comparison: \( SS(A|Intercept, B) \), for example, is the reduction in the residual sum of squares when A is added to a model that already contains the intercept and B.
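A minimal sketch of that model comparison by hand, reusing the hypothetical df2 from the previous sketch: fit the model with and without A and difference the residual sums of squares; the same value appears in the C(A) row of the Type II table for the additive model.

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# df2 is the simulated unbalanced two-way data frame from the sketch above.
reduced = ols("y ~ C(B)", data=df2).fit()          # Intercept + B
full = ols("y ~ C(B) + C(A)", data=df2).fit()      # Intercept + B + A

ss_A_given_B = reduced.ssr - full.ssr              # drop in residual SS when A is added
print(ss_A_given_B)

# Same quantity from the ANOVA table of the additive model (Type II, row C(A)).
print(sm.stats.anova_lm(full, typ=2))
```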
The coding scheme used for categorical variables affects the interpretation of the individual coefficients. Different coding schemes also align with different SS types, but we aren't going to discuss this much here.
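As a brief illustration only, reusing the hypothetical df from the first sketch: switching between treatment (dummy) and sum (effects) coding changes the individual coefficients and what they mean, but not the fitted model, and therefore not the overall F test for the factor.

```python
from statsmodels.formula.api import ols

# df is the simulated one-way data frame from the first sketch.
dummy = ols("y ~ C(A, Treatment)", data=df).fit()   # beta_j: level j mean minus reference mean
effects = ols("y ~ C(A, Sum)", data=df).fit()       # beta_j: alpha_j, level j mean minus grand mean

print(dummy.params)
print(effects.params)
print(dummy.fvalue, effects.fvalue)   # identical overall F: same model, different parameterization
```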