Interactions, part 3

PSY 716

The Regression-ANOVA Connection

As we've discussed ad nauseam at this point, ANOVA models can be expressed as regression models using indicator variables:

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

is equivalent to:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + ... + \epsilon_{ij}$$

Consider a one-way ANOVA with factor A having 3 levels:

ANOVA notation: $$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

Regression formulation:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \epsilon_{ij}$$

  • \( x_{1ij} = 1 \) if the observation is in level 1, and 0 otherwise
  • \( x_{2ij} = 1 \) if the observation is in level 2, and 0 otherwise
  • Level 3 is the reference level (both \( x_1 \) and \( x_2 \) equal 0)
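As a quick sketch of how these indicators are built (hypothetical data; numpy assumed available):

```python
import numpy as np

# Hypothetical one-way layout: 6 observations, factor with 3 levels
level = np.array([1, 1, 2, 2, 3, 3])

# Indicator (dummy) variables, with level 3 as the reference
x1 = (level == 1).astype(float)  # 1 if the observation is in level 1
x2 = (level == 2).astype(float)  # 1 if the observation is in level 2

# Design matrix: intercept column plus the two indicators
X = np.column_stack([np.ones(level.size), x1, x2])
```

Rows for level-3 observations are (1, 0, 0): they contribute only to the intercept.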

If we use treatment (dummy) coding, as above, then:

$$\beta_0 = \mu + \alpha_3$$ (mean of the reference level)

$$\beta_1 = \alpha_1 - \alpha_3$$ (difference between level 1 and the reference)

$$\beta_2 = \alpha_2 - \alpha_3$$ (difference between level 2 and the reference)

(Under effect coding, the ANOVA default, we would instead have \( \beta_0 = \mu \) and \( \beta_i = \alpha_i \).)

This is why regression output tests individual coefficients (differences from the reference level), while ANOVA tests the overall effect of the factor.
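To make the mapping concrete, here is a small sketch with made-up data, fitting the dummy-coded model by least squares and checking the coefficients against the group means (numpy assumed):

```python
import numpy as np

# Hypothetical data: 3 levels, 2 observations each
y = np.array([4.0, 6.0, 9.0, 11.0, 1.0, 3.0])
level = np.array([1, 1, 2, 2, 3, 3])

# Dummy-coded design matrix, level 3 as the reference
X = np.column_stack([np.ones(6),
                     (level == 1).astype(float),
                     (level == 2).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[0] is the mean of the reference level (level 3);
# beta[1] and beta[2] are the level 1 and level 2 means minus that mean
print(beta)  # approximately [2., 3., 8.]
```

Here the group means are 5, 10, and 2, so the intercept is 2 and the slopes are 5 − 2 = 3 and 10 − 2 = 8.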

F-Tests Can Be Applied in Both Frameworks

The overall F-test in ANOVA (testing whether \( \alpha_i = 0 \) for all \( i \)) is equivalent to the test of \( R^2 \) in regression:

$$H_0: R^2 = 0$$

and, equivalently,

$$H_0: \beta_1 = \beta_2 = ... = \beta_{k-1} = 0$$

where \( k \) is the number of levels of the factor.

Both represent the test of whether the factor has any effect on the response.
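A minimal numerical check of this equivalence, using made-up one-way data (numpy assumed), computing the F statistic both ways:

```python
import numpy as np

# Hypothetical one-way data: 3 levels, 2 observations each
y = np.array([4.0, 6.0, 9.0, 11.0, 1.0, 3.0])
level = np.array([1, 1, 2, 2, 3, 3])
n, k = y.size, 3

# ANOVA route: F = MS_between / MS_within
ss_total = ((y - y.mean()) ** 2).sum()
ss_within = sum(((y[level == g] - y[level == g].mean()) ** 2).sum()
                for g in (1, 2, 3))
ss_between = ss_total - ss_within
F_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Regression route: F-test of R^2 (model SS over total SS)
r2 = ss_between / ss_total
F_reg = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(F_anova, F_reg)  # the two F statistics are identical
```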

Sums of Squares

Three different approaches to calculating sums of squares:

  1. Type I (Sequential): Calculate in the order terms are specified
  2. Type II (Hierarchical): Test each term after all others, except those containing it
  3. Type III (Marginal): Test each term as if it were entered last

When the design is unbalanced, each leads to different hypothesis tests and interpretations! (For balanced data, all three agree.)

Type I Sums of Squares

Also called sequential SS:

  • Calculated in the order specified in the model
  • Each term is adjusted only for terms that precede it
  • Changes if you reorder terms in the model
  • Equivalent to comparing nested models sequentially:
    1. \(SS(A|Intercept)\)

    2. \(SS(B|Intercept, A)\)

    3. \(SS(A \times B|Intercept, A, B)\)
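These nested comparisons can be computed directly as differences in residual sums of squares. A sketch with hypothetical unbalanced two-factor data (numpy assumed; the `rss` helper is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical unbalanced two-factor data (0/1 dummies for A and B)
a = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1], dtype=float)
b = np.array([0, 1, 0, 0, 1, 1, 1, 0, 1, 1], dtype=float)
y = 2 + a + 0.5 * b + rng.normal(size=a.size)

def rss(*cols):
    """Residual SS of a model with an intercept plus the given columns."""
    X = np.column_stack([np.ones(y.size), *cols]) if cols else np.ones((y.size, 1))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

# Type I (sequential): each term adjusted only for the terms before it
ss_a  = rss() - rss(a)                # SS(A | Intercept)
ss_b  = rss(a) - rss(a, b)            # SS(B | Intercept, A)
ss_ab = rss(a, b) - rss(a, b, a * b)  # SS(A×B | Intercept, A, B)

# With unbalanced data, entering B first gives a different SS for B
ss_b_first = rss() - rss(b)           # SS(B | Intercept)
```

One appealing property of Type I SS: the sequential components add up exactly to the full model SS.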

Type II Sums of Squares

Also called hierarchical SS:

  • Each term adjusted for all other terms except those containing it
  • Respects marginality principle
  • Doesn't change with reordering
  • Tests main effects adjusted for all other main effects:
    1. \( SS(A|Intercept, B) \)

    2. \( SS(B|Intercept, A) \)

    3. \(SS(A \times B|Intercept, A, B)\)

Type III Sums of Squares

Also called marginal or partial SS:

  • Each term adjusted for all other terms including those containing it
  • Tests each effect as if it were entered last in the model
  • Most commonly reported in statistical software (though, curiously, not by the anova() command in R, which gives Type I)
    1. \( SS(A|Intercept, B, A \times B) \)

    2. \( SS(B|Intercept, A, A \times B) \)

    3. \( SS(A \times B|Intercept, A, B) \)
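The Type II and Type III comparisons can likewise be sketched as differences in residual sums of squares between nested models. Hypothetical unbalanced data, effect-coded (+1/−1) so that the Type III comparisons line up with the usual marginal hypotheses (numpy assumed; the `rss` helper is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical unbalanced 2x2 data, effect-coded (+1/-1)
a = np.array([-1, -1, -1, 1, 1, -1, 1, 1, 1, 1], dtype=float)
b = np.array([-1, 1, -1, -1, 1, 1, 1, -1, 1, 1], dtype=float)
ab = a * b
y = 2 + a + 0.5 * b + rng.normal(size=a.size)

def rss(*cols):
    """Residual SS of a model with an intercept plus the given columns."""
    X = np.column_stack([np.ones(y.size), *cols]) if cols else np.ones((y.size, 1))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

# Type II: each main effect adjusted for the other, but not the interaction
ss_a_II = rss(b) - rss(a, b)            # SS(A | Intercept, B)
ss_b_II = rss(a) - rss(a, b)            # SS(B | Intercept, A)

# Type III: each main effect adjusted for everything, interaction included
ss_a_III = rss(b, ab) - rss(a, b, ab)   # SS(A | Intercept, B, A×B)
ss_b_III = rss(a, ab) - rss(a, b, ab)   # SS(B | Intercept, A, A×B)

# The highest-order term is the same under all three types
ss_ab = rss(a, b) - rss(a, b, ab)       # SS(A×B | Intercept, A, B)
```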

Regression Model Comparisons

Sums of squares correspond to specific model comparisons:

Type I:

  • Model containing the term and all terms preceding it vs. the same model without the term

Type II:

  • Model containing the term and all terms that do not contain it vs. the same model without the term

Type III:

  • Full model vs. the full model without the term

Coding Schemes

The coding scheme used for categorical variables affects the interpretation of the coefficients:

  • Treatment/Dummy Coding: One level serves as the reference (regression default)
  • Effect Coding: Effects constrained to sum to zero (ANOVA default)

Different coding schemes align with different SS types, but we aren't going to discuss this much here.
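A brief sketch of the two schemes on hypothetical balanced data, showing how the intercept's meaning changes with the coding (numpy assumed):

```python
import numpy as np

# Hypothetical balanced data: 3 levels, 2 observations each
y = np.array([4.0, 6.0, 9.0, 11.0, 1.0, 3.0])
level = np.array([1, 1, 2, 2, 3, 3])

# Treatment (dummy) coding: level 3 is the reference
treat = np.column_stack([(level == 1), (level == 2)]).astype(float)

# Effect (sum-to-zero) coding: the reference level gets -1 in every column
effect = treat.copy()
effect[level == 3] = -1.0

b_trt, *_ = np.linalg.lstsq(np.column_stack([np.ones(6), treat]), y, rcond=None)
b_eff, *_ = np.linalg.lstsq(np.column_stack([np.ones(6), effect]), y, rcond=None)

# Treatment coding: intercept = mean of the reference level
# Effect coding:    intercept = grand mean, slopes = the alpha_i deviations
print(b_trt[0])  # approximately 2.0 (mean of level 3)
print(b_eff[0])  # approximately 5.67 (grand mean)
```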

Common Pitfalls and Misconceptions

  1. Interpreting main effects in the presence of interactions
  2. Using Type III SS without understanding what's being tested
  3. Comparing regression coefficients across different coding schemes
  4. Using Type I SS without consideration of term ordering
  5. Assuming all software packages use the same defaults

Summary: The Unified Framework

  • ANOVA and regression are the same model viewed differently
  • Choice of sum of squares should be based on:
    • Experimental design
    • Balance of the data
    • Specific hypotheses of interest
  • Understanding the connection helps select appropriate analysis
  • Different software packages use different defaults
    • R: Type I by default
    • SAS, SPSS: Type III by default

By Veronica Cole