t-tests and ANOVA as regressions

PSY 716

"Your model is a special case of mine" is the greatest insult in all of mathematics.

- Unknown

Case 1: Independent-samples t-tests, ANOVAs, and ANCOVAs

$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$

where:

$Y_i$ is the dependent variable
$X_i$ is the independent variable (group membership or continuous covariate)
$\beta_0$ is the intercept
$\beta_1$ is the regression coefficient
$\epsilon_i$ is the error term
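To make the equivalence concrete, here is a minimal sketch in plain Python. It assumes two small made-up groups (the scores are illustrative, not course data) and a 0/1 dummy code for $X_i$. Under those assumptions the intercept is the mean of the group coded 0, the slope is the difference between group means, and the t statistic for the slope matches the classic pooled-variance t statistic.

```python
from math import sqrt
from statistics import mean

# Hypothetical scores for two groups (illustrative data only).
group0 = [4.0, 5.0, 6.0, 5.0]
group1 = [7.0, 8.0, 6.0, 9.0]

# Stack the outcomes and dummy-code group membership: X = 0 or 1.
y = group0 + group1
x = [0.0] * len(group0) + [1.0] * len(group1)

# Ordinary least squares with a single predictor.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx           # slope: mean(group1) - mean(group0) = 2.5
b0 = y_bar - b1 * x_bar  # intercept: mean(group0) = 5.0

# t statistic for the slope, using the residual variance on n - 2 df.
n = len(y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)
t_slope = b1 / sqrt(mse / sxx)

# Classic independent-samples t statistic with a pooled variance.
n0, n1 = len(group0), len(group1)
ss0 = sum((v - mean(group0)) ** 2 for v in group0)
ss1 = sum((v - mean(group1)) ** 2 for v in group1)
pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
t_pooled = (mean(group1) - mean(group0)) / sqrt(pooled_var * (1 / n0 + 1 / n1))

print(b0, b1, t_slope, t_pooled)
```

The two t statistics agree because the residual variance of the dummy-coded regression is the pooled within-group variance.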

Case 1: Independent-samples t-tests, ANOVAs, and ANCOVAs

$$ Y_i = \beta_0 + \beta_1 X_{i1} + $$

$$ \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \beta_5 X_{i5} + $$

$$ \beta_6 X_{i1}X_{i2} + \beta_7 X_{i1}X_{i3} + \beta_8 X_{i1}X_{i4} + \beta_9 X_{i1}X_{i5} + \epsilon_i $$

where:

$Y_i$ is the dependent variable
$X_{i1}$ is an independent variable
$X_{i2}, \dots, X_{i5}$ are dummy codes for another independent variable
$\beta_0$ is the intercept
$\beta_1, \dots, \beta_5$ are the regression coefficients for the main effects
$\beta_6, \dots, \beta_9$ are the regression coefficients for the interaction terms
$\epsilon_i$ is the error term
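As a rough illustration of what this model asks of the data, the sketch below builds one row of the design matrix. It assumes the dummy-coded variable has five levels with level 1 as the reference group; the function name `design_row` and its arguments are hypothetical, chosen only for this example.

```python
def design_row(x1, level, n_levels=5):
    """Build one design-matrix row: [1, x1, dummies, x1 * dummies].

    `level` runs from 1 to n_levels; level 1 is the (assumed) reference
    group, so it gets zeros on every dummy code.
    """
    dummies = [1.0 if level == k else 0.0 for k in range(2, n_levels + 1)]
    interactions = [x1 * d for d in dummies]
    return [1.0, x1] + dummies + interactions

# Same x1 value, two different groups.
row_reference = design_row(2.0, level=1)
row_level3 = design_row(2.0, level=3)

print(row_reference)  # only the intercept and x1 columns are nonzero
print(row_level3)     # also switches on the level-3 dummy and its interaction
```

Each row has ten columns, one per coefficient from $\beta_0$ to $\beta_9$.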

Why consider these as linear models?

Unified framework for understanding the techniques
- There's something kind of cool about this, right?
Much greater flexibility
- Continuous predictors of all types
- Different types of relationships among variables
  - particularly in the case of MANOVA
Ability to handle unbalanced designs and missing data
- Consider repeated-measures ANOVA -- what if someone misses one condition?

Why not consider these as linear models?

Reason 1: Special Adjustments for Violations of Assumptions

The non-regression ways of applying these techniques often have built-in adjustments for violations of assumptions

Example: Welch's t-test
- Adjusts for unequal variances (heteroscedasticity) between groups
- The linear regression formulation does not inherently account for this
Other examples include corrections for violations of sphericity in repeated-measures ANOVA

Why not consider these as linear models?

Reason 1: Special Adjustments for Violations of Assumptions

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

where:

$\bar{X}_1$ and $\bar{X}_2$ are the sample means
$s_1^2$ and $s_2^2$ are the sample variances
$n_1$ and $n_2$ are the sample sizes of each group
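A minimal sketch of Welch's statistic next to the pooled-variance version, using only the Python standard library. The data are made up for illustration, and the degrees-of-freedom line uses the standard Welch-Satterthwaite approximation, which is not shown on the slide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t and its approximate (Welch-Satterthwaite) df."""
    n1, n2 = len(a), len(b)
    q1, q2 = variance(a) / n1, variance(b) / n2  # s^2 / n for each group
    t = (mean(a) - mean(b)) / sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

def pooled_t(a, b):
    """Classic t with a pooled variance, as implied by the regression form."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical groups with unequal sizes and very different spreads.
tight = [10.0, 11.0, 9.0, 10.0, 10.0]
spread = [2.0, 18.0, 7.0, 15.0, 4.0, 14.0, 3.0]

t_w, df_w = welch_t(tight, spread)
t_p, df_p = pooled_t(tight, spread)
print(t_w, df_w)
print(t_p, df_p)
```

With unequal variances and unequal group sizes the two versions give different statistics and different degrees of freedom; with equal group sizes the t values coincide and only the degrees of freedom differ.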

Why not consider these as linear models?

Reason 2: Intuitive Adjustment for Multiple Comparisons

Traditional techniques like ANOVA and MANOVA guard against multiple comparisons by design
- Example: the ANOVA F-test compares all groups simultaneously in a single omnibus test
The linear regression formulation requires multiple dummy codes
- Each dummy code represents a separate comparison, with its own significance test
This increases the risk of Type I error (false positives) if the tests are not adjusted
Of course, you could adjust them! But corrections like Bonferroni or Tukey's HSD are more intuitive in the traditional techniques
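The inflation is easy to put numbers on. The sketch below assumes four dummy-code tests at $\alpha = .05$ and treats them as independent, which is only an approximation because dummy codes that share a reference group are correlated.

```python
alpha = 0.05
n_tests = 4  # e.g., four dummy codes for a five-level factor

# Chance of at least one false positive if every null is true
# (assuming independent tests, which is an approximation here).
familywise = 1 - (1 - alpha) ** n_tests

# Bonferroni: test each dummy code at alpha / n_tests instead.
bonferroni_alpha = alpha / n_tests
familywise_bonferroni = 1 - (1 - bonferroni_alpha) ** n_tests

print(round(familywise, 4))             # about 0.1855
print(bonferroni_alpha)
print(round(familywise_bonferroni, 4))  # just under 0.05
```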

Why not consider these as linear models?

Reason 3: Sometimes it just doesn't make sense for the research question you are answering.

There are some quantities that ANOVA-family analyses will give you automatically. Why struggle to get a linear regression to give you those?
- Omnibus effects in ANOVA
- Group means and differences between them
- Sums of squares and resultant effect sizes (e.g., partial $\eta^2$)
- Question: what else?
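For reference, here is a small sketch of the quantities a one-way ANOVA reports, computed by hand on made-up data with three hypothetical groups. In a one-way design, partial $\eta^2$ and $\eta^2$ are the same number, and both equal the $R^2$ of the corresponding dummy-coded regression.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative data only).
groups = {
    "control": [4.0, 5.0, 6.0],
    "low": [6.0, 7.0, 8.0],
    "high": [9.0, 10.0, 11.0],
}

all_scores = [v for scores in groups.values() for v in scores]
grand_mean = mean(all_scores)

# Partition the total sum of squares into between and within parts.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
ss_within = sum((v - mean(s)) ** 2 for s in groups.values() for v in s)
ss_total = sum((v - grand_mean) ** 2 for v in all_scores)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# Omnibus F test and effect size.
f_stat = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / ss_total

print(ss_between, ss_within, ss_total)  # 38, 6, 44 (up to rounding)
print(f_stat, eta_squared)              # F = 19 on (2, 6) df
```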

Why not consider these as linear models?

Reason 3: Sometimes it just doesn't make sense for the research question you are answering.

The regression-based approach entails a huge number of researcher degrees of freedom.
- What should your reference group be?
- What do you do if only one of your dummy codes is significant?
- What do you do if you're basically agnostic about the nature of multivariate differences between groups?
  - Maybe we should do MANOVA before multigroup SEM
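The reference-group question can be seen in a few lines. In a regression whose only predictors are dummy codes, each coefficient equals that group's mean minus the reference group's mean, so changing the reference changes every coefficient (and which ones look significant) even though the fitted group means stay the same. The helper name `dummy_coefficients` and the data are hypothetical.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative data only).
groups = {
    "control": [4.0, 5.0, 6.0],
    "low": [6.0, 7.0, 8.0],
    "high": [9.0, 10.0, 11.0],
}

def dummy_coefficients(groups, reference):
    """Coefficients of a dummy-coded regression: mean difference from reference."""
    ref_mean = mean(groups[reference])
    return {
        name: mean(scores) - ref_mean
        for name, scores in groups.items()
        if name != reference
    }

coefs_vs_control = dummy_coefficients(groups, "control")
coefs_vs_high = dummy_coefficients(groups, "high")

print(coefs_vs_control)  # low and high each compared with control
print(coefs_vs_high)     # control and low each compared with high
```

Neither parameterization tests the low versus control difference and the low versus high difference in the same model, which is one way a choice of reference group can shape the story the results tell.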

Recommendations

Running a regression and then an ANOVA on top of it is sometimes a good strategy
- ...particularly when the model is more naturally parameterized as a regression, such as a model with continuous covariates
Be conservative
- ...which often means sticking to methods that naturally adjust for multiple comparisons
  - (i.e., ANOVAs rather than a dummy-coded regression)
Most of all, be wary of results that hold up under one strategy but not the other.

By Veronica Cole
