Statistics studies "real" differences and relationships

📈 Sample Data & Inference

  • Use of a random sample (e.g., heights in Ireland) to infer population trends
  • Core Question: Are observed differences real or due to chance?
  • Key Quote: “We can’t measure the height and weight of the entire population...”

🔢 Types of Variables

  • Categorical: Divides data into groups (gender, age group)
  • Numeric: Represents measurable quantities (height, weight)
  • Guides choice of appropriate statistical methods

📊 Summarizing & Visualizing Data

  • Transforms raw data into meaningful insights
  • Categorical Data: Counts, bar charts
  • Numeric Data: Range, median, mean, box plots, histograms
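
As a minimal sketch of these summaries, the snippet below counts a categorical variable and computes the range, median, and mean of a numeric one. The sample values and the variable names (`genders`, `heights_cm`) are made up for illustration and are not data from the source.

```python
from collections import Counter
import statistics

# Made-up illustrative sample (not real survey data).
genders = ["F", "M", "F", "F", "M", "M", "F", "M"]
heights_cm = [162.0, 178.0, 165.0, 170.0, 181.0, 175.0, 158.0, 183.0]

# Categorical data: count how many observations fall in each group.
gender_counts = Counter(genders)

# Numeric data: range, median, and mean.
height_range = max(heights_cm) - min(heights_cm)
height_median = statistics.median(heights_cm)
height_mean = statistics.mean(heights_cm)

print("Counts:", dict(gender_counts))
print("Range:", height_range)
print("Median:", height_median)
print("Mean:", height_mean)
```

The counts are what a bar chart would display, and the range and median are among the values a box plot draws.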

🔗 Analyzing Combinations of Variables

  • Categorical + Numeric: Compare group means (e.g., men vs. women)
  • Two Numeric: Look for correlation (e.g., height vs. weight)
  • Two Categorical: Examine proportions across groups (e.g., gender vs. age group)
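
Two of these combinations can be sketched in a few lines: group means for a categorical plus a numeric variable, and within-group proportions for two categorical variables. The records, the age-group labels, and all names below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter, defaultdict
import statistics

# Made-up records: (gender, age_group, height_cm).
records = [
    ("F", "18-39", 162.0), ("M", "18-39", 178.0),
    ("F", "40+", 165.0), ("F", "18-39", 170.0),
    ("M", "40+", 181.0), ("M", "18-39", 175.0),
    ("F", "40+", 158.0), ("M", "40+", 183.0),
]

# Categorical + numeric: mean height within each gender.
heights_by_gender = defaultdict(list)
for gender, _age_group, height in records:
    heights_by_gender[gender].append(height)
mean_by_gender = {g: statistics.mean(h) for g, h in heights_by_gender.items()}

# Two categorical: share of each age group within each gender.
pair_counts = Counter((g, a) for g, a, _h in records)
gender_totals = Counter(g for g, _a, _h in records)
proportions = {
    pair: count / gender_totals[pair[0]] for pair, count in pair_counts.items()
}

print(mean_by_gender)
print(proportions)
```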

🔬 Statistical Tests & P-values

  • Use statistical tests to determine if sample observations likely represent true population trends
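  • P-value: Probability of observing results at least as extreme as those in the sample, assuming no effect exists (the null hypothesis is true)
  • Alpha Value (commonly 0.05): Decision threshold; reject the null hypothesis when the p-value is below alpha, otherwise fail to reject it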
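
One way to make the p-value concrete is an exact permutation test. This is a swapped-in technique used here for illustration, not one of the tests the notes name. The sketch assumes that, if gender had no effect on height, the group labels would be interchangeable, so it tries every possible relabeling and counts how often the difference in means is at least as large as the observed one. The heights are made up.

```python
from itertools import combinations
import statistics

# Made-up heights in cm (illustrative only).
men = [178.0, 181.0, 175.0, 183.0]
women = [162.0, 165.0, 170.0, 158.0]
observed = abs(statistics.mean(men) - statistics.mean(women))

pooled = men + women
extreme = 0  # relabelings at least as extreme as the observed split
total = 0    # all possible relabelings

# Enumerate every way to pick which observations are labeled "men".
for chosen in combinations(range(len(pooled)), len(men)):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
    if diff >= observed - 1e-12:  # small tolerance for float rounding
        extreme += 1
    total += 1

p_value = extreme / total
alpha = 0.05
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}, reject null at alpha={alpha}: {reject_null}")
```

With only eight observations there are 70 relabelings, so full enumeration is cheap; for larger samples a random subset of relabelings is the usual substitute.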

🚀 Common Statistical Tests

  • One Sample Proportion Test: Checks if a single categorical proportion differs from a known or hypothesized value
  • Chi-square Test: Examines relationships between two categorical variables (e.g., gender vs. age group)
  • T-test: Compares mean(s) of numeric data; can be for one sample mean vs. a known value or two groups (e.g., men vs. women)
  • ANOVA (Analysis of Variance): Compares means across more than two groups/categories
  • Correlation Test: Assesses the strength and direction of a relationship between two numeric variables
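
As a rough sketch of one test from this list, the snippet below computes a chi-square test of independence by hand for a 2x2 table of gender by age group. The counts are made up, no continuity correction is applied, and the p-value shortcut via `math.erfc` is valid only because a 2x2 table has one degree of freedom. A statistics library would normally handle the general case.

```python
import math

# Made-up 2x2 table of counts: rows = gender, columns = age group.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells, where the
# expected count assumes the two variables are independent.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```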

💡 Formulating Hypotheses

  • Null Hypothesis (H₀): No effect or relationship
  • Alternative Hypothesis (H₁): Effect or relationship present
  • Avoid “data mining” (testing many relationships until one appears significant) by defining research questions in advance

🌀 Correlation Coefficient

  • Ranges from -1 to +1
  • 0 indicates no linear relationship; +1 indicates a perfect positive linear relationship and -1 a perfect negative one
  • Quantifies how two numeric variables move together
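
The properties above can be checked with a small sketch of the Pearson correlation coefficient. The helper name `pearson_r` and the input values are hypothetical. The third example is a reminder that a coefficient of 0 rules out only a linear relationship: there, y is fully determined by x (y = x squared) and the coefficient is still 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_up = pearson_r(xs, [2 * x + 1 for x in xs])      # rising straight line
r_down = pearson_r(xs, [-3 * x + 5 for x in xs])   # falling straight line
r_curve = pearson_r(xs, [x ** 2 for x in xs])      # symmetric curve

print(r_up, r_down, r_curve)
```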

✅ Conclusion

  • Statistics as a practical tool for exploring data and making inferences
  • Key Takeaway: Understand variables, pose clear hypotheses, use proper tests
  • “It’s not good science to...randomly stab around...hoping to find something statistically significant.”

Statistics is about "real" differences and relationships

By Carlos Mendez
