## Today

Homework Review

✏️ Diagnostic Data Assignment

- 🗣️ Intro Stats for Journalists

Discussion

✏️ Investigative Pitch

Maximize learning, Minimize Stress

# Homework Review

`https://columbianewsservice.com/2022/11/08/clubhouse-rules-new-yorks-new-young-republican-leader-eyes-the-future/`

# Quick Calc 🧮

`https://docs.google.com/spreadsheets/d/1FO-B3EEAmH3GVIuXG-_-WkyDYorvRvzt9qbZX1YFRMg/edit#gid=0`

This is not "the answer". It is one quick back-of-the-envelope calculation. I can show you a more thorough one next week.

• What are some methodological choices I made?
• What are the assumptions and implications in my choices?
• Did you make different ones?

# Not All Numbers Are Created Equal

### Learning Objectives

✔️ There are different kinds of numbers that must be interpreted and presented differently

✔️ Identify the methodological choices that are made when creating numbers and the methodological choices that you make when presenting numbers

✔️ Identify the universe of data that your numbers come from and know how the data is collected at the row level

# Not All Numbers Are Created Equal

Counts and Measurements

Summary Statistics

Probabilities

Inferential Statistics

Indexes

Scaled Numbers

more methodological choices to vet

(count)

• the average graduating salary of everyone in the last class
(summary statistic - descriptive)

• the average income of all Americans
(summary statistic - inferential)

• average income of a journalist anywhere in the world
(index)
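
To make the categories above concrete, here is a minimal sketch that contrasts a count, a descriptive summary statistic, and an inferential one. All of the salary figures are invented for illustration, and the "population" is simulated, so treat the output as a demonstration of the idea rather than real data.

```python
import math
import random
import statistics

# Hypothetical data: every graduate in one small class (the whole universe).
class_salaries = [48_000, 52_000, 55_000, 61_000, 64_000, 70_000]

# A count: how many graduates there are. Nothing is estimated.
n_graduates = len(class_salaries)

# A descriptive summary statistic: we observed everyone, so the mean is exact.
class_mean = statistics.mean(class_salaries)

# An inferential statistic: we only see a sample of a larger population,
# so the mean is an estimate and should travel with a margin of error.
random.seed(0)  # fixed seed so the sketch is reproducible
population = [random.gauss(60_000, 15_000) for _ in range(100_000)]  # simulated
sample = random.sample(population, 400)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
margin_of_error = 1.96 * standard_error  # rough 95% interval, normal approximation

print(f"count: {n_graduates}")
print(f"descriptive mean: {class_mean:,.0f}")
print(f"inferential mean: {sample_mean:,.0f} +/- {margin_of_error:,.0f}")
```

The further down the list a number sits, the more choices (sample size, sampling method, interval width) are hidden inside it.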

## Methodological Choices

...carry the weight of human judgement and can be

1. built into the data
2. introduced by intermediaries
3. introduced by you

## Before you cite a number:

• What is the original source of this number?

• What kind of number is it?
  • a count? a summary? a prediction? an estimate? an inferential statistic?

• How was it acquired or calculated? What is the universe of data that it comes from?
• What methodological choices were inherent in
  • creating the number
  • selecting the number to be presented
• What methodological choices are you making by finding/selecting/calculating/presenting this number?

p.s. - check out this neat thing I built using LangChain 🦜🔗

https://github.com/dmil/numlock-nlp

# Summary Stats

👻 Scary Math Symbol?

👍 Seems fine!

# Mean

Ben Orlin — Math with bad drawings
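
As a minimal sketch, the mean is the sum of the values divided by how many values there are. The salaries below are made up; the point is how far a single extreme value can pull the result.

```python
import statistics

# Hypothetical salaries in dollars, for illustration only.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]

mean_by_hand = sum(salaries) / len(salaries)   # add everything up, divide by the count
mean_builtin = statistics.mean(salaries)       # same calculation from the standard library

# One extreme value drags the mean toward it.
with_outlier = salaries + [1_000_000]
mean_with_outlier = statistics.mean(with_outlier)

print(mean_by_hand, mean_builtin, round(mean_with_outlier))
```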

# Median

Ben Orlin — Math with bad drawings
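
A short sketch of the median, the middle value once the data is sorted. The salaries are hypothetical; note how little the median moves when an extreme value is added.

```python
import statistics

# Hypothetical salaries in dollars, for illustration only.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
with_outlier = salaries + [1_000_000]

# Odd number of values: the median is the single middle value.
median_plain = statistics.median(salaries)

# Even number of values: the median is the average of the two middle values.
median_with_outlier = statistics.median(with_outlier)

print(median_plain, median_with_outlier)
```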

# Mode

Ben Orlin — Math with bad drawings
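
A minimal sketch of the mode, the most frequent value. The ratings are invented. A dataset can have more than one mode, which `statistics.multimode` reports as a list.

```python
import statistics
from collections import Counter

# Hypothetical 1-to-10 ratings, for illustration only.
ratings = [1, 1, 1, 2, 5, 9, 10, 10, 10, 10]

most_common = statistics.mode(ratings)   # the single most frequent value
counts = Counter(ratings)                # how often each value appears

# When two values tie for most frequent, multimode returns both.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

print(most_common, counts[most_common], tied_modes)
```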

# Range

Ben Orlin — Math with bad drawings
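
A tiny sketch of the range, the gap between the largest and smallest value. The temperatures are made up. Because the range depends on only two data points, one unusual reading can change it completely.

```python
# Hypothetical daily high temperatures, for illustration only.
temps = [61, 64, 58, 90, 66]

value_range = max(temps) - min(temps)

# Dropping the single unusual reading shrinks the range dramatically.
without_spike = [t for t in temps if t != 90]
range_without_spike = max(without_spike) - min(without_spike)

print(value_range, range_without_spike)
```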

# Correlation

https://www.investopedia.com/terms/n/negative-correlation.asp

# Correlation

Pearson's Correlation Coefficient

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

https://m.xkcd.com/552/

Ben Orlin — Math with bad drawings
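
One way to see what Pearson's correlation coefficient measures is to compute it by hand. The sketch below uses the standard textbook formula; the helper name `pearson_r` and the data are hypothetical, chosen so the answers are easy to check.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: how closely two variables follow a straight line (-1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much x and y move together, relative to how much each moves alone.
    # Note: this divides by zero if either list is constant.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
rises_with_hours = [2, 4, 6, 8, 10]
falls_with_hours = [10, 8, 6, 4, 2]

r_positive = pearson_r(hours, rises_with_hours)
r_negative = pearson_r(hours, falls_with_hours)
print(r_positive, r_negative)
```

A value near 1 or -1 says the points sit close to a line. It says nothing about which variable, if either, drives the other.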

# Variance

Ben Orlin — Math with bad drawings
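
A minimal sketch of variance, the average squared distance from the mean. The two lists are invented and share the same mean, so the only difference the variance picks up is spread. Whether to divide by n (population) or n - 1 (sample) is itself a methodological choice; spreadsheets typically expose both as separate functions.

```python
import statistics

# Hypothetical datasets with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

# Population variance: divide by n. Use when the data is the whole universe.
pvar_tight = statistics.pvariance(tight)
pvar_wide = statistics.pvariance(wide)

# Sample variance: divide by n - 1. Use when the data is a sample of something larger.
svar_wide = statistics.variance(wide)

print(pvar_tight, pvar_wide, svar_wide)
```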

# ✏️ Mystery Data

Learning Objectives

✔️ Understanding summary statistics

✔️ Use formulas and pivot tables in spreadsheets

# Mystery Data

Describe 4 mystery datasets with summary statistics...what can you tell me about these mystery data?

Calculate:

- Mean

- Median

- Mode

- Correlation

- Variance

Do it twice

- Use formulas on WIDE tab

- Use pivot tables on LONG tab
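
The two approaches can be sketched in code as well. This assumes that WIDE means one column per dataset and LONG means one row per observation with a dataset label; the dataset names and values are hypothetical and are not the mystery data.

```python
import statistics
from collections import defaultdict

# Hypothetical WIDE layout: one column per dataset.
wide = {
    "dataset_a": [1, 2, 3, 4],
    "dataset_b": [2, 2, 2, 10],
}

# Formula style: one calculation per column, like =AVERAGE(A2:A5).
wide_means = {name: statistics.mean(values) for name, values in wide.items()}

# Hypothetical LONG layout: one row per observation, labeled with its dataset.
long_rows = [(name, value) for name, values in wide.items() for value in values]

# Pivot-table style: group the rows by dataset, then aggregate each group.
groups = defaultdict(list)
for name, value in long_rows:
    groups[name].append(value)
long_means = {name: statistics.mean(values) for name, values in groups.items()}

print(wide_means)
print(long_means)
```

Both routes should give the same answers; doing it twice is a quick way to catch a mistake in either one.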

Done? Help out a classmate...or poke around the BONUS tab. How might you describe that data?

# Distributions

https://fivethirtyeight.com/features/al-gores-new-movie-exposes-the-big-flaw-in-online-movie-ratings/
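
A rough sketch of why the shape of a distribution matters: two sets of ratings can share the same mean while telling opposite stories. The ratings below are invented for illustration and are not taken from the linked article; `text_histogram` is a hypothetical helper.

```python
import statistics
from collections import Counter

# Hypothetical 1-to-10 ratings, for illustration only.
polarized = [1] * 5 + [10] * 5   # people either love it or hate it
consensus = [5] * 5 + [6] * 5    # everyone thinks it is about average

def text_histogram(values):
    """Return one line per score from 1 to 10, with a # for each rating."""
    counts = Counter(values)
    return "\n".join(f"{score:>2} | {'#' * counts[score]}" for score in range(1, 11))

mean_polarized = statistics.mean(polarized)
mean_consensus = statistics.mean(consensus)

print(mean_polarized, mean_consensus)
print(text_histogram(polarized))
print(text_histogram(consensus))
```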

# Plotting

Exploratory Data Visualization

Pearson correlation is ????

Pearson correlation is 0.9909

because there are 40 duplicate data points stacked in the top-right and bottom-left corners

# Editorial Choices

1977

### 📚 A Hypothesis Is A Liability

https://dmil.notion.site/A-Hypothesis-Is-a-Liability-5f8ccf30771042fabbf76d842ef99a8c?pvs=4

### ✏️ Exploring NYC Data

Your chance to explore and do "night science", no pressure to form a hypothesis, just #inspo.

• Practice formulas and pivot tables
• Conduct an open-ended interview

https://dmil.notion.site/Exploring-NYC-Data-3bed8f11bbf1455e811f940b573e05e7?pvs=4

# ✏️ Data Story Pitch

• Option to Work In Pairs (Oct / Nov)