Welcome Back!

How are you doing?

Please fill out today’s survey in

#reporting-ii-2024

## Today

Homework Review

✏️ Diagnostic Data Assignment

Today

- 🗣️ Intro Stats for Journalists

Discussion

✏️ Investigative Pitch

Maximize Learning, Minimize Stress

# Homework Review

✏️ Diagnostic Data Assignment

`https://columbianewsservice.com/2022/11/08/clubhouse-rules-new-yorks-new-young-republican-leader-eyes-the-future/`

# Quick Calc 🧮

`https://docs.google.com/spreadsheets/d/1FO-B3EEAmH3GVIuXG-_-WkyDYorvRvzt9qbZX1YFRMg/edit#gid=0`

This is not "the answer." It is one quick back-of-the-envelope calculation. I can show you a more thorough one next week.

• What are some methodological choices I made?
• What are the assumptions and implications in my choices?
• Did you make different ones?

# Not All Numbers Are Created Equal

### Learning Objectives

✔️ Recognize that there are different kinds of numbers that must be interpreted and presented differently

✔️ Identify the methodological choices made when numbers are created, and the methodological choices you make when presenting them

✔️ Identify the universe of data your numbers come from, and know how the data is collected at the row level

# Not All Numbers Are Created Equal

• Counts and Measurements
• Summary Statistics
• Probabilities
• Inferential Statistics
• Indexes
• Scaled Numbers

I'm sure these aren't the only kinds of numbers...

...but they're all very different from one another

more methodological choices to vet

(count)

• the average graduating salary of everyone in the last class
(summary statistic - descriptive)

• the average income of all Americans
(summary statistic - inferential)

• average income of a journalist anywhere in the world
(index)

## Methodological Choices

...carry the weight of human judgement and can be

1. built into the data
2. introduced by intermediaries
3. introduced by you

## Before you cite a number:

• What is the original source of this number?

• What kind of number is it?
  • A count? A summary? A prediction? An estimate? An inferential statistic?

• How was it acquired or calculated? What is the universe of data that it comes from?

• What methodological choices were inherent in
  • creating the number?
  • selecting the number to be presented?

• What methodological choices are you making by finding, selecting, calculating, or presenting this number?

p.s. - check out this neat thing I built using LangChain 🦜🔗

https://github.com/dmil/numlock-nlp

# Summary Stats

👻 Scary Math Symbol?

👍 Seems fine!

# Mean

Ben Orlin — Math with bad drawings
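A minimal sketch of the mean in Python: add everything up, then divide by the count. The salary figures are invented for illustration. The second call shows how a single extreme value can pull the mean far away from every other data point.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

# Hypothetical starting salaries (invented numbers).
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
print(mean(salaries))  # 50000.0

# Adding one very large salary drags the mean above every other value.
with_outlier = salaries + [1_000_000]
print(round(mean(with_outlier), 2))  # 208333.33
```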

# Median

Ben Orlin — Math with bad drawings
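A sketch of the median, using the same invented salaries. Sort the values and take the middle one. When the count is even, take the average of the two middle values. The outlier that moved the mean by about $158,000 moves the median by only $2,500.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middles if the count is even."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical starting salaries (invented numbers).
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
print(median(salaries))                # 50000
print(median(salaries + [1_000_000]))  # 52500.0
```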

# Mode

Examples

Ben Orlin — Math with bad drawings
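A sketch of the mode: the most frequent value. Unlike the mean and median, it also works on categories such as survey answers. Data can have more than one mode, so this version returns every value tied for the top count. Python's standard library offers `statistics.multimode` for the same job. The commute answers below are invented.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

# Hypothetical survey answers: how do you get to class?
print(modes(["bus", "subway", "subway", "bike"]))  # ['subway']
print(modes([3, 1, 3, 2, 1]))                      # [3, 1]
```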

# Range

Ben Orlin — Math with bad drawings
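A sketch of the range: the largest value minus the smallest. Because it depends on only those two points, one unusual reading can change it completely. The temperatures below are invented.

```python
def value_range(values):
    """Spread between the largest and smallest values (max minus min)."""
    if not values:
        raise ValueError("the range of an empty list is undefined")
    return max(values) - min(values)

# Hypothetical daily high temperatures (invented numbers).
temps = [31, 45, 38, 52, 40]
print(value_range(temps))  # 21
```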

# Correlation

https://www.investopedia.com/terms/n/negative-correlation.asp

Pearson's Correlation Coefficient

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

https://m.xkcd.com/552/

Ben Orlin — Math with bad drawings

# Variance

Examples

Ben Orlin — Math with bad drawings
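A sketch of variance: the average squared distance from the mean. There are two common versions. The population variance divides by n. The sample variance divides by n - 1, which corrects for estimating the mean from the same data. In Google Sheets and Excel these correspond to `VAR.P` and `VAR.S`. The numbers below are invented. The standard deviation is the square root of the variance.

```python
def variance(values, sample=True):
    """Mean squared deviation from the mean; divides by n - 1 if sample=True, else by n."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    m = sum(values) / n
    squared_deviations = sum((v - m) ** 2 for v in values)
    return squared_deviations / (n - 1) if sample else squared_deviations / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented numbers; mean is 5
print(variance(data, sample=False))  # 4.0   (like VAR.P)
print(round(variance(data), 3))      # 4.571 (like VAR.S)
```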

# ✏️ Mystery Data

Learning Objectives

✔️ Understand summary statistics

✔️ Use formulas and pivot tables in spreadsheets

# Mystery Data

Describe four mystery datasets using summary statistics. What can you tell me about these mystery data?

Calculate:

- Mean

- Median

- Mode

- Correlation

- Variance

Do it twice

- Use formulas on WIDE tab

- Use pivot tables on LONG tab

Done? Help out a classmate, or poke around the BONUS tab. How might you describe that data?
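Why should the two passes agree? The WIDE layout puts each dataset in its own column, so a formula runs down one column. The LONG layout puts every observation in its own row, tagged with the dataset it belongs to, so a pivot table groups rows by that tag before summarizing. The sketch below mimics both approaches in plain Python. The dataset names, the single `value` column, and the numbers are all hypothetical and are not the actual mystery data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical WIDE layout: one column per dataset.
wide = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 2.0, 5.0],
}

# Reshape WIDE -> LONG: one row per observation, tagged with its dataset.
long_rows = [
    {"dataset": name, "value": v}
    for name, values in wide.items()
    for v in values
]

# "Formula" approach on WIDE: summarize each column directly.
wide_means = {name: mean(values) for name, values in wide.items()}

# "Pivot table" approach on LONG: group rows by dataset, then summarize each group.
groups = defaultdict(list)
for row in long_rows:
    groups[row["dataset"]].append(row["value"])
pivot_means = {name: mean(values) for name, values in groups.items()}

print(wide_means)   # {'A': 2.0, 'B': 3.0}
print(pivot_means)  # {'A': 2.0, 'B': 3.0}
```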

# Distributions

https://fivethirtyeight.com/features/al-gores-new-movie-exposes-the-big-flaw-in-online-movie-ratings/
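Summary statistics can hide the shape of a distribution. The linked piece deals with online movie ratings. As an illustration only, using invented ratings rather than the article's data, the sketch below builds two sets of 1 to 10 ratings that share the same mean but look nothing alike in a simple text histogram.

```python
from collections import Counter
from statistics import mean

# Hypothetical 1-10 ratings, invented for illustration.
polarized = [1] * 5 + [10] * 5   # people either love it or hate it
middling = [5] * 5 + [6] * 5     # everyone feels lukewarm

def text_histogram(ratings):
    """One line per rating value from 1 to 10, with one '#' per vote."""
    counts = Counter(ratings)
    return "\n".join(f"{score:>2} | {'#' * counts[score]}" for score in range(1, 11))

for name, ratings in [("polarized", polarized), ("middling", middling)]:
    print(name, "mean =", mean(ratings))  # both means are 5.5
    print(text_histogram(ratings))
```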

# Plotting

Exploratory Data Visualization

Pearson correlation is ????

Pearson correlation is 0.9909, because there are 40 duplicate data points in the top-right and bottom-left corners.

# Editorial Choices

1977

### 📚 A Hypothesis Is A Liability

https://dmil.notion.site/A-Hypothesis-Is-a-Liability-5f8ccf30771042fabbf76d842ef99a8c?pvs=4

### ✏️ Exploring NYC Data

Your chance to explore and do "night science." There's no pressure to form a hypothesis; this is just #inspo.

• Practice formulas and pivot tables
• Conduct an open-ended interview

https://dmil.notion.site/Exploring-NYC-Data-3bed8f11bbf1455e811f940b573e05e7?pvs=4

# ✏️ Data Story Pitch

• Option to work in pairs (Oct/Nov)