Welcome Back!

How are you doing?

 

 

Please fill out today’s survey in

#reporting-ii-2024

 

Today

Homework Review

✏️ Diagnostic Data Assignment 

✏️ Reading NYC Data Stories

 

Today

- 🗣️ Intro Stats for Journalists

 

Discussion

✏️ Investigative Pitch

 

 

Deadlines Policy:

Maximize Learning, Minimize Stress

Homework Review

✏️ Diagnostic Data Assignment

Quick Calc 🧮

This is not "the answer." It is one quick back-of-the-envelope calculation. I can show you a more thorough one next week.

Quick Calc 🧮

  • What are some methodological choices I made?
  • What are the assumptions and implications in my choices?
  • Did you make different ones?

✏️ Reading NYC Data Stories

Not All Numbers Are Created Equal

Learning Objectives

✔️ There are different kinds of numbers that must be interpreted and presented differently

 

✔️ Identify the methodological choices that are made when creating numbers and the ones that you make when presenting them

 

✔️ Identify the universe of data that your numbers come from and know how the data is collected at the row level

Not All Numbers Are Created Equal

Counts and Measurements

Summary Statistics

Probabilities

Inferential Statistics

Indexes

Scaled Numbers

I'm sure these aren't the only kinds of numbers...

...but they're all very different from one another

 

 

 

more methodological choices to vet

  • your income this year
    (count)
     
  • the average graduating salary of everyone
    in the last class
    (summary statistic - descriptive)
     
  • the average income of all Americans
    (summary statistic - inferential)
     
  • average income of a journalist anywhere in the world
    (index)

Methodological Choices

...carry the weight of human judgement and can be

  1. built into the data
  2. introduced by intermediaries
  3. introduced by you

Before you cite a number:

  • What is the original source of this number?
     
  • What kind of number is it?
    • a count? a summary? a prediction? an estimate? an inferential statistic?
       
  • How was it acquired or calculated? What is the universe of data that it comes from?
     
  • What methodological choices were inherent in
    • creating the number
    • selecting the number to be presented
  • What methodological choices are you making by finding/selecting/calculating/presenting this number?

p.s. - check out this neat thing I built using LangChain 🦜🔗

https://github.com/dmil/numlock-nlp

Intro to Stats for Journalists

Summary Stats

👻 Scary Math Symbol?

👍 Seems fine! 

Mean
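A minimal sketch of the mean in Python, using the standard `statistics` module. The salary figures are invented for illustration and are not from any dataset used in class. It shows how a single extreme value can pull the mean far away from what is typical.

```python
from statistics import mean, median

# Hypothetical salaries, invented for illustration only.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
mean_before = mean(salaries)

# Add one very high earner and recompute.
with_outlier = salaries + [1_000_000]
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(mean_before)        # mean of the original five salaries
print(round(mean_after))  # the mean jumps because of one value
print(median_after)       # the median barely moves
```

In a spreadsheet, the equivalent formula is `=AVERAGE(range)`.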

 

 

 

Ben Orlin — Math with bad drawings


Weighted Average

 

 


Example
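One way to sketch a weighted average in Python. The two classes, their sizes, and their average grades are hypothetical. The point is that averaging the group averages ignores how many people are in each group.

```python
# Hypothetical classes: A is small with high grades, B is large with lower grades.
class_sizes = {"A": 10, "B": 30}
class_avgs = {"A": 90, "B": 70}

# Naive approach: average the two averages, treating both classes as equal.
simple_avg = sum(class_avgs.values()) / len(class_avgs)

# Weighted approach: each class average counts in proportion to its size.
total_students = sum(class_sizes.values())
weighted_avg = sum(class_avgs[c] * class_sizes[c] for c in class_sizes) / total_students

print(simple_avg)    # 80.0
print(weighted_avg)  # 75.0
```

In a spreadsheet, `=SUMPRODUCT(averages, sizes) / SUM(sizes)` does the same calculation.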

Median

Ben Orlin — Math with bad drawings


Median Grade: 80%
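A short Python sketch of why a median alone can hide very different situations. The three grade lists are made up for illustration. All three have a median of 80, yet the classes look nothing alike.

```python
from statistics import mean, median

# Three hypothetical classes, each with a median grade of 80.
tight_class = [78, 79, 80, 81, 82]       # everyone near 80
spread_class = [20, 30, 80, 95, 100]     # wide gap between students
identical_class = [80, 80, 80, 80, 80]   # everyone scored exactly 80

for grades in (tight_class, spread_class, identical_class):
    # Same median every time, but the mean and the extremes differ.
    print(median(grades), mean(grades), min(grades), max(grades))
```

In a spreadsheet, the equivalent formula is `=MEDIAN(range)`.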


Mode
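A minimal sketch of the mode in Python, with invented survey answers. The mode is the most common value, which makes it the one summary statistic here that also works on categories. `statistics.multimode` returns every value tied for most common.

```python
from statistics import multimode

# Hypothetical answers to "How do you commute?"
commutes = ["bus", "subway", "subway", "bike", "bus", "subway"]
commute_modes = multimode(commutes)

# A dataset can have more than one mode when values tie.
scores = [1, 1, 2, 2, 3]
score_modes = multimode(scores)

print(commute_modes)  # ['subway']
print(score_modes)    # [1, 2]
```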

Examples

 

 

 

Ben Orlin — Math with bad drawings


Range
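A quick sketch of the range in Python, using made-up daily temperatures. The range is just the largest value minus the smallest, so it depends entirely on the two most extreme rows.

```python
# Hypothetical daily high temperatures, in degrees Fahrenheit.
temps = [61, 64, 58, 72, 69]

# Range = maximum minus minimum.
temp_range = max(temps) - min(temps)

print(min(temps), max(temps), temp_range)  # 58 72 14
```

In a spreadsheet, this is `=MAX(range) - MIN(range)`.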

Ben Orlin — Math with bad drawings


Correlation

https://www.investopedia.com/terms/n/negative-correlation.asp

Correlation

Pearson's Correlation Coefficient
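A sketch of Pearson's correlation coefficient written out by hand in Python, so each step of the formula is visible. The helper name `pearson_r` and the sample numbers are illustrative choices, not something from the course materials.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y, scaled to fall between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much x and y each vary on their own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # y rises as x rises: perfect positive
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls as x rises: perfect negative

print(r_up, r_down)
```

In a spreadsheet, `=CORREL(range_x, range_y)` computes the same coefficient.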

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

https://m.xkcd.com/552/

Ben Orlin — Math with bad drawings


Correlation & Causation

 

Standard Deviation
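A minimal Python sketch of what standard deviation captures. Both lists are invented and share the same mean, but one is tightly clustered and the other is spread out. This uses `pstdev`, which treats the list as the whole population. `stdev` would treat it as a sample and divide by n - 1 instead.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean of 50.
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

sd_clustered = pstdev(clustered)
sd_spread_out = pstdev(spread_out)

print(mean(clustered), mean(spread_out))           # same center
print(round(sd_clustered, 2), round(sd_spread_out, 2))  # very different spread
```

In Google Sheets, `=STDEVP(range)` is the population version and `=STDEV(range)` is the sample version.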

 

 

 

 

 

 

Variance
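A sketch of variance computed step by step in Python, then checked against the standard library. The numbers are a small made-up example. Variance is the average squared distance from the mean, and its square root is the standard deviation.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical values with a mean of 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

avg = sum(values) / len(values)
# Squared distance of each value from the mean.
squared_deviations = [(v - avg) ** 2 for v in values]
# Population variance: average of the squared distances.
variance_by_hand = sum(squared_deviations) / len(values)

print(variance_by_hand)        # 4.0
print(pvariance(values))       # library result for the same population variance
print(sqrt(variance_by_hand))  # 2.0, the standard deviation
```

In Google Sheets, `=VARP(range)` is the population version and `=VAR(range)` is the sample version, which divides by n - 1.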

 

 

 

 

 

 

 

Examples

 

 

  

Examples

 

 

  

Ben Orlin — Math with bad drawings


✏️ Mystery Data

 

Learning Objectives

✔️ Understanding summary statistics

✔️ Use formulas and pivot tables in spreadsheets

Mystery Data

https://docs.google.com/spreadsheets/d/1ObVYCOeTgGK_n9rhVFG05I-VIjzriO-F6mCDIpdd_lo/edit#gid=0

Describe 4 mystery datasets with summary statistics...what can you tell me about these mystery data?

 

 

Calculate:

- Mean

- Median

- Mode

- Correlation

- Variance

Do it twice

 

- Use formulas on WIDE tab

 

- Use pivot tables on LONG tab

Done? Help out a classmate...or poke around the BONUS tab. How might you describe that data?

Anscombe's Quartet
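A Python sketch using the classic published values of Anscombe's quartet. Whether the class spreadsheet matches these values exactly is an assumption. The helper `pearson_r` is written here for illustration. The four datasets have nearly identical means and correlations, yet look completely different when plotted.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r computed directly from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Datasets I to III share the same x values; dataset IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Summary statistics for each dataset, rounded to two decimals.
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2), round(pearson_r(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, stats)  # (mean of x, mean of y, correlation)
```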

Distributions

https://fivethirtyeight.com/features/al-gores-new-movie-exposes-the-big-flaw-in-online-movie-ratings/

Normal Distribution

Pay Attention To Distribution Of Data

Plotting

Exploratory Data Visualization

Pearson correlation is ????

Pearson correlation is 0.9909

because there are 40 duplicate data points in the top-right and bottom-left corners
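A small Python sketch of the same effect with invented points. It does not reproduce the data behind the 0.9909 figure. A cluster with no correlation at all ends up with a correlation near 1 once 40 duplicate points are stacked in two opposite corners, which a plot would reveal immediately.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed directly from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical central cluster with no linear relationship.
cluster = [(4, 4), (4, 6), (6, 4), (6, 6), (5, 5)]
r_cluster = pearson_r([x for x, _ in cluster], [y for _, y in cluster])

# Add 20 duplicates in the bottom-left corner and 20 in the top-right corner.
stacked = cluster + [(0, 0)] * 20 + [(10, 10)] * 20
r_stacked = pearson_r([x for x, _ in stacked], [y for _, y in stacked])

print(r_cluster)             # 0.0
print(round(r_stacked, 3))   # 0.996
```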

Editorial Choices

 

1977

📚 A Hypothesis Is A Liability

Read and annotate using Hypothes.is

https://dmil.notion.site/A-Hypothesis-Is-a-Liability-5f8ccf30771042fabbf76d842ef99a8c?pvs=4


✏️ Exploring NYC Data

Your chance to explore and do "night science", no pressure to form a hypothesis, just #inspo.

  • Learn about NYC data
  • Practice formulas and pivot tables
  • Conduct an open-ended interview

https://dmil.notion.site/Exploring-NYC-Data-3bed8f11bbf1455e811f940b573e05e7?pvs=4

✏️ Data Story Pitch

  • Pitch Your Own (now)
  • Option to Work In Pairs (Oct / Nov)


Intro Stats

By Dhrumil Mehta
