Statistics for hackers

(resampling)

OP - https://www.youtube.com/watch?v=Iq9DzN6mvYA&t=1s

If you can write a for loop, you can do statistics.

And you should...

stitchdata.com/resources/reports/the-state-of-data-engineering

"The number of data engineers more than doubled from 2013 to 2015."

"Today, there are 6,500 people on LinkedIn who call themselves data engineers. In San Francisco alone, there are 6,600 job listings for this same title."

"42% of data engineers graduated from a Software Engineering role."

"Think about the relationship between designers and front-end developers. One comes up with the ideas, the other implements. And it can cause a lot of tension."
- Ryan Orban, Galvanize CTO

Asking the right question.

questions for statistics

How old are the visitors to our site?

In July, what was the average age of a site visitor?

Did a change to our site increase the likelihood that people younger than 20 would visit the site, or not bounce?

Does a change to our site correspond with an increase in the likelihood that people younger than 20 would visit the site, or not bounce? What are the chances that was random?

Is there an age group that has stopped using our site?

Why do we care about how old our site visitors are?
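
The average-age question above is one a for loop can answer with resampling. A minimal bootstrap sketch: the `ages` list is invented for illustration (it is not real visitor data), and `bootstrap_mean_ci` is a hypothetical helper name. It resamples the observed ages with replacement many times and reads a rough 95% interval for the mean off the sorted results.

```python
import random

def bootstrap_mean_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the sample, with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    # Cut off alpha/2 of the resampled means at each end.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical ages of July visitors (made up for this sketch).
ages = [19, 23, 31, 27, 45, 22, 38, 29, 52, 24, 33, 21, 41, 26, 35]
sample_mean = sum(ages) / len(ages)
low, high = bootstrap_mean_ci(ages)
print(f"mean age {sample_mean:.1f}, 95% interval roughly {low:.1f} to {high:.1f}")
```

The interval only says how much the average would wobble if we sampled again. It does nothing about sample bias.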

correlation =/= causation

  1. Strong relationship
  2. Strong research design
  3. Temporal relationship
  4. Dose-response relationship
  5. Reversible association
  6. Consistency
  7. Plausibility
  8. Coherence with known facts

Sample Bias

(self-selection bias)

Convenience sample

Spillovers

Attrition

Is this a fair coin?

(terminal time)
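
A sketch of what the terminal session could look like. The numbers are an assumption, not taken from the slides: suppose we saw 22 heads in 30 flips. The loop flips a fair coin 30 times, over and over, and counts how often it does at least that well. `simulate_p_value` is a hypothetical helper name.

```python
import random

def simulate_p_value(n_flips=30, observed_heads=22, n_trials=10_000, seed=0):
    """Estimate P(at least observed_heads heads in n_flips) for a fair coin."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_trials):
        # One simulated experiment: count heads in n_flips fair flips.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            extreme += 1
    return extreme / n_trials

p = simulate_p_value()
print(f"Estimated p-value: {p:.4f}")
```

The exact binomial tail for this case is about 0.008, so the simulated estimate should land close to that. A fair coin rarely produces a result this lopsided.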

geometry:GIS -- probability:statistics

The point about the coin

Sneetches

(terminal time)
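
One way this demo could go, assuming the question is whether two groups of Sneetches really score differently on a test. The scores below are invented for illustration, and `permutation_test` is a hypothetical helper name. If group labels do not matter, shuffling them should produce a gap as large as the observed one fairly often.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_shuffles=10_000, seed=0):
    """One-sided permutation test for mean(a) - mean(b)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled scores, then split back into two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            count += 1
    return observed, count / n_shuffles

# Hypothetical scores (made up for this sketch).
star_bellied = [78, 85, 62, 90, 71, 88, 67, 94]
plain_bellied = [70, 65, 80, 59, 73, 68, 77, 61, 66, 72]

observed, p_value = permutation_test(star_bellied, plain_bellied)
print(f"observed difference {observed:.3f}, p-value about {p_value:.3f}")
```

The p-value is the share of shuffles whose gap is at least as large as the real one.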

Bayesians vs Frequentists

model is fixed, data varies

data is fixed, model varies

By Vincent Buscarello
