Common Paths into Data Science

There are a few common journeys to get here

Today's Tour

  • Intro
  • PhD
  • Engineering
  • Analytics

Key Takeaways

  • Find data scientists with a similar background to yours and listen to their advice
  • Have empathy for other data scientists

Background

  • Description of job titles
  • Biases
  • Valid paths

ML Engineer vs. Data Scientist?

  • *No widely accepted definitions; this is just my own opinion
  • Analysts
    • Understand patterns and trends in data
    • Generally use data to tell stories and drive business outcomes
  • Data Scientists
    • Handle larger amounts of data
    • Sit anywhere on the spectrum from analyst to engineer
    • Oftentimes do machine learning
  • Machine Learning Engineers
    • Work anywhere from academic research to deploying production code
    • Use machine learning algorithms to put data to use
  • Data Engineers
    • Often handle the data pipeline: ingesting data, cleaning it, storing it, moving it, formatting it, etc.

We all have biases

  • If you ask a data scientist how to get into data science, they'll probably give you an answer that looks an awful lot like their own background
    • I sure do: most of my advice is best suited for people with engineering backgrounds
  • Don't be intimidated by people with different backgrounds giving advice that seems super alien to you
  • Be aware of your own biases. Be empathetic to others who have different backgrounds, and recognize that yours is not the only useful skill set

All of these are valid paths

  • Each job has different requirements, so check out the job listing and see if your skill set fits with this particular role

What does each background look like?

PhD/Academic Background

  • Among the most vocal
  • Only make up 15% of all data scientists & machine learning engineers
  • Not required to actually finish their PhD to fall into this category
  • Chose industry over academia because:
    • More direct impact (code is used by millions of users, as opposed to published and read by a few)
    • More control over outcomes (academia is tough, and there are many factors outside your control)
    • Better earning potential, though this is rarely cited as a primary reason
  • Typically study some technical field (Physics, Economics, Math, CS, etc.), or became the quantitative person within their field (computational biology, modeling neurophysics, etc.)

PhD/Academic Strengths

  • Reading academic journal articles
  • Developing entirely new machine learning algorithms
  • Taking the latest advances from academia and implementing them
  • Statistics
  • Study design
  • Familiarity with certain analytics packages or field-specific software languages/packages
  • Building models 

Analyst Background

  • Used to translating data into something non-technical executives can understand
  • Probably pretty used to building slide decks and presenting to clients
  • At some point, were probably frustrated that they were limited by their current tools (often Excel & SQL)
  • Eager to make sense of data and see what it means
  • Normally studied something business-related in undergrad
  • Probably used to being among the more technically minded of their peers (their peer group being relatively non-technical compared to data scientists)

Analyst Strengths

  • Communication and polish: they're used to presenting to clients (oftentimes in a consultant role)
  • Business understanding, and focus on business impact
  • Drawing patterns from the data, telling stories with it, making sense of the trends they see in the data
  • Connecting the dots
  • Data visualization (in Excel or Tableau, not D3 or R)

Engineer Background

  • Used to writing code, oftentimes with a back-end focus
  • Frequently knows several languages, and can pick up new languages and libraries rapidly
  • Used to engineering workflows (scrum, GitHub, code reviews, etc.)
  • Typically uses the precise language of engineers
  • In the current environment, has a lot of job options
  • Likes solving puzzles
  • Experienced at debugging and approaches issues with a debugging mindset
  • Takes an MVP approach (Minimum Viable Product)
  • Likes shipping code to production 
  • Rarely implements an ML algo from scratch themselves, since it's much more efficient to use an existing library

Engineer Strengths

  • Writing production code
  • Debugging issues
  • Shipping code and iterating rapidly
  • Logical thinking and technical communication
  • Using libraries to get projects shipped quickly
  • Maintaining & scaling code
  • Staying focused on delivering working features
  • Interacting with engineering teams
  • Engineering practices (GitHub, code review, etc.)
  • Visualization (using D3 and other libraries)
  • Deployment, database work, interacting with APIs, load balancing, and all the other tasks you'd normally expect an engineer to be able to do

Which one's best?!

  • Whichever one you need!
  • An engineer will tell you to hire an engineer because they will get the code up and running efficiently
  • An academic will tell you to hire an academic because they can prove that their models are accurate and keep up with the latest papers
  • An analyst will tell you to hire an analyst because they can make sense of the data and drive business outcomes/communicate findings outwards

Which path should I follow?

  • Whichever one you're drawn to!
  • Seriously, they're all great, and with dedicated effort, you can pursue any of these paths
  • Some of them take longer than others (a PhD takes 4-8 years, while an engineering or data science bootcamp takes 3-9 months)
  • Or just go with whatever your background already is
  • If you're an engineer, brush up on your statistics a bit, but don't worry if the academics know far more stats than you do
  • If you're an academic, practice your coding a bit, but don't worry if the engineers are better practiced at shipping code
  • ...

Most importantly

Just practice as much as you can! 

 

  • Take on an extra project at work
  • Join a team for a Kaggle competition
  • Contribute to open source projects

Gratitude

These are all amazing positions to hold.

People from all fields have done a ton of work to allow all of us to progress more rapidly. We all depend upon each other, and push each other to ever greater heights.

Much of this has only become possible in the past few years.

Data holds tremendous power, and we are the people wielding that power. Wield it with empathy.

Most sincerely, thank you for driving progress.

Common Paths Into Data Science

By Preston Parry

Descriptions of different paths into data science and machine learning.
