Explaining machineJS to My Parents Part 1:

What is Machine Learning?

Today's Tour

  • What is machine learning?
  • Why would we want to use machine learning?
  • Examples of machine learning

What Is Machine Learning?

  • Taking some basic tenets of statistics and probability (math stuff you learned in high school), and making them useful for complicated datasets.
  • It's the typical Lego approach: take really basic materials (each Lego block is really simple on its own, just as these principles of probability and statistics are relatively simple on their own) and assemble them in complex ways to make something much more useful.
  • (cont. below)

What Is Machine Learning (cont.)

  • A simple example:
  • Using square feet to predict house prices
  • Simply put a bunch of points on a graph, and draw a line through them
  • The machine will draw the best line through them, the one that minimizes the total distance between the points and the line (usually measured as squared vertical distance). That's machine learning!
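For the curious, here is a minimal sketch of "drawing the best line" in Python. It assumes "best" means ordinary least squares (smallest total squared vertical distance), which is the usual choice but not something these slides specify. The home data and the `fit_line` helper are made up for illustration.

```python
# Made-up data for illustration: (square feet, sale price in dollars).
homes = [(1000, 200_000), (1500, 290_000), (2000, 410_000), (2500, 500_000)]

def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How much x and y move together, divided by how much x moves on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    # The least-squares line always passes through the point of averages.
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(homes)
print(f"price ~ {slope:.1f} * square_feet + {intercept:.0f}")
```

The slope is the "dollars per extra square foot" the machine learned from the points.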

Why Use Machine Learning?

  • The house price example we used is a very basic one
  • What happens if we have a much more complex data set?
  • For example, house prices probably depend on a lot more than just square footage:
    • Location!
    • Year the house was built
    • Year the house was remodeled
    • Attached garage?
    • School district
    • Distance to public transit
    • ...

Why Use Machine Learning (cont.)

  • Humans can understand these complex relationships up to a point, but after a certain size or complexity there's just too much for us to make sense of
  • A machine can learn the patterns in a dataset of millions of home sales in a matter of minutes or hours
  • A human would take a lifetime to learn the patterns in just a few thousand home sales
  • And a machine can do it much more accurately
    • A human can figure out that being in a certain school district helps, but if the homes in that district also tend to be larger, we can't tell exactly how much of the price comes from the district and how much from the size; the machine can.

Why Use Machine Learning (cont.)

  • Ultimately, machine learning is better than human learning at some problems for three primary reasons:
    • Speed
      • Learning datasets of millions of rows in a matter of minutes or hours
    • Accuracy
      • Learning the relationship between different factors (school district, home size, year built, etc.) very specifically
    • Explainability
      • The machine can tell you exactly how each factor contributed to the final result
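As a rough sketch of how a machine can separate the effect of each factor, the Python below fits a linear model with two factors at once (home size and school district). Everything here is an assumption for illustration: the homes, the pricing rule used to generate the prices, and the helper names `solve` and `fit`. A plain linear model is only one of many techniques, and these slides do not say which ones machineJS uses.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(rows, prices):
    """Least-squares weights for price ~ w0 + w1 * factor1 + w2 * factor2 + ..."""
    xs = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the base price
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, prices)) for i in range(k)]
    return solve(xtx, xty)

# Made-up homes: (square feet, 1 if in the sought-after school district else 0).
# Homes in the district are larger on average, which is what confuses a human.
rows = [(1200, 0), (1500, 0), (1800, 0), (2400, 0),
        (2000, 1), (2300, 1), (2600, 1), (3000, 1)]
# Made-up pricing rule: base price + price per square foot + district premium.
prices = [50_000 + 200 * sqft + 40_000 * district for sqft, district in rows]

base, per_sqft, district_premium = fit(rows, prices)
print(f"base: {base:.0f}, per square foot: {per_sqft:.0f}, "
      f"district premium: {district_premium:.0f}")
```

Because the made-up prices follow the rule exactly, the fit recovers the size effect and the district effect separately, even though the two are tangled together in the data. Real sales data is noisy, so real estimates would only be approximate.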

What Does Machine Learning Do?

  • "Cool, so it learned what matters for house prices. Now what?"
  • Make predictions! 
  • The point of machine learning can be either just to learn a dataset so we can understand it better, or to put that learning to use by making predictions on new data points not in the learning data
  • In other words, given all that we've learned about house prices, let's see which homes on the market now are under- or over-valued
  • Or, if we're debating whether to sell our home, how much should we expect to sell it for?
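Here is a small Python sketch of putting a learned model to work on homes it has never seen. The model numbers (`SLOPE`, `INTERCEPT`), the 5% tolerance, and the listings are all hypothetical; a real model would be learned from actual past sales.

```python
# Hypothetical model, assumed to be already learned from past sales:
# price ~ SLOPE * square_feet + INTERCEPT
SLOPE = 200.0
INTERCEPT = 50_000.0

def predict_price(square_feet):
    """Predicted sale price for a home of the given size."""
    return SLOPE * square_feet + INTERCEPT

def valuation(square_feet, asking_price, tolerance=0.05):
    """Compare the asking price with the prediction, allowing a 5% margin by default."""
    predicted = predict_price(square_feet)
    if asking_price < predicted * (1 - tolerance):
        return "under-valued"
    if asking_price > predicted * (1 + tolerance):
        return "over-valued"
    return "fairly priced"

# Made-up listings: (street, square feet, asking price).
listings = [("Elm St", 1800, 350_000), ("Oak Ave", 2200, 490_000), ("Pine Rd", 1500, 420_000)]
for name, sqft, asking in listings:
    print(name, predict_price(sqft), valuation(sqft, asking))
```

The same `predict_price` call answers the second question too: plug in our own home's size to see roughly what we should expect to sell it for.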

What Does Machine Learning Do?

  • Since the machine has learned the patterns and relationships in the training data so well, it is going to be very good at making predictions on new data
  • Those new predictions allow the business to make good decisions on the things they're currently debating (do we buy this house for our portfolio, or that one?)

Real World Examples!

Dating Websites

  • Let's imagine a dating app
  • Even a simple dating app probably has a fair bit of information about each user:
    • Age
    • Location
    • Religion
    • Occupation
    • Education
    • Smoking status
    • Phone Type (Android or iPhone)
    • Number of photos
    • How frequently other users like them
    • I can probably think of dozens more data points

Dating Websites

  • Understanding the relationship between all these different data points is tough!
  • Maybe smoking only matters if you're a non-smoker
  • And maybe that effect gets stronger as we age (you don't realize the full dangers of smoking when you're young and it might seem cooler)
  • But maybe it's less important for people in some professions who have a bunch of coworkers who smoke
  • Oh crap, but we haven't even thought about how religion or education or location factor into how people perceive smokers yet. 
  • Ok, well maybe people in Atlanta are more ok with smokers, but people in Denver...

Dating Websites

  • "Ok, so figuring out how all these different factors relate to each other is tough"
  • "But if we give all this info to a professional matchmaker, surely they can figure it out for us. After all, they've seen hundreds or maybe even a few thousand relationships"
  • But if we give this entire dataset, with millions and millions of successful and failed matches, to a machine, it can learn from everybody, not just a few thousand matches

Dating Websites

  • Most dating websites can't show you all their users at once. In fact, if we're on an app, they might only be able to show you one other user at a time
  • Out of their millions of users, how do they choose the one you're most likely to be intrigued by?
  • Machine learning!

Predicting Cancer

  • We're actually pretty bad at predicting who has cancer and who doesn't
  • Even with all the data we have, doctors often don't know if a growth is actually cancerous until they go in and biopsy it, which can be a very difficult thing to do
  • Telling people they're at higher risk for cancer causes a lot of stress, and stress is of course very bad for your health, so we can't just tell everyone that they're at risk of cancer and should get a bunch more screening 
  • Human biology is complex! We still don't have it all figured out

Predicting Cancer

  • Again, machine learning can be useful
  • We can figure out the complex interplay of age, family history, personal history, height, weight, blood pressure, and even the doctor's own subjective assessment of the patient, and give a more accurate prediction of whether this is a high-risk patient or not
  • Again, the machine is figuring out the complex relationships between different data points

Legal Discovery

  • At the start of a big legal case, each legal team will be given hundreds of thousands of pages of documents that pertain to the case
  • In the past, this meant young lawyers (at $200/billable hour) combing through all these documents to try to find which ones might be important
    • This was not a particularly fun part of the job for most people...

Legal Discovery

  • Again, it's all pattern recognition!
  • "If person A wrote 10,000 emails, are any of them written in a very different style than the other emails?"
  • "If we have 3,000 Purchase Order documents, are any of them notably different than most of the others?"
  • "If we're only used to seeing Associates and Managers in emails discussing a certain type of loan approval, and all of a sudden we now have a Managing Director bcc'd on a similar email, does that signify anything?"
  • Machine learning has gotten surprisingly good at identifying which documents are worth a closer look from humans, freeing the lawyers up to focus on higher-level work, not the drudge work
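To make "notably different from most of the others" concrete, here is a toy Python sketch that flags unusual purchase orders by amount alone. This is a deliberately simple stand-in: real discovery tools presumably look at far richer signals (writing style, recipients, timing) than one number. The amounts, the `unusual` helper, and the threshold of 2 standard deviations are all assumptions for illustration.

```python
from statistics import mean, stdev

def unusual(values, threshold=2.0):
    """Return the positions of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Made-up purchase order amounts in dollars; the last one is the odd one out.
amounts = [1020, 980, 1010, 995, 1005, 990, 1000, 9800]

flagged = unusual(amounts)
print("Purchase orders worth a human look:", flagged)
```

The machine does not decide that a flagged document matters; it only narrows the pile so that a lawyer reads the few odd ones first.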

That's machine learning in a nutshell!

Explaining machineJS to My Parents Part 1: What is Machine Learning?

By Preston Parry

My parents are awesome. But they don't know much about machine learning. Yet.
