DS minus ML: metro analytics and visualization

Andrey Lukyanenko

About me

  • ~4 years as an ERP system consultant
  • Self-study to switch careers
  • DS since 2017
  • NLP tech lead on a medical chatbot project

Content

  • Why machine learning isn't always necessary or possible
  • Project description
  • Solution
  • Metro visualization

ML isn't always useful

  • Not enough data
  • Too much data and underdeveloped infrastructure
  • You may not have labelled data, and clustering doesn't always help
  • A baseline works well enough
  • DS isn't only about models
  • If-else forever

Project description

Data: logs of Wi-Fi connections

Idea: predict user characteristics

Action: show relevant ads

???

Profit!

Data

  • Logs from a database
  • The database is limited
  • Station coordinates from Wikipedia
  • Travel times between stations
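
With station coordinates available, "nearby stations" can be found by great-circle distance. The block below is a minimal sketch, assuming coordinates are (latitude, longitude) in degrees; the station names, coordinates and the 1 km radius are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical station coordinates (lat, lon) in degrees, for illustration only.
STATIONS = {
    "A": (55.7558, 37.6173),
    "B": (55.7601, 37.6186),
    "C": (55.7300, 37.6000),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(p1, p2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_stations(name, stations, radius_km=1.0):
    """Return the other stations within radius_km of `name`, closest first."""
    origin = stations[name]
    dists = [
        (other, haversine_km(origin, coords))
        for other, coords in stations.items()
        if other != name
    ]
    return [s for s, d in sorted(dists, key=lambda x: x[1]) if d <= radius_km]


print(nearby_stations("A", STATIONS))
```

The radius is a tuning knob: stations this close may show up in the logs of the same trip, so they can be treated as one place when counting visits.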

Process of solving

  • Visualizing the raw data with folium didn't help
  • Analyze visited stations
  • Extract actual station visits
  • Analyze routes within a day
  • Understand actual routes
  • Morning/evening, weekdays/weekends
  • Nearby stations
  • Some people visit only a single station
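
The steps above (turning raw log records into station visits, then into routes within a day tagged by time of day and weekday/weekend) can be sketched as follows. This is an illustration under assumptions, not the author's pipeline: the (user, station, timestamp) record schema, the 10-minute merge gap and the morning/evening hour boundaries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass
class Visit:
    user: str
    station: str
    start: datetime
    end: datetime


def to_visits(records, max_gap=timedelta(minutes=10)):
    """Merge consecutive records of one user at one station into a single visit.

    records: iterable of (user, station, timestamp) tuples (assumed schema).
    """
    visits = []
    for user, station, ts in sorted(records, key=lambda r: (r[0], r[2])):
        last = visits[-1] if visits else None
        if (last is not None and last.user == user and last.station == station
                and ts - last.end <= max_gap):
            last.end = ts  # still the same stay at this station
        else:
            visits.append(Visit(user, station, ts, ts))
    return visits


def day_part(ts):
    """Hypothetical buckets: morning before 12:00, evening from 16:00."""
    if ts.hour < 12:
        return "morning"
    if ts.hour >= 16:
        return "evening"
    return "day"


def to_trips(visits):
    """Turn consecutive visits of a user within one day into (from, to) trips."""
    trips = []
    # visits are sorted by (user, time), so grouping by (user, date) is safe
    for (user, day), group in groupby(visits, key=lambda v: (v.user, v.start.date())):
        group = list(group)
        for src, dst in zip(group, group[1:]):
            if src.station != dst.station:
                trips.append({
                    "user": user,
                    "from": src.station,
                    "to": dst.station,
                    "part": day_part(src.end),
                    "weekend": day.weekday() >= 5,
                })
    return trips


# Hypothetical sample: u1 commutes A -> B and back; u2 is seen at one station only.
records = [
    ("u1", "A", datetime(2019, 3, 4, 8, 0)),
    ("u1", "A", datetime(2019, 3, 4, 8, 5)),
    ("u1", "B", datetime(2019, 3, 4, 8, 40)),
    ("u1", "B", datetime(2019, 3, 4, 18, 0)),
    ("u1", "A", datetime(2019, 3, 4, 18, 45)),
    ("u2", "C", datetime(2019, 3, 4, 9, 0)),
]
visits = to_visits(records)
trips = to_trips(visits)
```

Users like u2 produce visits but no trips, which is the single-station case listed above; they need separate handling.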

Solution

  • Derive real routes
  • Split them into categories
  • Find the most common routes
  • Filter out irrelevant ones
  • Count visits to important stations
  • Apply confidence thresholds
  • 60% accuracy; errors occur when there is not enough data
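
A rough sketch of the solution steps: count each user's most common routes, then score how often their trips end at "important" stations and apply a threshold. It assumes trips are dicts with "user", "from" and "to" keys; the set of important stations, the thresholds and the minimum trip count are hypothetical. Users with too few trips get no label, mirroring the observation that errors occur when there is not enough data.

```python
from collections import Counter


def frequent_routes(trips, min_trips=5, min_share=0.3):
    """Per user, list routes whose share of the user's trips is at least min_share.

    Users with fewer than min_trips trips are skipped: too little data to decide.
    """
    counts = {}
    for t in trips:
        counts.setdefault(t["user"], Counter())[(t["from"], t["to"])] += 1
    result = {}
    for user, c in counts.items():
        total = sum(c.values())
        if total < min_trips:
            continue
        result[user] = [(route, n / total) for route, n in c.most_common()
                        if n / total >= min_share]
    return result


def important_station_confidence(trips, important, threshold=0.4, min_trips=3):
    """Share of a user's trips that end at an 'important' station, plus a flag.

    Returns {user: (flag, confidence)}; (None, None) when there is not enough data.
    """
    totals, hits = Counter(), Counter()
    for t in trips:
        totals[t["user"]] += 1
        if t["to"] in important:
            hits[t["user"]] += 1
    labels = {}
    for user, total in totals.items():
        if total < min_trips:
            labels[user] = (None, None)
        else:
            confidence = hits[user] / total
            labels[user] = (confidence >= threshold, confidence)
    return labels


# Hypothetical trips: u1 has a clear A <-> B pattern, u2 has a single trip.
SAMPLE_TRIPS = (
    [{"user": "u1", "from": "A", "to": "B"}] * 3
    + [{"user": "u1", "from": "B", "to": "A"}] * 3
    + [{"user": "u1", "from": "A", "to": "C"}]
    + [{"user": "u2", "from": "C", "to": "A"}]
)
routes = frequent_routes(SAMPLE_TRIPS)
labels = important_station_confidence(SAMPLE_TRIPS, important={"B"})
```

Here the rare A to C trip is filtered out as irrelevant, and u2 stays unlabelled rather than receiving a low-confidence guess.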

Visualization

https://varf.ru/metro3d/?p=-90&t=45&d=41.05255888325765

Examples of visualization

Contacts