data-centric apps

problems to solve

Piotr Lewandowski

@constjs

but what is data-centric?

  • architecture
  • database plays crucial role
  • SQL / R / MATLAB

I want to build an app...

... that will look for

   productive music

we need data

  • aggregation
  • analysis
  • visualization

Aggregation

Step 1

  • fail-safe
  • repeatable
  • independent
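A minimal sketch of an aggregation job with those three properties, assuming the work is split into one unit per day. `run_aggregation`, `fetch_day` and the dict standing in for the database are hypothetical names, not part of any provider's API. Each day is fetched on its own (independent), days already stored are skipped so a re-run only fills the gaps (repeatable), and a failure is recorded instead of aborting the whole run (fail-safe).

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_aggregation(
    days: Iterable[str],
    fetch_day: Callable[[str], List[dict]],
    store: Dict[str, List[dict]],
) -> Tuple[List[str], Dict[str, str]]:
    """Aggregate one day at a time; `store` stands in for the database."""
    fetched: List[str] = []
    failed: Dict[str, str] = {}
    for day in days:
        if day in store:
            # Repeatable: this day is already in the database, skip it.
            continue
        try:
            # Independent: each day is a separate unit of work.
            store[day] = fetch_day(day)
            fetched.append(day)
        except Exception as exc:
            # Fail-safe: remember the failure and carry on with the next day.
            failed[day] = repr(exc)
    return fetched, failed
```

Running the job again after a failure only fetches the days listed in `failed`, because everything that succeeded is already in `store`.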

Data providers

last.fm API

RescueTime

  • music I listen to
  • tracks & albums data
    • duration
    • categories
  • activity I do
  • classify productivity
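One way to model the two feeds before matching them, as a sketch. The class and field names are illustrative, not the actual last.fm or RescueTime response formats; the productivity field assumes RescueTime's five-level scale, encoded here as -2 (very distracting) to 2 (very productive).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Scrobble:
    """One played track from last.fm (illustrative fields)."""
    played_at: datetime
    artist: str
    track: str
    duration_s: int   # track length, taken from track/album metadata
    category: str     # e.g. a genre tag


@dataclass(frozen=True)
class Activity:
    """One activity from RescueTime (illustrative fields)."""
    started_at: datetime
    name: str           # application or site, e.g. "IDE"
    productivity: int   # assumed scale: -2 (very distracting) .. 2 (very productive)

    @property
    def is_productive(self) -> bool:
        # Anything above neutral (0) counts as productive in this sketch.
        return self.productivity > 0
```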

Data providers

problems

  • Rate limits
  • Authentication problems
  • Timeouts
    • remember to retry requests
  • Broken data
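For timeouts, a common pattern is retrying with exponential backoff, which also avoids hammering an API that is already enforcing limits. A minimal sketch: `retry_with_backoff` is a hypothetical helper, and treating `TimeoutError` and `ConnectionError` as the transient errors is an assumption; map whatever your HTTP client actually raises onto `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying transient errors with exponential backoff.

    Waits base_delay * 2**attempt seconds (plus up to 10% random jitter)
    between attempts and re-raises the last error once retries run out.
    Errors not listed in `retry_on` propagate immediately.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("retries must be >= 0")
```

The `sleep` parameter exists so the helper can be tested (or driven by an event loop) without real waiting.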

Copy all the data

to your database

aggregation is a long process

no reason to do it twice
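A sketch of "fetch once, then read from the database", assuming SQLite as the store and JSON-serialisable responses. The `raw_responses` table and the `open_store` / `fetch_once` helpers are hypothetical. Keeping the raw payload, not only the parsed fields, means a later change in the analysis does not require another pass over the APIs.

```python
import json
import sqlite3
from typing import Any, Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # One row per (provider, request), holding the untouched response body.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_responses ("
        " source TEXT NOT NULL,"
        " request_key TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (source, request_key))"
    )
    return conn


def fetch_once(
    conn: sqlite3.Connection, source: str, request_key: str, fetch: Callable[[], Any]
) -> Any:
    """Return the stored payload if present; otherwise fetch, store and return it."""
    row = conn.execute(
        "SELECT payload FROM raw_responses WHERE source = ? AND request_key = ?",
        (source, request_key),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    payload = fetch()
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO raw_responses (source, request_key, payload) VALUES (?, ?, ?)",
            (source, request_key, json.dumps(payload)),
        )
    return payload
```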

data analysis

Step 2

Data fusion

Process of integrating data from multiple sources

Matching

Activities

  • 10:05 - IDE
  • 10:19 - Facebook
  • 10:27 - YouTube
  • 10:37 - IDE

Music

  • 10:03 - Jazz
  • 10:09 - Rock
  • 10:17 - Soul
  • 10:24 - Smooth
  • 10:33 - Rock
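Matching activities with music is an "as-of" join: each track is paired with the most recent activity that started at or before it. A minimal sketch with the standard library, assuming an activity lasts until the next one starts, and using the times from the lists as zero-padded "HH:MM" strings on a single day (so plain string comparison orders them correctly). With pandas, `pandas.merge_asof`, whose default direction is backward, performs the same kind of join on sorted DataFrames.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def match_tracks_to_activities(
    activities: List[Tuple[str, str]], tracks: List[Tuple[str, str]]
) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each (time, genre) track with the activity running when it started.

    `activities` must be sorted by start time; a track played before the
    first activity gets None.
    """
    starts = [start for start, _ in activities]
    matched = []
    for played_at, genre in tracks:
        # Index of the last activity that started at or before this track.
        i = bisect_right(starts, played_at) - 1
        activity = activities[i][1] if i >= 0 else None
        matched.append((played_at, genre, activity))
    return matched


activities = [("10:05", "IDE"), ("10:19", "Facebook"), ("10:27", "YouTube"), ("10:37", "IDE")]
tracks = [("10:03", "Jazz"), ("10:09", "Rock"), ("10:17", "Soul"), ("10:24", "Smooth"), ("10:33", "Rock")]
print(match_tracks_to_activities(activities, tracks))
# [('10:03', 'Jazz', None), ('10:09', 'Rock', 'IDE'), ('10:17', 'Soul', 'IDE'),
#  ('10:24', 'Smooth', 'Facebook'), ('10:33', 'Rock', 'YouTube')]
```

Real timestamps from both providers would need to be converted to the same time zone before matching.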

Prediction

machine learning

  • Lots of data
  • Problem with validation data
  • Asking the right question
    • Garbage in, garbage out
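Two of these points as code, in a sketch: drop records that fail basic sanity checks before they reach a model, then hold out part of what remains as validation data. The validity rules and field names (`genre`, `productivity` on an assumed -2..2 scale) are illustrative. scikit-learn provides `sklearn.model_selection.train_test_split` for the split itself; this standard-library version only shows what such a split does.

```python
import random
from typing import List, Tuple


def is_valid(record: dict) -> bool:
    """Illustrative sanity checks that keep obvious garbage out of training."""
    genre = record.get("genre")
    productivity = record.get("productivity")
    return (
        isinstance(genre, str)
        and genre != ""
        and type(productivity) is int  # rejects None, floats and bools
        and -2 <= productivity <= 2
    )


def holdout_split(
    records: List[dict], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[dict], List[dict]]:
    """Filter invalid records, shuffle deterministically, return (train, test)."""
    rows = [r for r in records if is_valid(r)]
    rng = random.Random(seed)  # fixed seed: the same split on every run
    rng.shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]
```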

Tools

Python FTW

  • Node.js is designed for network apps
  • heavy computations block its event loop
  • lack of community support for data analysis

Python libs

  • NumPy - matrix calculation
    • SIMD - Single Instruction, Multiple Data
    • one vectorized call instead of O(n^2) interpreted loop iterations
  • pandas - data structure
  • scikit-learn - machine learning algorithms
  • matplotlib - graphs and plots
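A small sketch of what NumPy's matrix calculation buys: an element-wise product of two n × n matrices takes n^2 loop iterations in the Python interpreter, but only one call in NumPy, whose inner loop runs in compiled code (where NumPy may use SIMD instructions). The total arithmetic is still proportional to n^2 either way; the saving is in interpreter overhead. The function names are illustrative.

```python
import numpy as np


def multiply_loops(a, b):
    """Element-wise product of two nested lists: n*n interpreted iterations."""
    n_rows, n_cols = len(a), len(a[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            out[i][j] = a[i][j] * b[i][j]
    return out


def multiply_vectorized(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """The same product as a single NumPy operation."""
    return a * b
```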

App flow

  • last.fm + RescueTime - data sources
  • Aggregation and data fusion - Node.js
  • database
  • Analysis - scikit-learn (Python)
  • Visualisation - charts / tables (JavaScript)
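A sketch of the hand-off between the Python analysis step and the JavaScript charts in this flow, assuming both sides share an SQLite database and the fused data sits in a hypothetical `matched_tracks(genre, duration_s, productivity)` table, with productivity on an assumed -2..2 scale. It uses plain SQL aggregation rather than scikit-learn and emits JSON that a chart or table component could load.

```python
import json
import sqlite3


def productive_minutes_by_genre(conn: sqlite3.Connection) -> str:
    """Minutes listened per genre, total and while doing productive work, as JSON."""
    rows = conn.execute(
        """
        SELECT genre,
               SUM(duration_s) / 60.0 AS total_min,
               SUM(CASE WHEN productivity > 0 THEN duration_s ELSE 0 END) / 60.0
                   AS productive_min
        FROM matched_tracks
        GROUP BY genre
        ORDER BY genre
        """
    ).fetchall()
    return json.dumps(
        [{"genre": g, "total_min": t, "productive_min": p} for g, t, p in rows]
    )
```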

Visualisation

Step 3

It's not about Excel anymore

Don't make people think

Start with the story

One dataset

many stories

There are three kinds of lies: lies, damned lies, and statistics

- popularised by Mark Twain

Do we need a team?

  • Data scientist
  • Developer
  • Graphic designer

Student

Nope.

Data-centric apps: Problems to solve

By Piotr Lewandowski


I describe some of the problems I encountered while building apps that rely heavily on data aggregation and analysis.
