Data Summit Brussels
Diegem - 19 October 2017
Programme
14:00 - 14:25 | Data wrangling with Talend | Karen Van Hellemont
14:25 - 14:55 | Exploring data with R and ggplot2 | Sarah Hosking
14:55 - 15:20 | ElasticSearch for data science | Paul Hermans
15:20 - 15:40 | Break
15:40 - 16:05 | Predictive analytics with RapidMiner | Frederik Tijhuis
16:05 - 16:30 | Practical Machine Learning with H2O | Darren Cook
16:30 - 16:55 | Graph visualization with Linkurious | Jean Villedieu
16:55 - 17:20 | Comparing notebooks for data science | Karlijn Willems
17:20 - 17:30 | Wrap up
17:30 | Drink
Wifi: Hotel Brussels Airport
#DSBXL17
Maarten Lambrechts
Data science friction
SAI Data Summit
How to smooth your pipeline
19 October 2017
@maartenzam
We
Machine
Interface
At every interface between 2 surfaces, friction consumes energy, produces heat and wears down moving parts.
Data friction is what happens between 'data surfaces': where data moves between people, substrates, organisations or machines. From one lab to another, from one discipline to another, from a sensor to a computer, or from one data format to another.
Every movement of data across an interface comes at a cost of time, energy and human attention.
Know what goes into your pipeline
Go out in the field
Metadata
Documentation
Why build a pipeline in the first place?
- Speed and time saving
- Humans out of the loop: fewer errors, independent of individuals
- Better understanding: all the data processing jobs explicitly written down
- Debugging
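One way to read the points above: when each processing step is a named function and the pipeline is an ordered list of those functions, every job is written down explicitly and a failing step is easy to locate. A minimal Python sketch follows; the step names and sample rows are hypothetical and not from the talk.

```python
def drop_empty(rows):
    """Remove rows in which any value is missing or empty."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def parse_amount(rows):
    """Convert the (assumed) 'amount' column from text to a float."""
    return [{**r, "amount": float(r["amount"])} for r in rows]


def run_pipeline(rows, steps):
    """Apply each step in order and report row counts to help debugging."""
    for step in steps:
        before = len(rows)
        rows = step(rows)
        print(f"{step.__name__}: {before} -> {len(rows)} rows")
    return rows


# The whole pipeline is documented by this one list.
STEPS = [drop_empty, parse_amount]

raw = [{"id": "1", "amount": "3.5"}, {"id": "2", "amount": ""}]
clean = run_pipeline(raw, STEPS)
```

Because the steps live in one list, reordering or removing a step is a one-line change, and the printed row counts show where data was dropped.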
Single tool pipeline?
Data connectors
APIs
Export & import
Multitool pipeline
Develop
Maintain
What should be the next piece of tube in your pipeline?
Some tools will help you
Know the jargon
Time = €
Gets you out of the flow
Coding friction
compiling
building
reloading
calling servers
No immediate feedback
Analysis
Slicing and dicing
Adjust parameters
Views and visualizations
Modelling and predictive analytics
Go the last mile
Building the pipe is not the goal,
providing insights and making informed decisions is
Communicate what you found
(and how you found it)
Consider the user!
Use the right tools & techniques
=LEFT((RIGHT((RIGHT(A2,LEN(A2)-FIND(";",A2))),LEN((RIGHT(A2,LEN(A2)-FIND(";",A2))))-FIND(";",(RIGHT(A2,LEN(A2)-FIND(";",A2)))))),FIND(";",(RIGHT((RIGHT(A2,LEN(A2)-FIND(";",A2))),LEN((RIGHT(A2,LEN(A2)-FIND(";",A2))))-FIND(";",(RIGHT(A2,LEN(A2)-FIND(";",A2)))))),1)-1)
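Read from the inside out, the spreadsheet formula above strips everything up to the first semicolon, then everything up to the second, and keeps what comes before the next one: it returns the third semicolon-separated field of A2. For comparison, here is a hedged sketch of the same extraction in Python; the function name and the sample string are illustrative and not from the talk.

```python
def third_field(cell: str, sep: str = ";") -> str:
    """Return the third sep-separated field of a text cell."""
    return cell.split(sep)[2]


example = "2017;Brussels;Diegem;BE"  # made-up sample value
print(third_field(example))
```

The split-based version states its intent in one line, which is the point of picking a tool that fits the task.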
Learning a new tool = friction
Know what goes into your pipe
Pipes save time & decrease errors, but consider ROI
Tool interfaces can be painful
Tools can help to build a pipe, but you need to know the jargon
Avoid latency, get immediate feedback when developing a pipe
Iterate quickly in analysis
Go the last mile, consider the user
Use the right tools and techniques
Consider the ROI on learning a new tool
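The ROI points above can be made concrete with a rough break-even estimate: building a pipe (or learning a tool) pays off once the time saved over repeated runs exceeds the time invested. A sketch with made-up numbers; the function and its parameters are assumptions for illustration only.

```python
import math


def break_even_runs(build_minutes, manual_minutes_per_run, automated_minutes_per_run):
    """Number of runs after which the build time has been recovered.

    Returns None when automation saves no time per run.
    """
    saved_per_run = manual_minutes_per_run - automated_minutes_per_run
    if saved_per_run <= 0:
        return None
    return math.ceil(build_minutes / saved_per_run)


# Hypothetical case: 16 hours to build, 30 minutes by hand, 5 minutes automated.
runs = break_even_runs(build_minutes=960, manual_minutes_per_run=30, automated_minutes_per_run=5)
print(runs)
```

If the job will run fewer times than the break-even count, doing it by hand may be the cheaper option.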
Smooth data pipelines
Friction in the data pipeline
By maartenzam