How to talk with a data scientist
re:Clojure 2021 workshop series // 2021-11-13
J Santiago
Say hello, ask questions, learn, share, and be kind like with any other human being. Done!
How to think like a data scientist


The anti-fraud team is hiring!
FinTech from Berlin
- 120 employees
- Liquidity solutions for SMEs
- Buy now, pay later for B2B online shops
- Raised $100M in Series C

There are no stupid questions
Agenda
- What don't you know about data science?
- Core (data) principles (according to me ¯\_(ツ)_/¯)
- Group problem-solving
What do you want to know?
Question
Model
Analysis
maybe
Common terms
- Training
- Model
- Metrics
[Figure: a row of people emoji, one of whom is a zombie]
Who is a zombie?
[Figure: the same kind of row, now labelled: four zombies followed by four humans]
Training data
Model features
- Heart-rate
- Words per second
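To make "training", "model", and "features" concrete, here is a minimal sketch of a zombie classifier in plain Python. It is only an illustration: the training rows are invented toy values, the choice of logistic regression is an assumption (the slides do not name a model type), and the names `TRAINING_DATA`, `scale`, `train`, and `zombie_probability` are hypothetical.

```python
import math

# Hypothetical toy training data, invented for illustration:
# (heart_rate_bpm, words_per_second) -> 1 if zombie, 0 if human.
TRAINING_DATA = [
    ((5.0, 0.1), 1),
    ((0.0, 0.0), 1),
    ((10.0, 0.3), 1),
    ((8.0, 0.2), 1),
    ((70.0, 2.5), 0),
    ((85.0, 3.0), 0),
    ((60.0, 2.0), 0),
    ((95.0, 2.8), 0),
]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def scale(features):
    """Bring both features to a similar range so training behaves well."""
    heart_rate, words_per_second = features
    return (heart_rate / 100.0, words_per_second / 3.0)


def predict(weights, bias, features):
    x = scale(features)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(data, epochs=2000, learning_rate=0.5):
    """Fit a logistic regression with stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            x = scale(features)
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def zombie_probability(model, features):
    weights, bias = model
    return predict(weights, bias, features)


model = train(TRAINING_DATA)
p_zombie = zombie_probability(model, (12.0, 0.4))  # slow heart, few words
p_human = zombie_probability(model, (75.0, 2.4))   # normal heart, chatty
print(f"{p_zombie:.0%} probability this is a zombie")
```

The output of the model is a probability, not a verdict. Deciding what to do with it is a separate, often business-driven, step.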
[Figure: a single person emoji being scored by the model]
"78% probability this is a zombie"
{amount: 122
 n-paid: null
 color: "blue"}
Model
Data
PROD
TRAINING
| amount | type | color |
|---|---|---|
| 122 | 0 | "blue" |
{amount: 122
 n-paid: 0
 color: "blue"}
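One reading of the PROD and TRAINING payloads above is that production sends `null` where the training data had `0`. A small check like the following sketch could surface that kind of mismatch before the model sees it. The function name `find_skew` and the message wording are hypothetical, and comparing a single training example against a single payload is a simplification.

```python
# Values as they appear on the slides.
TRAINING_EXAMPLE = {"amount": 122, "n-paid": 0, "color": "blue"}
PROD_PAYLOAD = {"amount": 122, "n-paid": None, "color": "blue"}


def find_skew(training_example, prod_payload):
    """List fields whose presence or type differs between training and production."""
    problems = []
    for field in sorted(set(training_example) | set(prod_payload)):
        if field not in prod_payload:
            problems.append(f"{field}: missing in production")
        elif field not in training_example:
            problems.append(f"{field}: never seen in training")
        elif type(training_example[field]) is not type(prod_payload[field]):
            problems.append(
                f"{field}: training has {training_example[field]!r}, "
                f"production sends {prod_payload[field]!r}"
            )
    return problems


for problem in find_skew(TRAINING_EXAMPLE, PROD_PAYLOAD):
    print(problem)
```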
Core (data) principles
for a successful data science project
Immutable &
Traceable &
Specific
(... and documented, documented, documented!)

Data Scientist




| user_id | day | volume |
|---|---|---|
| 1 | 2021-11-02 | 1200 |
| 2 | 2021-11-02 | 1300 |
| 3 | 2021-11-02 | 600 |
| 4 | 2021-11-02 | 2100 |
Data is immutable
Mutating data invariably leads to loss of information. Not recording what mutation occurred leads to chaos 😱.
Stored data must be an exact replica of what happens in a live system.
⚠️⚠️⚠️⚠️
| user_id | timestamp | volume |
|---|---|---|
| 1 | 2021-11-01 12:34:00 | 500 |
| 1 | 2021-11-01 15:10:07 | 500 |
| 1 | 2021-11-01 20:01:00 | 500 |
| 2 | 2021-11-01 10:15:00 | 100 |
| 2 | 2021-11-01 13:28:37 | 500 |
| 2 | 2021-11-01 21:59:19 | 500 |
| 3 | 2021-11-01 07:30:00 | 300 |
| 3 | 2021-11-01 12:34:00 | 500 |
No more mutation
- When did the user drink?
- How much did they drink at lunch?
- When did they use the app the most?
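With one row per event, questions like these become simple queries, and a daily total can still be derived whenever it is needed. The sketch below uses the rows from the event table; the function names and the 12:00 to 14:00 definition of "lunch" are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, time

# Rows from the event table: (user_id, timestamp, volume).
EVENTS = [
    (1, "2021-11-01 12:34:00", 500),
    (1, "2021-11-01 15:10:07", 500),
    (1, "2021-11-01 20:01:00", 500),
    (2, "2021-11-01 10:15:00", 100),
    (2, "2021-11-01 13:28:37", 500),
    (2, "2021-11-01 21:59:19", 500),
    (3, "2021-11-01 07:30:00", 300),
    (3, "2021-11-01 12:34:00", 500),
]


def parse(timestamp):
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")


def drinking_times(events, user_id):
    """When did the user drink?"""
    return [ts for uid, ts, _ in events if uid == user_id]


def volume_between(events, user_id, start, end):
    """How much did the user drink between two times of day (end exclusive)?"""
    return sum(
        volume
        for uid, ts, volume in events
        if uid == user_id and start <= parse(ts).time() < end
    )


def daily_totals(events):
    """Derive the aggregated per-day view from the raw events."""
    totals = defaultdict(int)
    for user_id, ts, volume in events:
        totals[(user_id, parse(ts).date().isoformat())] += volume
    return dict(totals)


lunch = volume_between(EVENTS, 2, time(12, 0), time(14, 0))  # assumed lunch window
print(drinking_times(EVENTS, 1))
print(lunch)
print(daily_totals(EVENTS))
```

The reverse is not possible: starting from a stored daily total, none of the per-event questions can be answered.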
Data is traceable
Events or entities created together are linked together. Changes to data are events!
| user_id | timestamp | volume | session |
|---|---|---|---|
| 1 | 2021-11-01 12:34:00 | 500 | 3 |
| 1 | 2021-11-01 15:10:07 | 500 | 4 |
| 1 | 2021-11-01 20:01:00 | 500 | 10 |
| 2 | 2021-11-01 10:15:00 | 100 | 1 |
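A sketch of what "changes to data are events" could look like in code: a correction is appended as a new event that points back at the one it amends, and it inherits the session so the link is never lost. The `event_id` and `corrects` fields, the `amend` and `history` helpers, and the correction itself (250 at 16:00) are all hypothetical and not part of the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class DrinkEvent:
    event_id: int  # hypothetical identifier, not in the table
    user_id: int
    timestamp: str
    volume: int
    session: int
    corrects: Optional[int] = None  # hypothetical: event_id this event amends


# Rows from the table, with made-up event ids.
LOG = [
    DrinkEvent(1, 1, "2021-11-01 12:34:00", 500, 3),
    DrinkEvent(2, 1, "2021-11-01 15:10:07", 500, 4),
    DrinkEvent(3, 1, "2021-11-01 20:01:00", 500, 10),
    DrinkEvent(4, 2, "2021-11-01 10:15:00", 100, 1),
]


def amend(log, event_id, new_volume, timestamp):
    """Return a new log with a correction event appended; the old log is untouched."""
    original = next(e for e in log if e.event_id == event_id)
    correction = DrinkEvent(
        event_id=max(e.event_id for e in log) + 1,
        user_id=original.user_id,
        timestamp=timestamp,
        volume=new_volume,
        session=original.session,
        corrects=original.event_id,
    )
    return log + [correction]


def history(log, event_id):
    """The original event followed by every correction made to it, in log order."""
    return [e for e in log if e.event_id == event_id or e.corrects == event_id]


new_log = amend(LOG, 2, 250, "2021-11-01 16:00:00")  # made-up correction
print([e.volume for e in history(new_log, 2)])
```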
Data is specific
Good domain modelling is critical for fetching exactly the data that is needed.
| user_id | timestamp | volume | session | beverage |
|---|---|---|---|---|
| 1 | 2021-11-01 12:34:00 | 500 | 3 | water |
| 1 | 2021-11-01 15:10:07 | 500 | 4 | water |
| 1 | 2021-11-01 20:01:00 | 500 | 10 | water |
| 2 | 2021-11-01 10:15:00 | 100 | 1 | coffee |
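Once the beverage is modelled explicitly, a question such as "how much water did each user drink?" needs no guesswork. A minimal sketch over the rows of the table; the function name `volume_by_user` is hypothetical.

```python
from collections import defaultdict

# Rows from the table: (user_id, timestamp, volume, session, beverage).
ROWS = [
    (1, "2021-11-01 12:34:00", 500, 3, "water"),
    (1, "2021-11-01 15:10:07", 500, 4, "water"),
    (1, "2021-11-01 20:01:00", 500, 10, "water"),
    (2, "2021-11-01 10:15:00", 100, 1, "coffee"),
]


def volume_by_user(rows, beverage):
    """Total volume per user, counting only rows for the requested beverage."""
    totals = defaultdict(int)
    for user_id, _timestamp, volume, _session, row_beverage in rows:
        if row_beverage == beverage:
            totals[user_id] += volume
    return dict(totals)


print(volume_by_user(ROWS, "water"))
print(volume_by_user(ROWS, "coffee"))
```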
Group work
https://docs.google.com/document/d/1QZMOPuLbe2eJBJgDaj_opg1JTYte7T7BOlWQ_ToCa10
English is the most powerful programming language.
- Paulus Esterhazy
Find me at http://jcpsantiago.xyz
and as @jcpsantiago almost everywhere else
Thank you!