Data visualization in Data Science: theory and practice
Andrey Lukyanenko
Senior DS @ Careem
data:image/s3,"s3://crabby-images/2c485/2c485fe87cfb8f9b0d8dee587c70ca2f048d2812" alt=""
Contents
- What is data visualization and why is it useful?
- Classics of data visualization
- Common software for making plots
- Chart types
- Skills for professional-level visualizations
- What you should and shouldn't do
- Data Visualization for Machine Learning
data:image/s3,"s3://crabby-images/72518/725184192ca8e4d126a73e168475eda333ce9eb1" alt=""
Why visualizations are important
data:image/s3,"s3://crabby-images/1889b/1889be684b45e553d9f94f2b7e69eb8cc6cfa4af" alt=""
data:image/s3,"s3://crabby-images/2b223/2b223112d9f3dbdb331e290ada72d6a2c7f146ab" alt=""
data:image/s3,"s3://crabby-images/4c4cd/4c4cdb67f1b95aa61adff05891c9b623003ff9e3" alt=""
data:image/s3,"s3://crabby-images/5a85e/5a85efcd877e9d9e1b6b758601d6534beb036de7" alt=""
data:image/s3,"s3://crabby-images/ef60d/ef60d6dc91ba599449b71e056efe61c6ba9cee5d" alt=""
data:image/s3,"s3://crabby-images/72f3d/72f3d64aa93d7767b5320bf1a201fe49e91b171a" alt=""
data:image/s3,"s3://crabby-images/fecf9/fecf92d593039b1600d5e29f216b13518c9ce4df" alt=""
Edward Tufte's principles
- Maximize the data-ink ratio
- Minimize the lie factor
- Minimize chartjunk
- Use proper scales and clear labeling
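Two of these principles are simple ratios. Tufte defines the data-ink ratio as the ink used to show data divided by the total ink in the graphic, and the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change |second - first| / first. The sketch below computes the lie factor; the helper names and the truncated-axis numbers are hypothetical, chosen only to illustrate how a y-axis that does not start at zero inflates a change.

```python
def size_of_effect(first: float, second: float) -> float:
    """Relative change between two values, as used in Tufte's lie factor."""
    return abs(second - first) / abs(first)


def lie_factor(data_first: float, data_second: float,
               shown_first: float, shown_second: float) -> float:
    """Effect size shown in the graphic divided by effect size in the data.

    1.0 means the graphic is faithful; values far from 1.0 mean the drawing
    exaggerates (> 1) or understates (< 1) the change.
    """
    return size_of_effect(shown_first, shown_second) / size_of_effect(data_first, data_second)


# Hypothetical example: a metric grows from 100 to 110 (+10%), but the bar chart's
# y-axis starts at 90, so the drawn bar heights are 10 and 20 units (+100%).
axis_start = 90
data = (100, 110)
drawn = (data[0] - axis_start, data[1] - axis_start)
print(round(lie_factor(*data, *drawn), 2))  # 10.0
```

A lie factor of about 10 here means the chart makes a 10% change look like a 100% change.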
Python. Matplotlib
data:image/s3,"s3://crabby-images/2f0a7/2f0a727691b14f03f6384cdd31713975d7b42b13" alt=""
Python. Plotly
data:image/s3,"s3://crabby-images/e983b/e983b5d00430d59d601f8ae7c110594166dddacf" alt=""
import plotly.graph_objects as go
import pandas as pd

# Daily Apple (AAPL) OHLC prices from Plotly's sample datasets
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')

# Candlestick chart: one candle per day built from open/high/low/close
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['AAPL.Open'],
                                     high=df['AAPL.High'],
                                     low=df['AAPL.Low'],
                                     close=df['AAPL.Close'])])
fig.show()
Python. Altair
data:image/s3,"s3://crabby-images/6002f/6002f424bca06c4350595853d1b02c9ea87e19b7" alt=""
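import altair as alt
from vega_datasets import data

# Sample cars dataset shipped with vega_datasets
source = data.cars()

# Scatter plot colored by origin, with hover tooltips and zoom/pan from .interactive()
chart = alt.Chart(source).mark_circle(size=60).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Origin', 'Horsepower', 'Miles_per_Gallon']
).interactive()
chart  # renders in Jupyter; in a plain script use chart.save('cars.html')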
R
data:image/s3,"s3://crabby-images/59163/59163428ca637fb46de63a01e472f2e04e9fd4df" alt=""
library(ggplot2)
# Engine displacement vs. highway mileage, colored by car class (mpg ships with ggplot2)
ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()
Tableau
data:image/s3,"s3://crabby-images/a23ec/a23ecfe6452e77393875165d9d3f4d93c4285739" alt=""
D3.js
data:image/s3,"s3://crabby-images/1a93f/1a93fae4b685ee5a49f04a168e774c34a147fa15" alt=""
data:image/s3,"s3://crabby-images/04563/0456376cf6eddeeed57e616c0a44e899ce206441" alt=""
data:image/s3,"s3://crabby-images/d5f01/d5f01dac47ce089916751706fd1731f6e58ef2cd" alt=""
data:image/s3,"s3://crabby-images/8e013/8e0135453b3ded6197e2d5962fc86ecb65e28121" alt=""
data:image/s3,"s3://crabby-images/476f0/476f0e9e86cd1150143556970f3ee09c7844c511" alt=""
data:image/s3,"s3://crabby-images/a7b81/a7b81c6489fcd938dacc439e3982d68aa8d373ff" alt=""
Skills
- Data visualization
- Graphical design, interface design, UX
- Storytelling
- Color: Hue, Value/Lightness, Saturation/Chroma, RGB vs CMYK
- Accessibility
- Style guide
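The color and accessibility skills can be made concrete with a little arithmetic. The sketch below decomposes a hex color into hue, lightness, and saturation with Python's standard `colorsys` module, and computes the WCAG 2.x contrast ratio between two colors (WCAG AA asks for at least 4.5:1 for normal-size text). The helper function names are hypothetical; the luminance formula is the one WCAG uses for sRGB colors.

```python
import colorsys


def hex_to_rgb01(hex_color: str) -> tuple[float, float, float]:
    """Convert '#RRGGBB' to (r, g, b) with each channel in [0, 1]."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return r, g, b


def hue_lightness_saturation(hex_color: str) -> tuple[float, float, float]:
    """Hue in degrees, lightness and saturation in [0, 1] (HLS model)."""
    h, l, s = colorsys.rgb_to_hls(*hex_to_rgb01(hex_color))
    return h * 360, l, s


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color."""
    def linearize(c: float) -> float:
        # Undo the sRGB gamma curve to get linear light
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in hex_to_rgb01(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted((relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(hue_lightness_saturation("#FF0000"))                # (0.0, 0.5, 1.0): pure red
print(round(contrast_ratio("#767676", "#FFFFFF"), 2))     # 4.54, just above the 4.5:1 AA threshold
```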
Style guide
data:image/s3,"s3://crabby-images/48992/4899295ebed467e2275c9593087baa369aea465a" alt=""
data:image/s3,"s3://crabby-images/ddc72/ddc726585ebdfe726cdd4612b77b8580742a8ee3" alt=""
Improvements
data:image/s3,"s3://crabby-images/582f3/582f34f49f14b11e75c0f610ce98ac80209faa51" alt=""
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1042150/images/9171795/image.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1042150/images/9171801/image.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1042150/images/9171806/image.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1042150/images/9171811/image.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1042150/images/9171818/image.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1042150/images/9171823/image.png)
data:image/s3,"s3://crabby-images/01055/01055b832d830e2940f1bda29b25e23f83b63838" alt=""
- Exploratory vs. explanatory visualization: show what matters
- Pie charts and 3D charts: use with caution
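A common alternative to a pie or 3D chart is a sorted horizontal bar chart: lengths along a shared baseline are generally easier to compare than angles or perspective-distorted volumes. For an explanatory chart, graying out the context and coloring only the category the message is about is one way to show what matters. The sketch below assumes hypothetical category shares; the data, labels, and title are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical category shares in percent, for illustration only
shares = {"Product A": 34, "Product B": 27, "Product C": 21, "Product D": 12, "Other": 6}
highlight = "Product A"  # the single category the chart's message is about

# Sort ascending: barh draws the first item at the bottom, so the largest ends on top
items = sorted(shares.items(), key=lambda kv: kv[1])
labels = [name for name, _ in items]
values = [share for _, share in items]
colors = ["tab:blue" if name == highlight else "lightgray" for name in labels]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.barh(labels, values, color=colors)
ax.bar_label(bars, fmt="%d%%", padding=3)  # direct value labels instead of a legend
ax.set_xlabel("Share of sales (%)")
ax.set_title("Product A accounts for about a third of sales")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```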
data:image/s3,"s3://crabby-images/e6dcb/e6dcbfc14ccae1a66dda2d2cf9071651bc6305ca" alt=""
data:image/s3,"s3://crabby-images/48d13/48d13935c2f20a936da8627a6e1022039b673381" alt=""
data:image/s3,"s3://crabby-images/74c58/74c580b9447fc04f3faa2195d7a8ea3318856254" alt=""
data:image/s3,"s3://crabby-images/6abe7/6abe7bb1d58a9944cd7df412d6dee775581c637e" alt=""
Good examples
data:image/s3,"s3://crabby-images/7bfb9/7bfb9ad216eae3513283f4332b9b372810b90f69" alt=""
data:image/s3,"s3://crabby-images/765dc/765dc0103ef46e62e824e8edec8f21cc7be3b61f" alt=""
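import seaborn as sns
import matplotlib.pyplot as plt
# Pairwise scatter plots of numeric columns; per-species distributions on the diagonal
penguins = sns.load_dataset("penguins")
sns.pairplot(penguins, hue="species")
plt.show()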
data:image/s3,"s3://crabby-images/10aba/10aba4b1ede59d43a11074a8dd4c966294174d24" alt=""
data:image/s3,"s3://crabby-images/0165d/0165d4ce97870026b8e159d90ab169063ae50cd5" alt=""
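import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
X_train, y_train = load_diabetes(return_X_y=True, as_frame=True)  # stand-in data
model1 = RandomForestRegressor(random_state=0).fit(X_train, y_train)  # stand-in tree model
explainer = shap.TreeExplainer(model1, X_train)  # X_train doubles as background data
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)  # per-feature impact and direction of effect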
data:image/s3,"s3://crabby-images/ddbf9/ddbf901b58307756922f01687f6cdc746b3ee321" alt=""
data:image/s3,"s3://crabby-images/c3aee/c3aeeb0a1e2cb46359f3a781b66c1a9949272eca" alt=""
data:image/s3,"s3://crabby-images/6069b/6069bcf4dc91c6d345e7f6f43e7ed5f5e49c76a4" alt=""
data:image/s3,"s3://crabby-images/3cf28/3cf281fa2071073031b580362b9a4e323ca2ab89" alt=""
data:image/s3,"s3://crabby-images/3886d/3886d6b891ed46fc1e0adf77e8df48a66ebe2b82" alt=""
data:image/s3,"s3://crabby-images/55bbe/55bbec91c13cb609f0bcaad038235bdc1ac1faeb" alt=""
data:image/s3,"s3://crabby-images/5286e/5286ec82aa8e7383bc7d1aacd75412ed1f8b895a" alt=""
Contacts