Data visualization in Data Science: theory and practice
Andrey Lukyanenko
Senior DS @ Careem
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/7725389/pasted-from-clipboard.png)
Content
- What is data visualization and why is it useful?
- Classics of data visualization
- Common software for making the plots
- Chart types
- Skills for professional-level visualizations
- What should and shouldn't you do
- Data Visualization for Machine Learning
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9049814/pasted-from-clipboard.png)
Why visualizations are important
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9047971/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9047972/pasted-from-clipboard.png)
Why visualizations are important
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9096720/pasted-from-clipboard.png)
![](https://miro.medium.com/max/600/1*W--cGoA3_n2ZlU6Xs4o2iQ.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9047980/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9047982/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9047985/pasted-from-clipboard.png)
Tufte
- Maximize the data-ink ratio
- Minimize the lie factor
- Minimize chartjunk
- Use proper scales and clear labeling
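Tufte defines the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change between two values. A minimal sketch of that arithmetic follows; the function names and the bar-chart numbers are hypothetical and only illustrate how a truncated axis inflates the ratio.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change between two values: |second - first| / |first|."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(data_first: float, data_second: float,
               drawn_first: float, drawn_second: float) -> float:
    """Effect shown in the graphic divided by effect present in the data."""
    return (effect_size(drawn_first, drawn_second)
            / effect_size(data_first, data_second))


# Hypothetical bar chart: the data values are 50 and 55, but the y-axis
# starts at 45, so the bars are drawn 5 and 10 units tall.
truncated = lie_factor(50, 55, drawn_first=5, drawn_second=10)

# Same data with the y-axis starting at 0: bars are drawn 50 and 55 units tall.
honest = lie_factor(50, 55, drawn_first=50, drawn_second=55)

print(f"truncated axis: {truncated:.1f}, zero-based axis: {honest:.1f}")
```

A lie factor close to 1 means the graphic represents the change in the data proportionally; values far above 1 mean the graphic exaggerates it.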
Python. Matplotlib
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9050031/pasted-from-clipboard.png)
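For comparison with the Plotly and Altair snippets, here is a minimal Matplotlib sketch. It is not the code behind the image above; the sine data, labels, and the `make_line_plot` name are assumptions chosen only to show the basic figure/axes workflow.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def make_line_plot():
    """Build a labeled line plot of sin(x) on [0, 2*pi]."""
    x = [i / 10 for i in range(63)]
    y = [math.sin(v) for v in x]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, y, label="sin(x)")
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
    ax.set_title("A minimal Matplotlib line plot")
    ax.legend()
    return fig, ax


fig, ax = make_line_plot()
```

Call `fig.savefig("line_plot.png")` to write the figure to disk, or drop the `matplotlib.use("Agg")` line and call `plt.show()` in an interactive session.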
Python. Plotly
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9050040/pasted-from-clipboard.png)
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')  # daily AAPL prices
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['AAPL.Open'],
                                     high=df['AAPL.High'],
                                     low=df['AAPL.Low'],
                                     close=df['AAPL.Close'])])
fig.show()
Python. Altair
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9050041/pasted-from-clipboard.png)
import altair as alt
from vega_datasets import data
source = data.cars()
alt.Chart(source).mark_circle(size=60).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Origin', 'Horsepower', 'Miles_per_Gallon']
).interactive()
R
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9050067/pasted-from-clipboard.png)
library(ggplot2)
ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()
Tableau
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9049900/pasted-from-clipboard.png)
D3.js
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9050069/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9097980/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9097982/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9097987/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9097989/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9097992/pasted-from-clipboard.png)
Skills
- Data visualization
- Graphic design, interface design, UX
- Storytelling
- Color: Hue, Value/Lightness, Saturation/Chroma, RGB vs CMYK
- Accessibility
- Style guide
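The color and accessibility items above can be made concrete with a short sketch using only the standard library. It converts an RGB color to hue/lightness/saturation, applies the naive profile-free RGB to CMYK formula (real print workflows use color profiles, so treat it as an approximation), and computes the WCAG contrast ratio between two colors. The helper names are assumptions made for this example.

```python
import colorsys


def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#rrggbb' to an (r, g, b) tuple with channels in [0, 1]."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_cmyk(r: float, g: float, b: float) -> tuple:
    """Naive RGB -> CMYK conversion (no color profile involved)."""
    k = 1 - max(r, g, b)
    if k == 1:  # pure black: avoid division by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return (c, m, y, k)


def relative_luminance(r: float, g: float, b: float) -> float:
    """WCAG relative luminance of an sRGB color with channels in [0, 1]."""
    def linearize(ch: float) -> float:
        return ch / 12.92 if ch <= 0.03928 else ((ch + 0.055) / 1.055) ** 2.4
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def contrast_ratio(rgb_a: tuple, rgb_b: tuple) -> float:
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lum_a, lum_b = relative_luminance(*rgb_a), relative_luminance(*rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


red = hex_to_rgb("#ff0000")
hue, lightness, saturation = colorsys.rgb_to_hls(*red)  # note the H, L, S order
cmyk = rgb_to_cmyk(*red)
black_on_white = contrast_ratio(hex_to_rgb("#000000"), hex_to_rgb("#ffffff"))

print(hue, lightness, saturation, cmyk, round(black_on_white, 1))
```

WCAG 2.x asks for a contrast ratio of at least 4.5:1 for normal-size text at level AA, which is a useful check for chart labels and annotations.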
Style guide
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048147/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048148/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048060/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048061/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048068/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048070/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048071/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048073/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048074/pasted-from-clipboard.png)
Improvements
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048075/pasted-from-clipboard.png)
- Exploratory vs explanatory: show what matters
- Pie charts and 3D charts (usually best avoided)
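One common alternative to a pie chart is a sorted horizontal bar chart, where values are compared by length rather than by angle. The sketch below assumes Matplotlib; the `shares` numbers and the `sorted_bar_chart` name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical survey shares in percent (illustrative numbers only).
shares = {"Python": 48, "R": 21, "SQL": 17, "Julia": 4, "Other": 10}


def sorted_bar_chart(shares: dict):
    """Draw categories as horizontal bars, largest at the top."""
    # barh draws the first item at the bottom, so sort ascending.
    items = sorted(shares.items(), key=lambda kv: kv[1])
    labels = [name for name, _ in items]
    values = [value for _, value in items]

    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(labels, values, color="#4c72b0")

    # Label each bar directly instead of relying on a legend.
    for position, value in enumerate(values):
        ax.text(value + 0.5, position, f"{value}%", va="center")

    # Remove non-data ink.
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)

    ax.set_xlabel("Share, %")
    ax.set_title("Shares as a sorted bar chart instead of a pie")
    return fig, ax


fig, ax = sorted_bar_chart(shares)
```

Direct labels and a single color keep the focus on the ranking, which is usually the point an explanatory chart of shares needs to make.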
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048066/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048067/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9049981/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048140/pasted-from-clipboard.png)
Good examples
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048141/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048191/pasted-from-clipboard.png)
import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.pairplot(penguins, hue="species")
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9049774/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9093910/pasted-from-clipboard.png)
import shap
explainer = shap.TreeExplainer(model1, X_train)  # model1: an already fitted tree-based model; X_train: its feature matrix
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9104116/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9048195/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/6354049/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/6354068/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/6354071/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/6354073/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1047697/images/9049891/pasted-from-clipboard.png)
References
- https://www.anychart.com/chartopedia/usage-type/
- https://gramener.github.io/visual-vocabulary-vega/#
- https://www.python-graph-gallery.com/
- https://raw.githubusercontent.com/Financial-Times/chart-doctor/main/visual-vocabulary/poster.png
- https://textvis.lnu.se/
- https://www.data-to-viz.com/
- https://www.datavisualizationsociety.org/
References
- https://towardsdatascience.com/data-visualization-in-data-science-5681cbdde5bf
- https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary
- https://revealthedata.com/blog/all/navyki-dlya-vizualizacii-dannyh-i-dataviz-inzhenerov/
- https://docs.google.com/document/d/1SUZ_DXt_WXPlv1IfqlrwTa00S7b3ghJ_2o_W4yE6_UA/edit#
- https://blog.datawrapper.de/better-charts/
References
- https://www.kaggle.com/spitfire2nd/enthusiast-to-data-professional-what-changes/
- https://medium.com/nightingale/style-guidelines-92ebe166addc
- https://www.tableau.com/learn/articles/best-data-visualization-blogs
- https://boostlabs.com/blog/10-types-of-data-visualization-tools/
- https://github.com/lutzroeder/netron
Contacts
Data Visualization talk
By Andrey Lukyanenko