Group E

Matplotlib is one of the most important packages for Data Visualization

 

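  • One of the earliest Python plotting packages: first released in 2003
  • originally developed by John D. Hunter
  • has several notable modules and companion packages (Pyplot is part of Matplotlib; Seaborn and GGplot are separate packages that work with it):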

 

Seaborn

GGplot

Pyplot


import matplotlib.pyplot as plt
import seaborn as sns
from ggplot import ggplot, aes, geom_point, diamonds




%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns
'Types of Plots :'

plt.hist()
plt.scatter()
plt.pie()
plt.bar()
...

Call the type of plot you want through the pyplot alias, plt

The Basic Commands in Matplotlib

plt.show()

-> 'Display the plot(s)'

plt.clf()

-> 'Clear the current figure so that you can start fresh'
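A minimal sketch of what plt.clf() does, assuming a non-interactive backend ("Agg") and made-up points rather than the diamonds data: the figure holds one set of axes after plotting and none after clearing.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
plt.scatter([1, 2, 3], [4, 5, 6])   # made-up points
axes_before = len(fig.axes)         # the scatter call created one Axes

plt.clf()                           # clear the current figure
axes_after = len(fig.axes)          # the figure is empty again

plt.close(fig)
```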
plt.scatter( 
x = diamonds['price'], 
y = diamonds['carat'])

plt.show()
plt.scatter(
x = diamonds.price,
y = diamonds.carat)

plt.show()

Call variables from a dataset

1. Using brackets

2. Using dots
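A small sketch of the two access styles, using a made-up three-row table (the values and the "price per carat" column are illustrative, not taken from the diamonds dataset). Both styles return the same column; only brackets work when the name is not a valid Python identifier.

```python
import pandas as pd

# Made-up stand-in for the diamonds data (illustrative values only)
toy = pd.DataFrame({
    "carat": [0.23, 0.31, 1.01],
    "price": [326, 335, 4500],
    "price per carat": [1417.4, 1080.6, 4455.4],
})

by_brackets = toy["price"]   # bracket access works for any column name
by_dots = toy.price          # dot access works for identifier-like names

same = by_brackets.equals(by_dots)

# A column name containing spaces can only be reached with brackets
ppc = toy["price per carat"]
```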

plt.scatter(
x=diamonds.price,
y=diamonds.carat)
ggplot(aes(x='price', y='carat'), data=diamonds) + geom_point()
plt.style.use('ggplot')
  • Customize matplotlib to your own taste
plt.style.use('dark_background')
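A possible way to try a style without changing it globally: plt.style.available lists the installed style names, and plt.style.context applies a style only inside a with block. The "Agg" backend here is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Names of the style sheets shipped with this matplotlib installation
has_ggplot = "ggplot" in plt.style.available
has_dark = "dark_background" in plt.style.available

default_face = plt.rcParams["axes.facecolor"]

# The style is active only inside the with block
with plt.style.context("dark_background"):
    dark_face = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.scatter([1, 2, 3], [3, 1, 2])   # made-up points
    plt.close(fig)

# Outside the block the previous settings are back
restored_face = plt.rcParams["axes.facecolor"]
```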
import matplotlib.pyplot as plt
import pandas as pd
df=pd.read_csv("/Users/anchaljaiswal/Downloads/diamonds.csv")
plt.scatter(x=df.carat,y=df.price)
plt.show()

Customization

We start with a simple scatter plot of diamond price against carat

Now let's add axis labels and chart title to improve readability

plt.scatter(x=df.carat,y=df.price)
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
plt.show()

Customization

We can also change the color and shape of the points in the graph

plt.scatter(x=df.carat,y=df.price,marker='2')
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
plt.show()

Customization

Changing the shape

Customization

plt.scatter(x=df.carat,y=df.price,c='g',
marker='2')
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
plt.show()

Changing the Color: Option 1 (color name)

plt.scatter(x=df.carat,y=df.price,c='#0000FF',
marker='2')
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
plt.show()

Changing the Color: Option 2 (hex code)
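To see what the two color options resolve to, matplotlib.colors.to_rgb converts either form into an (r, g, b) tuple. A short sketch:

```python
from matplotlib.colors import to_rgb

# Option 1: a single-letter color name ('g' is matplotlib's base green)
named_green = to_rgb("g")

# Option 2: a hex code, read as red=00, green=00, blue=FF
hex_blue = to_rgb("#0000FF")
```

Note that 'g' is a darker green (half intensity) than the pure green '#00FF00'.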

Customization

plt.scatter(x=df.carat, y=df.price)
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
# Locate the most expensive diamond as a single (carat, price) point
i_max = df.price.idxmax()
x_max, y_max = df.carat[i_max], df.price[i_max]
plt.annotate('Costliest Diamond', xy=(x_max, y_max), xytext=(3, 5),
    fontsize=15, arrowprops=dict(facecolor='black', shrink=0.05))
plt.show()

We can use the annotate function to highlight a specific feature in the graph with an arrow
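A self-contained sketch of the same idea on made-up data (four invented diamonds, not the real dataset). The arrow tip xy must be a single point, so the row with the highest price is looked up with idxmax first. The text position xytext is an arbitrary choice in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the diamonds data
toy = pd.DataFrame({"carat": [0.3, 0.9, 2.3, 1.5],
                    "price": [400, 3900, 18000, 9500]})

# Row of the most expensive diamond, reduced to one (x, y) point
top = toy.loc[toy.price.idxmax()]
x_max, y_max = float(top.carat), float(top.price)

fig, ax = plt.subplots()
ax.scatter(toy.carat, toy.price)
note = ax.annotate("Costliest Diamond",
                   xy=(x_max, y_max),        # where the arrow points
                   xytext=(0.5, 15000),      # where the text sits (data coordinates)
                   arrowprops=dict(facecolor="black", shrink=0.05))
plt.close(fig)
```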

Customization

plt.scatter(x=df.carat, y=df.price)
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
plt.text(1.18, 2500, r'ln(y) = $b_0 + \sum_{j=1}^p b_j*x_j$', fontsize=18)
plt.show()

Matplotlib allows us to embed a mathematical formula in the plot

Customization

import numpy as np

plt.scatter(df.carat, df.price)
plt.xlabel("Carat")
plt.ylabel("Price")
# np.arange(start, stop, step) builds evenly spaced tick positions
plt.xticks(np.arange(min(df.carat)+0.1, max(df.carat)+0.3, 0.5))
plt.yticks(np.arange(500, max(df.price)+2000, 3000))
plt.title("Diamonds")
plt.show()

xticks and yticks get or set the current tick locations and labels of the x-axis and y-axis.
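A brief sketch of both uses of plt.xticks on made-up points: with an array it sets the tick positions, with no arguments it returns the current positions and labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.scatter([0.2, 1.0, 2.5], [500, 5000, 15000])   # made-up points

# Set: one tick every 0.5 from 0 up to (but not including) 3.5
plt.xticks(np.arange(0.0, 3.5, 0.5))

# Get: called without arguments, returns tick locations and label objects
locations, labels = plt.xticks()
plt.close(fig)
```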

 

Customization

plt.scatter(df.carat, df.price)
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamonds")
plt.tight_layout()
plt.show()

tight_layout automatically adjusts the subplot parameters (padding and spacing) so that the plot, including its labels and title, fits inside the figure area

 

Customization

plt.scatter(df.carat, df.price)
plt.xlabel("Carat")
plt.ylabel("Price")
plt.xlim(0,3)
plt.ylim(0,18000)
plt.title("Diamonds")
plt.show()

xlim and ylim set the limits of the x-axis and y-axis explicitly, overriding the automatic limits. Called without arguments, they return the current limits.
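A short sketch on made-up points showing the difference between the automatic limits and the ones set by hand. Points outside the manual limits are simply not shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([0.2, 1.0, 4.5], [500, 5000, 18500])   # made-up points

auto_xlim = plt.xlim()      # limits chosen by autoscaling (cover all points)

plt.xlim(0, 3)              # the point at x = 4.5 now falls outside the view
plt.ylim(0, 18000)

manual_xlim = plt.xlim()
manual_ylim = plt.ylim()
plt.close(fig)
```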

 

Customization

plt.scatter(df.cut, df.price)
plt.ylabel("Price")
plt.xlabel("Cut")
plt.twinx()
# Plot carat after twinx() so that it uses the second (right-hand) y-axis
plt.scatter(df.cut, df.carat, c='g')
plt.ylabel("Carat")
plt.title("Diamonds")
plt.show()

Create a twin Axes sharing the x-axis
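A sketch of the same idea with explicit Axes objects, on made-up data. Anything plotted on the twin uses its own y-scale on the right, while both share one x-axis. The variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Made-up stand-in values for three cut grades
cuts = ["Fair", "Good", "Ideal"]
price = [4300, 3900, 3500]
carat = [1.05, 0.85, 0.70]

fig, ax_price = plt.subplots()
ax_price.scatter(cuts, price)
ax_price.set_xlabel("Cut")
ax_price.set_ylabel("Price")

ax_carat = ax_price.twinx()              # second y-axis, same x-axis
ax_carat.scatter(cuts, carat, color="g")
ax_carat.set_ylabel("Carat")

price_ylim = ax_price.get_ylim()
carat_ylim = ax_carat.get_ylim()
shared_x = ax_price.get_xlim() == ax_carat.get_xlim()
plt.close(fig)
```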

 

Customization

plt.scatter(df.price, df.cut)
plt.ylabel("Cut")
plt.xlabel("Price")
plt.xlim(0,18000)
plt.twiny()
# Plot carat after twiny() so that it uses the second (top) x-axis
plt.scatter(df.carat, df.cut, c='g')
plt.xlabel("Carat")
plt.show()

Create a twin Axes sharing the y-axis

 

Customization

# Top panel: price and carat on two x-axes, sharing the cut y-axis
plt.subplot(2,1,1)
plt.scatter(df.price, df.cut)
plt.ylabel("Cut")
plt.xlabel("Price")
plt.xlim(0,18000)
plt.twiny()
plt.scatter(df.carat, df.cut, c='g')
plt.xlabel("Carat")
plt.ylim(-1,5)

# Bottom panel: price and carat on two y-axes, sharing the cut x-axis
plt.subplot(2,1,2)
plt.scatter(df.cut, df.price)
plt.ylabel("Price")
plt.xlabel("Cut")
plt.twinx()
plt.scatter(df.cut, df.carat, c='g')
plt.ylabel("Carat")

plt.tight_layout()
plt.show()

plt.subplot(m, n, p) divides the current figure into an m-by-n grid and creates axes in the position specified by p. As in MATLAB, positions are numbered by row, starting at 1 in the top-left corner
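A small sketch that checks the row-by-row numbering on an empty 2-by-3 grid: position 1 is top-left, position 4 sits directly below it, and position 6 is bottom-right. The grid size is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = plt.subplot(2, 3, 1)   # first row, first column
ax4 = plt.subplot(2, 3, 4)   # second row, first column
ax6 = plt.subplot(2, 3, 6)   # second row, third column

# Bounding boxes in figure coordinates (origin at bottom-left)
pos1, pos4, pos6 = (ax.get_position() for ax in (ax1, ax4, ax6))

same_column = abs(pos1.x0 - pos4.x0) < 1e-9   # 1 and 4 are stacked
same_row = abs(pos4.y0 - pos6.y0) < 1e-9      # 4 and 6 are side by side
plt.close(fig)
```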

 

Customization


# Left panel: price and carat on two x-axes, sharing the cut y-axis
plt.subplot(1,2,1)
plt.scatter(df.price, df.cut)
plt.ylabel("Cut")
plt.xlabel("Price")
plt.xlim(0,18000)
plt.twiny()
plt.scatter(df.carat, df.cut, c='g')
plt.xlabel("Carat")
plt.ylim(-1,5)

# Right panel: price and carat on two y-axes, sharing the cut x-axis
plt.subplot(1,2,2)
plt.scatter(df.cut, df.price)
plt.ylabel("Price")
plt.xlabel("Cut")
plt.twinx()
plt.scatter(df.cut, df.carat, c='g')
plt.ylabel("Carat")

plt.tight_layout()
plt.show()

The previous figure used subplot(2,1,1) and subplot(2,1,2), which stacks the two plots vertically; this one uses subplot(1,2,1) and subplot(1,2,2), which places them side by side

diamonds.head()

import seaborn as sns

sns.stripplot(x='cut', y='price', data=diamonds)
plt.show()

Univariate plots - Stripplot

from ggplot import mtcars
mtcars.head()

import seaborn as sns

sns.swarmplot(x='cyl', y='mpg', data=mtcars)
plt.show()

Spreads out points to prevent overplotting

note: can be very slow on large datasets

Swarmplot

import seaborn as sns

sns.boxplot(x='cut', y='price', data=diamonds)
plt.show()

sns.violinplot(x='cut', y='price', data=diamonds)
plt.show()

Violinplot: an alternative to the boxplot that also shows the shape of the distribution (a density estimate) for each group

Univariate plots

Standard boxplot

import seaborn as sns

sns.jointplot(x='carat', y='price', data=diamonds)
plt.show()

Plots continuous x and y variables against each other, with a histogram of each variable along the margins (older seaborn versions also print the correlation)

Jointplot

import seaborn as sns

sns.pairplot(data=diamonds)
plt.show()

Plots each variable in the dataset against each other to quickly get an overview of the data

Pairplot

import seaborn as sns

diamonds2 = diamonds.drop(diamonds.columns[[1, 2, 3]], axis=1)
covars = diamonds2.corr()

sns.heatmap(covars)
plt.show()

First calculate the correlations between the numeric columns with corr(); the heatmap then displays them visually
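Note that DataFrame.corr() returns correlations, not covariances. A sketch of the difference on a made-up table (values are illustrative only): correlations always lie between -1 and 1 with 1 on the diagonal, which makes them comparable across columns in a heatmap, while covariances depend on the units of each column.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the numeric diamonds columns
toy = pd.DataFrame({
    "carat": [0.3, 0.7, 1.0, 1.5, 2.0],
    "price": [400, 2300, 5000, 9500, 16000],
    "depth": [61.5, 62.8, 60.9, 62.1, 61.0],
})

corr = toy.corr()   # unit-free, every entry in [-1, 1]
cov = toy.cov()     # depends on the scale of each column

diag_is_one = np.allclose(np.diag(corr), 1.0)
price_variance = cov.loc["price", "price"]   # large, because price is in the thousands
```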

Heatmap 

import seaborn as sns

sns.regplot(x='carat', y='price', data=diamonds)
plt.show()

Simple way to plot linear model over a scatterplot

Statistical plots

Regplot (1/2)

import seaborn as sns

sns.regplot(x='carat', y='price', data=diamonds, order=2)
plt.show()

Add the 'order' argument to fit a polynomial of that degree (here a quadratic) instead of a straight line
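According to the seaborn documentation, regplot uses numpy.polyfit when order is greater than 1. A rough sketch of what that fit does, on made-up data that follows an exact quadratic (the coefficients are invented for illustration):

```python
import numpy as np

# Made-up data following an exact quadratic curve
x = np.linspace(0.2, 3.0, 30)
y = 1000 * x**2 + 500 * x + 200

linear = np.polyfit(x, y, 1)      # comparable to order=1: straight line
quadratic = np.polyfit(x, y, 2)   # comparable to order=2: recovers the curve

# How far each fitted curve is from the data
resid_linear = y - np.polyval(linear, x)
resid_quadratic = y - np.polyval(quadratic, x)
```

The quadratic fit reproduces the data almost exactly, while the straight line leaves a systematic error.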

Statistical plots

Regplot (2/2)

import seaborn as sns

sns.lmplot(x='carat', y='price', data=diamonds, hue='cut', scatter_kws={'alpha':0.1})
plt.show()

- hue splits the data into groups based on a categorical variable and fits one line per group

- scatter_kws={'alpha':0.1} sets the transparency (alpha) of the points in the scatter part of lmplot()

Statistical plots

Lmplot 

Combination of sns.regplot() and FacetGrid. Allows you to set extra arguments like 'hue'

import seaborn as sns

sns.residplot(x='carat', y='price', data=diamonds)
plt.show()

Statistical plots

Residplot 

Regresses y on x and plots the residuals (observed minus fitted values), which helps check whether a linear model suits the relationship between two continuous variables
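A sketch of the quantity that ends up on the y-axis of a residual plot, computed by hand with a straight-line fit on made-up data (the slope, intercept and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up linear relationship with noise
x = np.linspace(0.2, 3.0, 50)
y = 4000 * x - 500 + rng.normal(0, 300, size=x.size)

# Fit a straight line, then subtract the fitted values from the observations
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

mean_residual = residuals.mean()   # close to zero for a least-squares line
```

If the residuals show a clear pattern against x rather than random scatter, a straight line is probably not the right model.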

Matplotlib - Group E

By laurenstc
