ayush1997
a skill to be mastered......
CS Sophomore
Pythonista !
Hackathon Lover
FOSS Enthusiast
Mentor @ DevSocMSIT
What is it?
Pandas is a powerful data analysis toolkit providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
Pandas = Python + NumPy + R
Why Pandas?
Highly optimized for performance, with critical code paths written in Cython or C.
Easy handling of missing data (represented as NaN)
Robust IO tools for loading/saving data from/to different formats (CSV, HDF5, JSON, ...)
Intuitive merging and joining of data sets
Easy label-based slicing, indexing, and subsetting of large data sets
Powerful, flexible group by functionality to perform split-apply-combine operations on data sets
Combined with the excellent IPython toolkit and other libraries
Using Pip
pip install pandas
Using Conda
conda install pandas
import pandas as pd
A DataFrame is a tabular data structure composed of rows and columns.
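A minimal sketch of building such a table from a dictionary of columns. The records are made up for illustration (Titanic-style column names, hypothetical values) and are not from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical Titanic-style records, invented for this example.
df = pd.DataFrame(
    {
        "Name": ["Anna", "Ben", "Cara"],
        "Pclass": [1, 3, 2],
        "Age": [29.0, 41.0, 35.0],
    }
)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
```

Each dictionary key becomes a column label, and the rows get a default integer index.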
A Series is a one-dimensional object similar to an array, list, or column in a table.
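As a rough illustration, a Series can be created from a list plus optional labels, and a single DataFrame column is itself a Series. The names and values below are hypothetical.

```python
import pandas as pd

# A labeled one-dimensional object; the index labels are illustrative.
ages = pd.Series([29.0, 41.0, 35.0], index=["anna", "ben", "cara"], name="Age")

print(ages["ben"])   # look up a value by its label
print(ages.mean())   # vectorized statistics work directly on the Series

# Selecting one column of a DataFrame gives back a Series.
df = pd.DataFrame({"Age": ages})
col = df["Age"]
```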
The pandas I/O API is a set of top level reader and writer functions that generally return a pandas object.
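A small sketch of one reader/writer pair, `pd.read_csv` and `DataFrame.to_csv`. To stay self-contained it reads from an in-memory string instead of a file on disk; the CSV content is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would be a file path.
csv_text = "Name,Pclass,Age\nAnna,1,29\nBen,3,41\n"

# Reader: returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Writer: serialize the DataFrame back to CSV text.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row labels
round_trip = buffer.getvalue()
```

Other formats follow the same naming pattern, for example `pd.read_json` and `DataFrame.to_json`.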
Head and Tail
Columns and indexes
Descriptive Statistics
Data Summary
Deleting
Rename
Unique
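The operations named above could be sketched as follows. The DataFrame is hypothetical, and the method choices (`head`, `tail`, `describe`, `info`, `drop`, `rename`, `unique`) are an assumption about what each slide demonstrated.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame(
    {
        "Name": ["Anna", "Ben", "Cara", "Dev"],
        "Pclass": [1, 3, 3, 2],
        "Fare": [80.0, 8.0, 7.5, 26.0],
    }
)

# Head and tail
first_two = df.head(2)   # first 2 rows (the default is 5)
last_one = df.tail(1)    # last row

# Columns and indexes
print(df.columns)        # column labels
print(df.index)          # row labels

# Descriptive statistics and data summary
stats = df.describe()    # count, mean, std, min, quartiles, max of numeric columns
df.info()                # prints dtypes, non-null counts, and memory usage

# Deleting, renaming, unique values (each returns a new object)
no_fare = df.drop(columns=["Fare"])
no_first = df.drop(index=[0])
renamed = df.rename(columns={"Pclass": "Class"})
classes = df["Pclass"].unique()
```

None of these calls modifies `df` in place; each result is a new object.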
Different selection methods
pandas provides a suite of methods for integer- and label-based indexing. The semantics closely follow Python and NumPy slicing.
Selection by Label
Selection by Position
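A brief sketch of the two styles, using `.loc` for labels and `.iloc` for integer positions. The index labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"Pclass": [1, 3, 2], "Age": [29.0, 41.0, 35.0]},
    index=["anna", "ben", "cara"],
)

# Selection by label
by_label = df.loc["ben", "Age"]     # row label, column label
label_slice = df.loc["anna":"ben"]  # label slices include both endpoints

# Selection by position
by_position = df.iloc[1, 1]         # second row, second column
position_slice = df.iloc[0:2]       # the stop position is excluded, as in Python lists
```

The one difference worth remembering is that label slices are inclusive at both ends, while position slices follow the usual half-open Python convention.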
Boolean vector to filter data
Use | for or, & for and, and ~ for not
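A minimal sketch of filtering with boolean vectors, on hypothetical data. Each comparison is wrapped in parentheses because `&`, `|`, and `~` bind more tightly than `==` or `>` in Python.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame(
    {"Pclass": [1, 3, 3, 2], "Age": [29.0, 41.0, 19.0, 35.0]}
)

# and: third class AND older than 30
third_over_30 = df[(df["Pclass"] == 3) & (df["Age"] > 30)]

# or: first class OR younger than 20
first_or_young = df[(df["Pclass"] == 1) | (df["Age"] < 20)]

# not: everything except third class
not_third = df[~(df["Pclass"] == 3)]
```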
In pandas, missing data is represented by NaN.
Check for Null values
Filling Missing Data
Dropping Data
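The three steps above might look like this in practice. The data is hypothetical, and filling with the column mean is just one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({"Age": [29.0, np.nan, 35.0], "Cabin": ["C85", None, None]})

# Check for null values
missing_mask = df.isnull()              # True wherever a value is missing
missing_per_column = df.isnull().sum()  # number of missing values per column

# Fill missing data (here: replace missing ages with the mean age)
filled = df.fillna({"Age": df["Age"].mean()})

# Drop data
complete_rows = df.dropna()             # keep only rows with no missing values
age_known = df.dropna(subset=["Age"])   # only require Age to be present
```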
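By “group by” we mean a process involving one or more of the following steps: splitting the data into groups based on some criteria, applying a function to each group independently, and combining the results into a data structure.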
df.groupby(["Pclass"])
Once GroupBy objects have been created we can compute a summary statistic (or statistics) about each group
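A short sketch of computing per-group statistics, grouping by `Pclass` as above. The rows and the `Fare` and `Survived` columns are hypothetical stand-ins for the tutorial's dataset.

```python
import pandas as pd

# Hypothetical Titanic-style data, invented for this example.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 3, 3, 3],
        "Fare": [80.0, 60.0, 8.0, 7.0, 9.0],
        "Survived": [1, 0, 0, 1, 0],
    }
)

grouped = df.groupby(["Pclass"])    # split the rows into one group per class

mean_fare = grouped["Fare"].mean()  # apply mean() per group and combine the results

# Several statistics at once: a different aggregation per column.
summary = grouped.agg({"Fare": "mean", "Survived": "sum"})
```

The result is indexed by the group key, with one row per distinct `Pclass` value.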
The concat function performs concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
df1
df2
pd.concat(objs, axis=0, join='outer', ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False)
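A minimal sketch of how `axis` and `join` change the result. The two frames are hypothetical and deliberately share only column `B`, so the union/intersection behavior is visible.

```python
import pandas as pd

# Hypothetical frames that overlap only in column "B".
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# axis=0 with join='outer' (the defaults): stack rows, keep the union of columns.
stacked = pd.concat([df1, df2], ignore_index=True)

# join='inner': keep only the columns present in every frame.
shared = pd.concat([df1, df2], join="inner", ignore_index=True)

# axis=1: place the frames side by side, aligned on the row index.
side_by_side = pd.concat([df1, df2], axis=1)
```

With the outer join, cells that have no source value (for example `A` in the rows from `df2`) are filled with NaN.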
Concatenating Using append()
The append instance methods on Series and DataFrame were a useful shortcut to concat, but they were deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead.
They concatenated along axis=0
To add a row
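One way to add a row, sketched with `pd.concat` rather than `append`, since `DataFrame.append` was removed in pandas 2.0. The frame and the new row are hypothetical.

```python
import pandas as pd

# Hypothetical starting table.
df = pd.DataFrame({"Name": ["Anna", "Ben"], "Pclass": [1, 3]})

# Wrap the new row in a one-row DataFrame, then concatenate along axis=0.
new_row = pd.DataFrame([{"Name": "Cara", "Pclass": 2}])
extended = pd.concat([df, new_row], ignore_index=True)  # renumber the index 0..n-1
```

The original `df` is left unchanged; `extended` is a new DataFrame with the extra row.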
pandas has full-featured, high-performance in-memory join/merge operations that are idiomatically very similar to those of relational databases like SQL.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=False,
suffixes=('_x', '_y'), copy=True, indicator=False)
df1
df2
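A small sketch of how the `how` and `indicator` parameters affect a merge. The two frames and the `key`, `left_val`, and `right_val` column names are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column; "a" and "d" appear on one side only.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Inner join: only keys present in both frames.
inner = pd.merge(left, right, how="inner", on="key")

# Outer join: all keys; indicator=True adds a "_merge" column naming each row's source.
outer = pd.merge(left, right, how="outer", on="key", indicator=True)

# Left join: every row of `left`, with NaN where `right` has no match.
left_join = pd.merge(left, right, how="left", on="key")
```

These correspond to SQL's INNER JOIN, FULL OUTER JOIN, and LEFT OUTER JOIN respectively.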
ayush0016
ayush1997
ayushkumar97
https://github.com/ayush1997/Pandas-Tutorial