Gravitational-Wave Data Exploration: Hands-On Programming and Analysis Bootcamp

Part 2: Fundamentals of Data Analysis with Python

Instructor: He Wang (王赫)

2023/12/03

ICTP-AP, UCAS

Data Analysis Practicum: Pandas

# Pandas

Pandas is a powerful toolkit, built on NumPy, for analyzing structured data.
- Official Pandas website: https://pandas.pydata.org
- Official Pandas documentation in Chinese: https://www.pypandas.cn
From NumPy's ndarray to Pandas' Series / DataFrame:
- (NumPy) 1-dimensional array $\Leftrightarrow$ Series (Pandas)
- (NumPy) 2-dimensional array $\Leftrightarrow$ DataFrame (Pandas)
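A minimal sketch of this correspondence (the values and index labels are made up for illustration): wrapping a 1-D ndarray gives a Series, wrapping a 2-D ndarray gives a DataFrame, and `to_numpy()` goes back the other way.

```python
import numpy as np
import pandas as pd

# A 1-D ndarray becomes a Series: same values, plus an index of labels.
arr1 = np.array([1.5, 2.5, 3.5])
s = pd.Series(arr1, index=['a', 'b', 'c'])

# A 2-D ndarray becomes a DataFrame: same values, plus row and column labels.
arr2 = np.arange(6).reshape(2, 3)
df = pd.DataFrame(arr2, columns=['x', 'y', 'z'])

print(s.shape, df.shape)                       # (3,) (2, 3)
print(s.to_numpy().ndim, df.to_numpy().ndim)   # 1 2
```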

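Selecting rows and columns: by label in Pandas $\Leftrightarrow$ by position in NumPy

	df.loc[[58, 14], ['col3', 'col4', 'col5']]   # Pandas: row/column labels
	a[[2, 4], :3]                                 # NumPy: integer positions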
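The two expressions above select rows 58 and 14 by *label* versus rows 2 and 4 by *position*. As a sketch, assuming a hypothetical 6×5 table whose row labels differ from their positions, `.loc` (labels), `.iloc` (positions) and plain ndarray indexing can pick out the same cells. Here the column slice is `2:5` so that it lines up with `col3` to `col5`.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x5 table: row labels are NOT 0..5, so label != position.
a = np.arange(30).reshape(6, 5)
df = pd.DataFrame(a,
                  index=[10, 33, 58, 7, 14, 90],
                  columns=['col1', 'col2', 'col3', 'col4', 'col5'])

# Label-based: rows labelled 58 and 14, three named columns.
by_label = df.loc[[58, 14], ['col3', 'col4', 'col5']]

# Position-based on the DataFrame: rows at positions 2 and 4, columns 2..4.
by_position = df.iloc[[2, 4], 2:5]

# NumPy on the underlying array: positions only, no labels.
raw = a[[2, 4], 2:5]

print(by_label.equals(by_position))               # True
print(np.array_equal(by_label.to_numpy(), raw))   # True
```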

# Pandas

Common attributes and methods of Series / DataFrame

- shape / size / index / dtype (dtypes for a DataFrame) / astype() / ...
- head() / tail() / describe() / values ($\Rightarrow$ ndarray (NumPy))
- to_*() / sort_index() / sort_values() / ...
- apply() / drop() / drop_duplicates() / ...
- isin() / isna() / isnull() / fillna() / ...
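A short tour of several of the attributes and methods listed above, on a small hypothetical DataFrame (the column names `name` and `snr` and all values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Small hypothetical table: one duplicated row and one missing value.
df = pd.DataFrame({
    'name': ['a', 'b', 'b', 'c'],
    'snr': [8.1, 12.4, 12.4, np.nan],
})

# Attributes: shape / size / index / dtypes
print(df.shape, df.size)         # (4, 2) 8
print(df.dtypes['snr'])          # float64
print(list(df.index))            # [0, 1, 2, 3]

# Inspection: head() / tail() / describe()
print(df.head(2))
print(df.tail(1))
print(df.describe())             # summary statistics of the numeric column(s)

# Missing values: isna() (isnull() is an alias) and fillna()
print(df['snr'].isna().sum())    # 1
filled = df.fillna({'snr': 0.0})

# Duplicates and membership: drop_duplicates() / isin()
dedup = df.drop_duplicates()     # keeps the first of the two identical 'b' rows
print(len(dedup))                # 3
print(df['name'].isin(['a', 'c']).tolist())  # [True, False, False, True]

# Sorting, element-wise apply(), dropping columns, type conversion
ranked = filled.sort_values('snr', ascending=False)
restored = ranked.sort_index()   # back to the original row order
doubled = filled['snr'].apply(lambda x: 2 * x)
no_name = df.drop(columns=['name'])
as_int = filled['snr'].astype(int)   # float -> int truncates: 8, 12, 12, 0

# Export with a to_*() method; with no path given, to_csv returns a string
csv_text = filled.to_csv(index=False)

# Back to NumPy: .values or (recommended) .to_numpy()
arr = filled['snr'].to_numpy()
print(type(arr))                 # <class 'numpy.ndarray'>
```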

Common attributes and methods of Series

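	import pandas as pd

	# s4g is assumed to be a DataFrame with columns 'Symbol', 'Year', 'Month', 'Open', 'Close'
	# Approach 1: mean Open/Close per (Symbol, Year), one column per Month
	pivoted = pd.pivot_table(s4g, index=['Symbol', 'Year'],
	                         values=['Open', 'Close'], aggfunc='mean',
	                         columns=['Month'], fill_value=0)

	# Approach 2: the same table via groupby + unstack
	# (select several columns with a list; the tuple form ['Open', 'Close'] fails in pandas >= 2.0)
	table = s4g.groupby(['Symbol', 'Year', 'Month'])[['Open', 'Close']].mean()
	table = table.unstack('Month')   # move Month from the row index into the columns
	table = table.fillna(0)          # months with no data become 0, as with fill_value=0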
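To check that the two approaches agree, here is a runnable sketch with a tiny hypothetical stand-in for `s4g` (the symbols and prices are invented). `pivot_table` sorts its columns by default, so the column order of the groupby result is aligned before comparing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for s4g: Open/Close prices for two symbols.
s4g = pd.DataFrame({
    'Symbol': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Year':   [2023, 2023, 2023, 2023, 2023],
    'Month':  [1, 1, 2, 1, 1],
    'Open':   [10.0, 12.0, 11.0, 20.0, 22.0],
    'Close':  [11.0, 13.0, 12.0, 21.0, 23.0],
})

pivoted = pd.pivot_table(s4g, index=['Symbol', 'Year'],
                         values=['Open', 'Close'], aggfunc='mean',
                         columns=['Month'], fill_value=0)

table = (s4g.groupby(['Symbol', 'Year', 'Month'])[['Open', 'Close']]
            .mean()
            .unstack('Month')
            .fillna(0))

# Align rows and columns with the pivot table before comparing values.
table = table.reindex(index=pivoted.index, columns=pivoted.columns)
print(np.allclose(pivoted.to_numpy(dtype=float), table.to_numpy(dtype=float)))  # True
```

Symbol BBB has no rows in month 2, so that cell is 0 in both tables rather than NaN.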