Pandas: The person in the middle
Jeff Reback
Continuum Analytics
September 19 2015
DS4DS
Duck Dtypes
- Categorical
- Datetime with Time zone
- Missing Values
- Units
In [78]: s = Series(list('aabbcab'),dtype='category')
         s
Out[78]:
0 a
1 a
2 b
3 b
4 c
5 a
6 b
dtype: category
Categories (3, object): [a, b, c]
In [79]: s.cat.codes
Out[79]:
0 0
1 0
2 1
3 1
4 2
5 0
6 1
dtype: int8
In [80]: s.cat.categories
Out[80]: Index([u'a', u'b', u'c'], dtype='object')
In [81]: s.cat.ordered
Out[81]: False
Categorical Dtype
In [47]: s.dtype
Out[47]: category
In [48]: type(s.dtype)
Out[48]: pandas.core.dtypes.CategoricalDtype
In [82]: s.cat.as_ordered()
Out[82]:
0 a
1 a
2 b
3 b
4 c
5 a
6 b
dtype: category
Categories (3, object): [a < b < c]
In [51]:
# category[a, b, c]
# category[a < b < c]
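The categorical session above can be reproduced as a standalone script. This is a minimal sketch that assumes a recent pandas release, where the dtype is exposed as `pd.CategoricalDtype` rather than the `pandas.core.dtypes` path shown in the slides. The variable names are illustrative only.

```python
import pandas as pd

# Same data as the slides: seven labels drawn from three categories.
s = pd.Series(list("aabbcab"), dtype="category")

# Integer codes index into the categories array.
codes = s.cat.codes.tolist()
categories = list(s.cat.categories)

# An ordered copy; ordered categoricals support comparisons.
ordered = s.cat.as_ordered()
greater_than_a = (ordered > "a").tolist()

# The dtype carries both the categories and the ordered flag,
# which is what the "category[a < b < c]" notation suggests.
explicit = pd.CategoricalDtype(categories=["a", "b", "c"], ordered=True)
same_dtype = ordered.dtype == explicit
```

The unordered and ordered series share codes and categories. Only the `ordered` flag on the dtype differs.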
Datetime with Timezones
In [52]: s = pd.Series(pd.date_range('20130101',periods=3,tz='US/Eastern'))
         s
Out[52]:
0 2013-01-01 00:00:00-05:00
1 2013-01-02 00:00:00-05:00
2 2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]
In [53]: s.values
Out[53]:
array(['2012-12-31T21:00:00.000000000-0800',
'2013-01-01T21:00:00.000000000-0800',
'2013-01-02T21:00:00.000000000-0800'], dtype='datetime64[ns]')
In [54]: s.dt.tz
Out[54]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
In [55]: s.dtype
Out[55]: datetime64[ns, US/Eastern]
In [56]: type(s.dtype)
Out[56]: pandas.core.dtypes.DatetimeTZDtype
In [57]: from pandas.core.dtypes import DatetimeTZDtype
dtype = DatetimeTZDtype('ns','CET')
dtype
Out[57]: datetime64[ns, CET]
In [58]: dtype.tz
Out[58]: 'CET'
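A standalone sketch of the timezone-aware dtype, assuming a recent pandas release where the class is available as `pd.DatetimeTZDtype` (the slides import it from `pandas.core.dtypes`). In current releases `dtype.tz` is a tzinfo object rather than the plain string shown above, so the sketch compares its string form.

```python
import pandas as pd

# Three midnights in US/Eastern, as in the slides.
s = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# The timezone lives on the dtype, not on the individual values.
is_tz_dtype = isinstance(s.dtype, pd.DatetimeTZDtype)
tz_name = str(s.dt.tz)

# Converting to UTC keeps the same instants and changes only the zone.
first_utc = s.dt.tz_convert("UTC").iloc[0]

# A dtype can also be built directly from a unit and a zone name.
cet = pd.DatetimeTZDtype("ns", "CET")
cet_repr = str(cet)
cet_tz_name = str(cet.tz)
```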
DyND integration into pandas
In [59]: pd.get_option('support.dynd')
Out[59]: True
In [60]: pd.set_option('support.dynd',False)
In [61]: s = Series([1,2,3])
         s
Out[61]:
0 1
1 2
2 3
dtype: int64
In [62]: s.dtype
Out[62]: dtype('int64')
Current
In [63]: s[s.notnull()]
Out[63]:
0 1
1 2
2 3
dtype: int64
In [64]: s+1
Out[64]:
0 2
1 3
2 4
dtype: int64
In [65]: s[1]
Out[65]: 2
In [66]: s[1] = np.nan
In [67]: s
Out[67]:
0 1
1 NaN
2 3
dtype: float64
In [68]: s[s.notnull()]
Out[68]:
0 1
2 3
dtype: float64
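The promotion shown above (an int64 series becoming float64 once a NaN appears) can be sketched without item assignment. Newer pandas releases may warn or raise on `s[1] = np.nan` for an integer series, so this sketch introduces the missing value with `Series.where` instead. That is a substitution for illustration, not what the slides ran.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])

# Masking out one element leaves a hole that int64 cannot represent,
# so the result is promoted to float64 with NaN in that position.
holed = ints.where(ints != 2)

# Building the series with a NaN up front gives the same dtype.
built = pd.Series([1, np.nan, 3])

# Filtering out the missing value does not restore the integer dtype.
kept = holed[holed.notnull()]
```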
In [69]: pd.set_option('support.dynd',True)
In [70]: s = Series([1,np.nan,3])
         s
Out[70]:
0 1
1 NaN
2 3
dtype: int64
In [71]: s.dtype
Out[71]: ndt.type("?int64")
In [72]: s.values
Out[72]: nd.array([ 1, NA, 3],
type="3 * ?int64")
Using DyND
In [73]: s[s.notnull()]
Out[73]:
0 1
2 3
dtype: int64
In [74]: s=s+1
         s
Out[74]:
0 2
1 NaN
2 4
dtype: int64
In [75]: s[1]
Out[75]: nan
In [76]: s[2] = np.nan
         s
Out[76]:
0 2
1 NaN
2 NaN
dtype: int64
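The DyND-backed session above cannot be reproduced with a stock pandas install. As a rough analog only, later pandas releases ship a nullable integer extension dtype, `"Int64"`, which shows the same idea of integers that keep their dtype alongside missing values. This sketch assumes that dtype is available. It is not the DyND integration and was not part of the talk.

```python
import pandas as pd

# Nullable integer series: the missing slot does not force float64.
s = pd.Series([1, pd.NA, 3], dtype="Int64")
dtype_name = str(s.dtype)

# Filtering and arithmetic both preserve the integer dtype.
present = s[s.notna()]
plus_one = s + 1

# Assigning another missing value also keeps the dtype.
plus_one.iloc[2] = pd.NA
missing_mask = plus_one.isna().tolist()
```

The missing-value marker here is `pd.NA`, not the `NaN` displayed in the DyND output.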
Thanks
@jreback
DS4DS Pandas Talk
By Jeff Reback
DS4DS talk at the BIDS conf on Sept 19th, 2015.