Pandas: The person in the middle

Jeff Reback

Continuum Analytics

September 19 2015

DS4DS

 

Duck Dtypes

 
  • Categorical
  • Datetime with Time zone
  • Missing Values
  • Units

In [78]: s = Series(list('aabbcab'),dtype='category')
         s
Out[78]:
0    a
1    a
2    b
3    b
4    c
5    a
6    b
dtype: category
Categories (3, object): [a, b, c]

In [79]: s.cat.codes
Out[79]:
0   0
1   0
2   1
3   1
4   2
5   0
6   1
dtype: int8

In [80]: s.cat.categories
Out[80]: Index([u'a', u'b', u'c'], dtype='object')

In [81]: s.cat.ordered
Out[81]: False
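
The three accessors above fit together: each row stores a small integer code, and the code is a position in the categories. A minimal sketch of that relationship, assuming a recent pandas where `Series` is reached as `pd.Series` (the variable names `codes`, `categories` and `rebuilt` are illustrative only):

```python
import pandas as pd

s = pd.Series(list("aabbcab"), dtype="category")

# One small integer per row, pointing into the categories.
codes = s.cat.codes.to_numpy()

# The distinct labels, stored once.
categories = s.cat.categories.to_numpy()

# Indexing the categories by the codes rebuilds the original values.
rebuilt = categories[codes]

print(codes.tolist())    # [0, 0, 1, 1, 2, 0, 1]
print(rebuilt.tolist())  # ['a', 'a', 'b', 'b', 'c', 'a', 'b']
```

Storing the labels once and the codes per row is what makes the category dtype compact for columns with few distinct values.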

Categorical Dtype

In [47]: s.dtype
Out[47]: category

In [48]: type(s.dtype)
Out[48]: pandas.core.dtypes.CategoricalDtype

In [82]: s.cat.as_ordered()
Out[82]:
0    a
1    a
2    b
3    b
4    c
5    a
6    b
dtype: category
Categories (3, object): [a < b < c]

In [51]:
# category[a, b, c]
# category[a < b < c]
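
The two spellings above describe different dtypes: the same categories with and without an order. A short sketch of what the order changes, assuming a later pandas release where the dtype is exposed publicly as `pd.CategoricalDtype` (the slides show the internal `pandas.core.dtypes` path of the time):

```python
import pandas as pd

s = pd.Series(list("aabbcab"), dtype="category")
ordered = s.cat.as_ordered()

# Same categories, differing only in the ordered flag.
unordered_dtype = pd.CategoricalDtype(["a", "b", "c"], ordered=False)
ordered_dtype = pd.CategoricalDtype(["a", "b", "c"], ordered=True)

# The flag is part of the dtype, so the two do not compare equal.
same = unordered_dtype == ordered_dtype

# An ordered categorical supports min/max and inequality comparisons.
smallest, largest = ordered.min(), ordered.max()
above_a = (ordered > "a").tolist()

print(same)               # False
print(smallest, largest)  # a c
```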

Datetime with Timezones

In [52]: s = pd.Series(pd.date_range('20130101',periods=3,tz='US/Eastern'))
         s
Out[52]:
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

In [53]: s.values
Out[53]:
array(['2012-12-31T21:00:00.000000000-0800',
       '2013-01-01T21:00:00.000000000-0800',
       '2013-01-02T21:00:00.000000000-0800'], dtype='datetime64[ns]')

In [54]: s.dt.tz
Out[54]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

In [55]: s.dtype
Out[55]: datetime64[ns, US/Eastern]

In [56]: type(s.dtype)
Out[56]: pandas.core.dtypes.DatetimeTZDtype

In [57]: from pandas.core.dtypes import DatetimeTZDtype
         dtype = DatetimeTZDtype('ns','CET')
         dtype
Out[57]: datetime64[ns, CET]

In [58]: dtype.tz
Out[58]: 'CET'
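
Because the time zone lives in the dtype, converting between zones changes the dtype and the displayed wall-clock time but not the underlying instants. A minimal sketch, assuming a later pandas release where the dtype is available as `pd.DatetimeTZDtype` rather than through the `pandas.core.dtypes` import shown above (`in_cet` is an illustrative name):

```python
import pandas as pd

s = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# The dtype object carries the time zone.
dtype = pd.DatetimeTZDtype("ns", "CET")

# Converting keeps the same instants and swaps the zone in the dtype.
in_cet = s.dt.tz_convert("CET")

print(dtype)              # datetime64[ns, CET]
print(in_cet.dt.hour[0])  # 6: midnight US/Eastern is 06:00 CET in January
```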
 

DyND integration into pandas


In [59]: pd.get_option('support.dynd')
Out[59]: True

In [60]: pd.set_option('support.dynd',False)

In [61]: s = Series([1,2,3])
         s
Out[61]:
0    1
1    2
2    3
dtype: int64

In [62]: s.dtype
Out[62]: dtype('int64')

Current

In [63]: s[s.notnull()]
Out[63]:
0    1
1    2
2    3
dtype: int64

In [64]: s+1
Out[64]:
0    2
1    3
2    4
dtype: int64

In [65]: s[1]
Out[65]: 2

In [66]: s[1] = np.nan

In [67]: s
Out[67]:
0     1
1   NaN
2     3
dtype: float64

In [68]: s[s.notnull()]
Out[68]:
0    1
2    3
dtype: float64
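
The session above shows the NumPy-backed behaviour: `int64` has no missing-value marker, so introducing a `NaN` forces the whole Series to `float64`. The sketch below reproduces the same upcast without item assignment (whose upcasting rules vary across pandas versions), using `reindex` and `where` instead; that substitution is an assumption of this example, not what the slides ran.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Label 3 does not exist, so reindex fills it with NaN and upcasts.
with_gap = s.reindex([0, 1, 2, 3])

# where() replaces rows failing the condition with NaN, again upcasting.
masked = s.where(s != 2)

print(s.dtype)         # int64
print(with_gap.dtype)  # float64
print(masked.dtype)    # float64

# Filtering the NaN back out does not restore the integer dtype.
print(masked[masked.notnull()].dtype)  # float64
```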

In [69]: pd.set_option('support.dynd',True)

In [70]: s = Series([1,np.nan,3])
         s
Out[70]:
0     1
1   NaN
2     3
dtype: int64

In [71]: s.dtype
Out[71]: ndt.type("?int64")

In [72]: s.values
Out[72]: nd.array([ 1, NA,  3],
                  type="3 * ?int64")

Using DyND

In [73]: s[s.notnull()]
Out[73]:
0   1
2   3
dtype: int64

In [74]: s = s+1
         s
Out[74]:
0     2
1   NaN
2     4
dtype: int64

In [75]: s[1]
Out[75]: nan

In [76]: s[2] = np.nan
         s
Out[76]:
0     2
1   NaN
2   NaN
dtype: int64
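
The `support.dynd` option and the `?int64` type above belong to the prototype shown in this talk. The same idea, integers that keep their dtype when values are missing, can be tried in later pandas releases with the masked `Int64` extension dtype. The sketch below uses that dtype as a stand-in; it is not the DyND-backed implementation, and it displays missing values as `<NA>` rather than `NaN`.

```python
import pandas as pd

# "Int64" (capital I) is the nullable integer extension dtype.
s = pd.Series([1, None, 3], dtype="Int64")

# Filtering out the missing value keeps the integer dtype.
present = s[s.notna()]

# Arithmetic propagates the missing value and stays integer.
shifted = s + 1

# Assigning a missing value does not upcast to float.
updated = shifted.copy()
updated.loc[2] = pd.NA

print(s.dtype)        # Int64
print(present.dtype)  # Int64
print(updated.dtype)  # Int64
```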
 

Thanks

@jreback

DS4DS Pandas Talk

By Jeff Reback

DS4DS talk at the BIDS conf on Sept 19th, 2015.