Lazy Pandas
PyOhio 2013
Lightning Talk
Ron DuPlain
ron.duplain@gmail.com
photo credit: mine
Motivation
- Preprocess some dataset into one or more tables.
- Load the data into pandas.
- ... without aggressively consuming memory.
Too large to fit into memory?
- Use a query to pre-filter data.
- Watch for memory consumption in the middleware!
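A minimal sketch of the pre-filter idea, assuming a SQL source. The `measurements` table, its columns, and the `load_filtered` helper are hypothetical and exist only for illustration; an in-memory SQLite database stands in for whatever store actually holds the data. The query does the filtering, and `chunksize` makes `pandas.read_sql_query` return an iterator of small frames rather than one large one.

```python
import sqlite3

import pandas as pd


def load_filtered(conn, min_value, chunksize=2):
    """Run a pre-filtering query and assemble the result chunk by chunk."""
    # Hypothetical table and column names, for illustration only.
    query = "SELECT id, value FROM measurements WHERE value >= ?"
    # With chunksize set, read_sql_query yields DataFrames of at most
    # `chunksize` rows instead of returning a single DataFrame.
    chunks = pd.read_sql_query(
        query, conn, params=(min_value,), chunksize=chunksize
    )
    # The final frame still has to fit in memory; chunking only changes
    # how pandas receives the rows, it does not shrink the result.
    return pd.concat(list(chunks), ignore_index=True)


# Stand-in dataset: ten rows with values 0.0, 1.5, 3.0, ...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(i, i * 1.5) for i in range(10)],
)
conn.commit()

frame = load_filtered(conn, min_value=6.0)
print(frame)
```

Whether this actually keeps memory low depends on the driver and middleware in between, which may still buffer the full result set before pandas sees the first chunk.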
I have tried ...
- Solr's CSV writer with pysolr, direct urlopen, or requests.
- Direct SQLAlchemy query iteration.
... and ended up consuming 2GB of memory
from a 40MB CSV file on disk.
(SQLAlchemy is still awesome!)