Lazy Pandas

PyOhio 2013
Lightning Talk

Ron DuPlain



  1. Preprocess some dataset into one or more tables.
  2. Load the data into pandas.
  3. ... without aggressively consuming memory.
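One way to do steps 2 and 3 for a CSV table is to read it in fixed-size chunks and keep only the columns and rows you need. The sketch below is an illustration, not the talk's code. It is written against a recent pandas release, not v0.12.0. The function name `load_csv_lazily`, its `row_filter` argument, and the `key,hits,value` columns are all hypothetical.

```python
import io

import pandas as pd


def load_csv_lazily(source, chunksize=10_000, usecols=None, row_filter=None):
    """Hypothetical helper: read a CSV chunk by chunk and keep only what passes the filter."""
    kept = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of parsing the whole file at once.
    for chunk in pd.read_csv(source, chunksize=chunksize, usecols=usecols):
        if row_filter is not None:
            chunk = chunk[row_filter(chunk)]  # drop unwanted rows before keeping the chunk
        kept.append(chunk)
    if not kept:
        return pd.DataFrame(columns=usecols)
    return pd.concat(kept, ignore_index=True)


# Hypothetical in-memory CSV standing in for a preprocessed table on disk.
csv_text = "key,hits,value\n" + "".join(
    "id%d,%d,%.1f\n" % (i, i, i * 0.5) for i in range(10)
)
df = load_csv_lazily(
    io.StringIO(csv_text),
    chunksize=3,
    usecols=["key", "hits"],
    row_filter=lambda c: c["hits"] % 2 == 0,
)
print(df)
```

Only one raw chunk is parsed at a time, so peak memory is roughly the rows you keep plus one chunk. The final `pd.concat` briefly holds a second copy of the kept rows.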

Too large to fit into memory?

  1. Use a query to pre-filter data.
  2. Watch for memory consumption in the middleware!
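A minimal sketch of step 1 pushes the filter into a SQL query. Rows are then pulled with `fetchmany` in batches, so no middleware list of every row ever exists. It assumes a DB-API source. Here that is an in-memory `sqlite3` database with a hypothetical `events` table, and `query_in_batches` is an illustrative name, not part of any library.

```python
import sqlite3

import pandas as pd


def query_in_batches(conn, sql, params=(), batch_size=1000):
    """Hypothetical helper: run a pre-filtering query and build the DataFrame batch by batch."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]  # column names from the cursor
    frames = []
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size raw tuples in memory
        if not rows:
            break
        frames.append(pd.DataFrame.from_records(rows, columns=columns))
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Hypothetical example table standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (key TEXT, hits INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("id%d" % i, i, i * 0.5) for i in range(10)],
)
df = query_in_batches(
    conn,
    "SELECT key, hits FROM events WHERE hits >= ? ORDER BY hits",
    params=(7,),
    batch_size=2,
)
print(df)
```

The database does the filtering, so only matching rows cross into Python. Whether the driver itself buffers the full result set depends on the driver. That buffering is exactly the middleware behavior to watch for.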


I have tried ...

  • Solr's CSV writer with pysolr,
    direct urlopen, or requests.
  • Direct SQLAlchemy query iteration.

... and ended up consuming 2GB of memory
from a 40MB CSV file on disk.

(SQLAlchemy is still awesome!)
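Per-row Python objects are one plausible reason a small file can balloon in memory. Each row held as tuples, strings, and numbers carries object overhead far beyond its size on disk. The sketch below uses the standard library's `tracemalloc` to compare materializing every row against streaming them. The row generator is a hypothetical stand-in for a query result, not a reproduction of the 2GB measurement above.

```python
import tracemalloc


def make_rows(n):
    """Hypothetical stand-in for a query result: yields (key, hits, value) tuples."""
    for i in range(n):
        yield ("id%d" % i, i, float(i))


def peak_bytes(func, *args):
    """Run func(*args) under tracemalloc; return (result, peak bytes traced)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def materialize_then_sum(n):
    rows = list(make_rows(n))  # every row object is alive at the same time
    return sum(row[2] for row in rows)


def stream_sum(n):
    return sum(row[2] for row in make_rows(n))  # only one row alive at a time


total_a, peak_a = peak_bytes(materialize_then_sum, 100_000)
total_b, peak_b = peak_bytes(stream_sum, 100_000)
print(total_a == total_b, peak_a, peak_b)
```

Both functions compute the same total. Only the materializing version's peak grows with the number of rows.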


An experiment in lazily loading data into a Python pandas v0.12.0 DataFrame.