photo credit: mine
Preprocess some dataset into one or more tables.
Load the data into pandas.
... without aggressively consuming memory.
Too large to fit into memory?
Use a query to pre-filter data.
Watch for memory consumption in the middleware!
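One way to pre-filter without holding the whole file in memory is to read it in chunks and keep only the matching rows from each chunk. The sketch below is an illustration, not the approach used in this experiment: `load_filtered`, the `keep` predicate, and the sample columns `id` and `value` are all hypothetical names. It relies on the `chunksize` argument of `pandas.read_csv`.

```python
import io

import pandas as pd


def load_filtered(source, keep, chunksize=10000):
    """Read a CSV in chunks and keep only rows where keep(chunk) is True.

    Hypothetical helper: `keep` takes a DataFrame chunk and returns a
    boolean Series. Assumes at least one chunk is read, since pd.concat
    raises on an empty list.
    """
    chunks = pd.read_csv(source, chunksize=chunksize)
    # Only the filtered rows of each chunk are retained.
    filtered = [chunk[keep(chunk)] for chunk in chunks]
    return pd.concat(filtered, ignore_index=True)


# Made-up sample data standing in for a large CSV export.
csv_text = "id,value\n1,10\n2,200\n3,30\n4,400\n"
df = load_filtered(io.StringIO(csv_text), lambda c: c["value"] > 100, chunksize=2)
```

Peak memory here depends on the chunk size plus the rows that survive the filter, so a selective filter matters as much as a small chunk size.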
I have tried ...
Solr's CSV writer with pysolr,
direct urlopen, or requests.
Direct SQLAlchemy query iteration.
... and ended up consuming 2GB of memory
from a 40MB CSV file on disk.
(SQLAlchemy is still awesome!)
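For sources that hand back rows one at a time, a bounded-batch pattern is one thing to try. This is a sketch under assumptions: `frames_from_rows` is a hypothetical helper, and the generator below stands in for whatever row iterator the middleware provides (a database cursor, a CSV reader). It does not reproduce the memory figures above.

```python
from itertools import islice

import pandas as pd


def frames_from_rows(rows, columns, batch_size=1000):
    """Yield DataFrames of at most batch_size rows from any row iterator.

    Hypothetical helper: only one batch of raw rows is materialized as a
    Python list at a time.
    """
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        yield pd.DataFrame.from_records(batch, columns=columns)


# Stand-in for a lazily evaluated query result.
rows = ((i, i * i) for i in range(5))
frames = list(frames_from_rows(rows, ["n", "square"], batch_size=2))
df = pd.concat(frames, ignore_index=True)
```

Whether this helps depends on the source actually streaming; if the driver or client library buffers the full result first, the memory is spent before the first batch is built.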
By Ron DuPlain
An experiment in lazily loading data into a Python pandas v0.12.0 DataFrame.