PyGotham 2018
Molly Leen
Senior Software Engineer @ Kyruus
At Kyruus we help health systems match patients with the right providers and enhance patient access enterprise-wide
"BIG" data
raw data
API server
POST
{
"first_name": "Molly",
"last_name": "Leen",
"conferences": ["PyGotham"]
}
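For scale, one record per request might look like this (a minimal sketch; the endpoint URL is hypothetical):

import requests

record = {
    "first_name": "Molly",
    "last_name": "Leen",
    "conferences": ["PyGotham"],
}

# One HTTP request per record: fine for a handful of records,
# painfully slow for millions of them.
requests.post("https://api.example.com/people", json=record)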
ingestion script
INSERT
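On the database side, row-by-row inserts might look like this (a sketch assuming psycopg2 and a hypothetical people table):

import psycopg2

records = [{"first_name": "Molly", "last_name": "Leen"}]

conn = psycopg2.connect("dbname=example")
with conn, conn.cursor() as cur:
    # One INSERT statement (and one round trip) per row --
    # the per-statement overhead dominates at millions of rows.
    for record in records:
        cur.execute(
            "INSERT INTO people (first_name, last_name) VALUES (%s, %s)",
            (record["first_name"], record["last_name"]),
        )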
From the PostgreSQL docs:
Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The COPY command is optimized for loading large numbers of rows; it is less flexible than INSERT, but incurs significantly less overhead for large data loads.
API server
ONE request
raw data
ingestion script
COPY
https://www.postgresql.org/docs/9.4/static/sql-copy.html
COPY reads from a file or file-like object which is formatted to match the structure of the table
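From Python, COPY can be driven with psycopg2's copy_from, which reads from exactly such a file-like object (a minimal sketch; the people table is hypothetical):

import io
import psycopg2

# COPY expects rows formatted to match the table:
# one row per line, columns separated by tabs by default.
data = io.StringIO("Molly\tLeen\n")

conn = psycopg2.connect("dbname=example")
with conn, conn.cursor() as cur:
    cur.copy_from(data, "people", columns=("first_name", "last_name"))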
...but this file would be very large...
...we don't want to have to download a large file to disk...
...and the structure of the file is very important...
...let's define some requirements...
Requirements:
Do not download file to disk
According to Google:
According to the Python docs:
With a pre-signed S3 URL and Python's requests library, we can iterate over the data line by line using a generator
Definitions:
import requests

with requests.get(url, stream=True) as response:
    # iter_lines() gives us a generator we can consume
    # either in a loop or by calling next() on it
    lines = response.iter_lines()
We have a generator...
...We need a file-like object with read() and readline() methods...
...Let's build one!
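One way to build it (a sketch; the class name and details are mine, not necessarily the talk's exact implementation): wrap the generator in a class whose readline() pulls one line at a time and whose read() buffers lines until it can return the requested number of bytes.

import requests

class GeneratorFile:
    """Minimal file-like wrapper around a generator of lines (bytes)."""

    def __init__(self, lines):
        self.lines = lines  # e.g. response.iter_lines()
        self.buffer = b""

    def readline(self):
        # Next line with its newline restored; b"" signals EOF.
        try:
            return next(self.lines) + b"\n"
        except StopIteration:
            return b""

    def read(self, size=-1):
        # Buffer lines until we can return `size` bytes
        # (or everything that's left, if size is negative).
        while size < 0 or len(self.buffer) < size:
            line = self.readline()
            if not line:
                break
            self.buffer += line
        if size < 0:
            data, self.buffer = self.buffer, b""
        else:
            data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data

# Usage: stream from the pre-signed URL and hand the wrapper
# straight to COPY, e.g. psycopg2's cursor.copy_from(file_like, "people", ...).
with requests.get(url, stream=True) as response:  # url: the pre-signed S3 URL
    file_like = GeneratorFile(response.iter_lines())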