A Series of Unfortunate Kludges


What is LOCKSS?

LOCKSS (Lots of Copies Keep Stuff Safe): distributed digital preservation for published journals.

Does LOCKSS meet our Digital Preservation needs?

To answer that, we'd need to know what our Digital Preservation needs are.

When will we have "all the things" downloaded to LOCKSS?

Well, that's really hard to answer.[1]

1. The answer is "never."

It might be helpful to develop a way to measure the difference between what we have collected and what we have yet to collect.

Here's one way:

1. Request the LOCKSS status page from the web and parse out the numbers (collected and yet to be collected).
2. Dump those into a Google spreadsheet.
3. Add spreadsheet columns that use the current numbers, the previous numbers, and the date/time stamps to calculate the average time per item collected so far, then multiply that average by the number yet to be collected. The result is a guess at how long it would take us to get caught up.
4. Profit!!!
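The spreadsheet calculation described above can also be sketched in Python. This is only an illustration of the arithmetic, not the actual spreadsheet formulas: the function name and its parameters are made up for this example, and it assumes we have one earlier sample (timestamp and collected count) to compare against the current one.

```python
from datetime import datetime, timedelta


def estimate_time_remaining(first_seen, first_collected, now, collected, remaining):
    """Guess how long it will take to collect `remaining` more items.

    Hypothetical helper: all names here are assumptions for illustration.
    Returns a timedelta, or None if nothing new was collected between samples.
    """
    gained = collected - first_collected   # items collected between the two samples
    elapsed = now - first_seen             # wall-clock time between the two samples
    if gained <= 0:
        return None                        # no progress, so no meaningful average
    avg_per_item = elapsed / gained        # average time to collect one item
    return avg_per_item * remaining        # scale up to everything still missing


if __name__ == "__main__":
    guess = estimate_time_remaining(
        first_seen=datetime(2015, 1, 1),
        first_collected=100,
        now=datetime(2015, 1, 11),
        collected=200,
        remaining=50,
    )
    print(guess)
```

The estimate is only as good as the assumption that collection continues at the same average rate, which is why it is a guess and not a promise.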

1. Request the LOCKSS status page from the web and parse out the numbers (collected and yet to be collected).
(Python has a library for Google spreadsheets AND a library for requesting websites. Close enough!)
2. Dump those into a Google spreadsheet.
(Neat, a Google API can be used to do this, and there's a Python library that supports it.)
3. Add spreadsheet columns that use the current numbers, the previous numbers, and the date/time stamps to calculate the average time per item collected so far, then multiply that average by the number yet to be collected. The result is a guess at how long it would take us to get caught up.
(Let's do this part first; it's easy. And it lives in Google Spreadsheets, so the preservation folks can see it!)
4. Profit!!!
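A rough sketch of the "request and parse" part, using only the Python standard library. The markup of the LOCKSS status page is not shown in these slides, so the labels "Collected:" and "Remaining:" below are placeholders, as are the function names; the real page would need its own patterns.

```python
import re
from urllib.request import urlopen

# ASSUMPTION: these labels are placeholders. The real status page markup
# is not described here and will need different patterns.
COLLECTED_RE = re.compile(r"Collected:\s*([\d,]+)")
REMAINING_RE = re.compile(r"Remaining:\s*([\d,]+)")


def fetch_status(url, timeout=30):
    """Download the status page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_counts(html):
    """Pull (collected, remaining) out of the page text."""
    collected = COLLECTED_RE.search(html)
    remaining = REMAINING_RE.search(html)
    if collected is None or remaining is None:
        # Fail loudly so a page redesign is noticed instead of logging garbage.
        raise ValueError("status page did not contain the expected numbers")
    to_int = lambda m: int(m.group(1).replace(",", ""))
    return to_int(collected), to_int(remaining)
```

Keeping the parsing separate from the download means the fragile part (the patterns) can be checked against a saved copy of the page without touching the network.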

5. Wait to see which piece breaks first.

The Google API "broke" when it started requiring OAuth2 instead of username/password.

Now what do I do?

I need to capture the data automatically and regularly, do the calculations, and put the results someplace where they are easy to share.
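One possible stopgap, sketched here as an assumption and not as the chosen solution: append each sample to a plain CSV file, which any spreadsheet program can open and which needs no authentication at all. The file layout and the function name are invented for this example.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_sample(path, collected, remaining, when=None):
    """Append one timestamped sample to a CSV log (hypothetical layout)."""
    when = when or datetime.now(timezone.utc)
    path = Path(path)
    # Write the header only when the file is new or empty.
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "collected", "remaining"])
        writer.writerow([when.isoformat(), collected, remaining])
```

Run on a schedule (for example from cron), this covers the "automatically and regularly" part; the sharing part still depends on where the file ends up.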

I can probably figure out OAuth2, but maybe I shouldn't?