By Luke Rosiak
NICAR 2014

PREVIOUS OPTIONS

What CitizenAudit has

  • Ten years of PDFs for every filer, obtained monthly from the IRS on DVDs and posted online.
  • Four years of 990s are fully text-searchable. I did optical character recognition on more than 2 million images of paper forms, each between 5 and 1,000 pages.
  • That covers staff, board members, vendors, dollar amounts, addresses, grants, answers to questions: anything that's in the form.
  • Easy-to-use structured data downloads for the "extracts" the IRS has released

Just search... like Google.

  • Type anything in the box.
  • There's a lot of data, so use quotes if you want to match an exact phrase, or if there are numbers involved.
  • Matches on organization name, chief contact person, or mailing address appear in a table. Click the name of the organization to go to its profile page. Or if you want to jump straight to a PDF, click the year in the left column.
  • Below that are matches on OCR'd data, the internals of the 990. If you've searched the name of an organization, these results are generally grantees.

A TYPICAL SEARCH

WHO'S FUNDING AMERICANS FOR PROSPERITY? Just ask.

Finding funders

  • Perhaps the most novel and important use for CitizenAudit is if you want to find out who's funding a nonprofit.
  • Nonprofits don't disclose who gives to them, but they do disclose who they give to.
  • It's always been easy to ask for an organization's 990 and read it. But to see who's funding it, you'd have to read the grantee section of every other nonprofit's forms, and you wouldn't know where to look.
  • That's what CitizenAudit does. It should find almost all nonprofit-to-nonprofit grants. This technique won't tell you what companies and individuals are funding nonprofits, unless they're routing it through trusts, etc.
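
The idea in the bullets above, turning "who they give to" into "who gives to them", can be sketched as an inverted index. This is only an illustration of the concept, not how the site is built (the site finds grantees by full-text search over OCR'd forms). The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def invert_grants(grants_by_funder):
    """Turn {funder: [(grantee, amount), ...]} into {grantee: [(funder, amount), ...]}."""
    funders_by_grantee = defaultdict(list)
    for funder, grants in grants_by_funder.items():
        for grantee, amount in grants:
            funders_by_grantee[grantee].append((funder, amount))
    return dict(funders_by_grantee)


# Hypothetical sample data, not taken from real filings.
sample = {
    "Foundation A": [("Charity X", 50000), ("Charity Y", 10000)],
    "Foundation B": [("Charity X", 25000)],
}

funders = invert_grants(sample)
for funder, amount in funders["Charity X"]:
    print(f"{funder} gave {amount} to Charity X")
```

Real grantee names are typed by hand on each form, so matching them in practice is fuzzier than the exact dictionary keys used here.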

Click an org name to go to a profile page


OCR'd text is on there too. CTRL-F can be helpful.


Formatting of OCR'd text isn't always very clear, but there are page numbers, so you can click to open the PDF and see the real page. OCR'd text serves as a guide telling you exactly what normal (PDF) docs to look at.


Use it for routine backgrounding

  • 990s are thorough disclosures. It's a chance to get paper on someone, more than you'll see in limited forms like incorporation documents.
  • If you're backgrounding anyone who's been active in the community, there's a decent chance they'll show up. Get in the habit of doing it even when your story has nothing to do with tax-exempt organizations.
  • Nonprofits can be shadow/sister organizations to companies and other groups. They sometimes have almost no real-world presence, but search an address or a person's name and they may turn up there.

Addresses and salaries

  • Just because it's called a nonprofit doesn't mean people don't get enormously rich.
  • Pulling a single 990 the traditional way isn't enough, because the same person could also be getting paid by other, supposedly unrelated charities.

Center for American Progress: Money in, money out

Before, you only had half the picture: money out. With CitizenAudit, you can piece together pass-throughs.

Tangled webs

For SQL people and webdevs

PostgreSQL and CSV dumps:

  • Manifest: 10 years of links to PDFs and total assets
  • Master file + extracts: Structured data released by the IRS (a subset of fields for a subset of filers, 2012 only)

API
  • Pass it an EIN, get back JSON: the structured data from the extracts and the OCR'd text.
  • You need a key, and please have mercy on my server.
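
A minimal sketch of preparing polite API lookups, under loud assumptions: the base URL and the `ein` and `key` parameter names below are placeholders, since the slides don't give the real endpoint. The only fact relied on is that an EIN is nine digits, often written as XX-XXXXXXX. No request is actually sent.

```python
import re
import time
from urllib.parse import urlencode

# Hypothetical placeholder; the real endpoint is not given in the slides.
API_BASE = "https://example.invalid/api/filer"


def normalize_ein(raw):
    """Strip punctuation from an EIN and check that nine digits remain."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        raise ValueError(f"expected 9 digits in EIN, got {raw!r}")
    return digits


def build_url(ein, key):
    """Build a lookup URL; 'ein' and 'key' are assumed parameter names."""
    return f"{API_BASE}?{urlencode({'ein': normalize_ein(ein), 'key': key})}"


def polite_urls(eins, key, delay_seconds=1.0, sleep=time.sleep):
    """Yield one URL per EIN, pausing between them to go easy on the server."""
    for i, ein in enumerate(eins):
        if i:
            sleep(delay_seconds)
        yield build_url(ein, key)
```

Each yielded URL could then be fetched with any HTTP client and the response parsed with `json.loads`; the pause between lookups is the "have mercy" part.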

Hardware
  • Hexa-core processor at 100% utilization 24/7 for OCRing
  • HOCR format with bounding boxes: 3TB
  • Elasticsearch w/ text only: 100GB+
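
For anyone curious why the hOCR copy is so much bigger than the text-only index: hOCR is HTML in which each recognized word carries its page coordinates in a `title` attribute such as `bbox 10 10 80 30`. A rough sketch of pulling words and boxes out with Python's standard library follows; the sample markup is made up for illustration and is not from a real filing.

```python
import re
from html.parser import HTMLParser

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (word, (x0, y0, x1, y1)) pairs from hOCR 'ocrx_word' spans."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # box of the word span currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX.search(attrs.get("title") or "")
            self._bbox = tuple(int(n) for n in match.groups()) if match else None

    def handle_data(self, data):
        text = data.strip()
        if self._bbox is not None and text:
            self.words.append((text, self._bbox))
            self._bbox = None


# Made-up sample markup in hOCR style.
sample = """<span class='ocr_line' title='bbox 10 10 200 30'>
<span class='ocrx_word' title='bbox 10 10 80 30; x_wconf 93'>Total</span>
<span class='ocrx_word' title='bbox 90 10 200 30; x_wconf 88'>assets</span>
</span>"""

parser = HocrWords()
parser.feed(sample)
print(parser.words)
```

Dropping the markup and coordinates and keeping only the words is what leaves the much smaller text-only index.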

$: I need it to keep this project going


Try it out

  • Questions?
  • Sample searches I can conduct for you right now?

Contact me
lrosiak@gmail.com
twitter: @lukerosiak
citizenaudit.org
