Team POPONG

DB History




Feb 4th, 2014
Team POPONG

lucypark@popong.com
steel@popong.com




The Past

The birth of Team POPONG

  • A couple students gathered in Dec 2011, to build a service for easier comparisons of politicians for the National Assembly election in Apr 2012
  • This became Team POPONG

The Problem

  • However, the process was very time consuming because:
    • No structured data available (only on the web)
    • No APIs
    • No through knowledge on what data the government had
      • we were all engineers 
  • In the end, we couldn't come up with a service as we desired



  • The problem
    • No data available for service development
  • The solution
    • A biographical dictionary in order to build a comparison service, four years later (when the next National Assembly election places)

Mongo DB

  • Constructed from National Assembly election records
    • One person collection
  • NoSQL because we didn't know the better structure for data
  • (We wanted to try it out) 
  • Example:
  • {
      "birthday": "04",
      "education": "서울대 법대 4년 졸업",
      "assembly": {
        "10": {
          "education": "미국미조리주립대학국제정치학사",
          "district": "서울 제1선거구",
          "birthyear": "1944",
          "name_kr": "정대철",
          "voterate": "42.50",
          "name_cn": "鄭大哲",
          "experience": [
            "한양대학전임강사",
            "재미한민신보논설위원"
          ],
          "sex": "남",
          "birthmonth": "01",
          "job": "국회의원",
          "birthday": "04",
          "elected": true,
          "assembly_no": 10,
          "party": "신민당",
          "votenum": "105983"
        },
        "13": {
          "votenum": "38228",
          "district": "서울 중구(서울)",
          "birthmonth": "01",
          "name_kr": "정대철",
          "voterate": "39.60",
          "name_cn": "鄭大哲",
          "experience": [
            "9,10대국회의원",
            "평화민주당대변인"
          ],
          "sex": "남",
          "birthyear": "1944",
          "job": "정당인",
          "birthday": "04",
          "elected": true,
          "party": "평화민주당",
          "assembly_no": 13,
          "education": "미조리주립대졸업"
        },
      },
      "votenum": "29414",
      "district": "서울 중구(서울)",
      "birthmonth": "01",
      "name_kr": "정대철",
      "voterate": "49.10",
      "name_cn": "鄭大哲",
      "experience": [
        "한국야구위원회(KBO) 총재",
        "새정치국민회의 부총재"
      ],
      "sex": "남",
      "birthyear": "1944",
      "job": "정당인",
      "cand_no": "2",
      "elected": true,
      "party": "새천년민주당",
      "assembly_no": 16,
      "_id": {
        "$oid": "504c456610594c45ed000001"
      },
      "id": 3
    }




The Present

PostgreSQL + SQLAlchemy


  • Background
    • Recognized the importance of bills and parties
    • biographical dictionary → politicians + bills + parties
      • Either bills, parties collections had to be newly added
      • Or the data had to be added to the person collection
      • which were both very bad decisions
    • Switching to RDBs was not an option
      • In need of joins, indices, constraints
    • Plus, we got to know more about the gov data, which made it possible to structurize
  • (The initial) Tables
    • person, party, election, candidacy, party_affiliation             

Changes

  • Deleted party_affiliation
    • Because it was duplicate with  candidacy
    • Which later turned out to be a bad choice because of party migrations
  • Added bill, cosponsorship
  • Added keyword, bill_keyword




The Future

Background


  • Handle party migrations
    • candidacy was not enough
  • Handle by-elections, dismissals, resignations...
  • Handle various election types
    • Not only the National Assembly but also the Presidential, municipal, ....
  • Use SQLAlchemy's advanced features
    • polymorphic identities
    • triggers

Planned Schema




The API

Design

  • Goal: Minimize API development costs
    • Team POPONG's main goals is automation
    • We're always short on man-hours
  • Action: Auto-comile API interfaces from existing DB models


  • For different naming conventions for parameters
    • Consulted Github API v3
      • ex:  page, per_page instead of offset, limit
      • But offset, limit seem more appropriate for APIs, so planning to change in v1.0
    Made with Slides.com