Knowledge Transfer

Big Data/Web-scraping Project at ECLAC, UN

 

What we implemented

  • Web-scraper
  • Databases
  • A repository website (ds-d1.cepal.org)

The code base is available in my private GitHub repository.

Please share your GitHub account with me so that I can grant you access.

Infrastructure

Components shown in the infrastructure diagram:

  • Database
  • Multi-threaded Crawler
  • Content Parser
  • Basic Data Cleaning
  • Normalization
  • Data File Output
  • Visualization
  • Web-scraper
  • Error Handling & Retry
  • De-dupe
  • Aggregation
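
To make the crawler-side components of the infrastructure more concrete (multi-threaded crawling, error handling and retry, de-duplication), here is a minimal sketch in Python. It is not the project's actual code: the names `fetch_with_retry` and `crawl`, the injected `fetch` and `parse` callables, the retry count, and the de-dupe key (all fields of a record) are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url), retrying on any exception for up to `retries` attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as exc:  # a real crawler would catch narrower errors
            last_error = exc
            time.sleep(delay * (attempt + 1))  # simple linear back-off
    raise last_error


def crawl(urls, fetch, parse, workers=4):
    """Fetch URLs in parallel threads, parse each page, and drop duplicate records."""
    unique_urls = list(dict.fromkeys(urls))  # skip repeated URLs, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(lambda u: fetch_with_retry(fetch, u), unique_urls))

    seen, records = set(), []
    for page in pages:
        for record in parse(page):  # parse() is assumed to yield flat dicts
            key = tuple(sorted(record.items()))  # hypothetical de-dupe key
            if key not in seen:
                seen.add(key)
                records.append(record)
    return records
```

In this sketch a URL that still fails after the last retry raises its error out of `crawl`; a production crawler would more likely log the failure and continue with the remaining URLs.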

Tech. Specifications

  • OS: CentOS 7
  • Databases
    • MariaDB 10 (MySQL-compatible)
    • MongoDB
  • Programming: Python 3.6 or higher
  • Web:
    • NGINX
    • HTML, CSS, JavaScript
    • PHP 7
    • Python 3.6 or higher

Our Server

  • Address: ds-d1.cepal.org
  • Accessible only from the staff network
  • Hosts a repository website for data sharing (internal use only)

Our Server (continued)

  • The website provides data access
    • Static file downloads (implemented in Python)
    • APIs (implemented in PHP and Python)
  • The API code is also available in my GitHub repository.
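
As an illustration of what a small Python data-access API of this kind might look like, here is a hedged sketch using only the standard library. It is not the code deployed on the server: the WSGI callable `app`, the in-memory `RECORDS` list, its field names, and the `category` query parameter are all hypothetical, and the real API would read from the databases instead.

```python
import json
from urllib.parse import parse_qs

# Hypothetical sample data; the real service would query MariaDB or MongoDB.
RECORDS = [
    {"product": "rice", "category": "food", "price": 1.2},
    {"product": "phone", "category": "electronics", "price": 199.0},
]


def app(environ, start_response):
    """WSGI application that returns records as JSON, optionally filtered."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    category = query.get("category", [None])[0]  # assumed filter parameter
    rows = [r for r in RECORDS if category is None or r["category"] == category]

    body = json.dumps(rows).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

For a quick local trial, the callable can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; behind NGINX it would normally run under a dedicated WSGI server.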

UN Big Data Project Knowledge Transfer

By Yu-Chang Ho


Slides for knowledge transfer within ECLAC, United Nations, covering the technical aspects of the project.
