INFOGRAPHICS

(BN-INFGRA-04-GY)

6. Getting, making, and cleaning data sources

File formats

 

Tabular:

xls
xlsx
csv
json
xml
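
A minimal sketch of moving data between two of the tabular formats above (csv to json), using only the Python standard library. The sample rows, column names, and function names are hypothetical placeholders, not real statistics or part of the course material.

```python
import csv
import io
import json

# Hypothetical sample data; the figures are placeholders, not real statistics.
CSV_TEXT = "country,year,value\nHungary,2020,1.5\nAustria,2020,2.5\n"


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def records_to_json(records):
    """Serialise the records as a JSON array, keeping non-ASCII characters."""
    return json.dumps(records, ensure_ascii=False, indent=2)


records = csv_to_records(CSV_TEXT)
json_text = records_to_json(records)
```

Note that `csv.DictReader` keeps every value as a string, so numeric columns still need converting before analysis.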

 

Geospatial:
kml
geojson
shp
geotiff
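
A short sketch of reading a geojson file with the Python standard library, since GeoJSON is ordinary JSON with a fixed structure. The sample features, the approximate coordinates, and the `point_features` helper are assumptions made for illustration. GeoJSON stores coordinates as longitude first, then latitude.

```python
import json

# Hypothetical sample; coordinates are approximate and for illustration only.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Budapest"},
     "geometry": {"type": "Point", "coordinates": [19.04, 47.50]}},
    {"type": "Feature",
     "properties": {"name": "Sample line"},
     "geometry": {"type": "LineString",
                  "coordinates": [[19.0, 47.0], [20.0, 48.0]]}}
  ]
}
"""


def point_features(geojson_text):
    """Return (name, latitude, longitude) for every Point feature."""
    collection = json.loads(geojson_text)
    points = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Point":
            # GeoJSON order is [longitude, latitude].
            lon, lat = geometry["coordinates"][:2]
            points.append((feature["properties"]["name"], lat, lon))
    return points


points = point_features(GEOJSON_TEXT)
```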

 

Image-based:
pdf
jpeg

search

download

convert

store, prepare

analyse

present

search

 

Local databases (company register databases, Opten)

National databases (KSH)

Multi-national databases (Eurostat, OECD)

Global databases (Our World in Data)

Special databases (Freedom House, Transparency International, RSF, indexes, Átlátszó)

 

 

 

download

 

Copy and paste (Ctrl+C, Ctrl+V):

http://www.johnstonsarchive.net/nuclear/tests/index.html

 

 

Google Sheets formulas (IMPORTHTML, IMPORTXML)
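
For comparison, a rough Python analogue of what IMPORTHTML does with a table: pull the n-th table out of a page as rows of cells. This is a sketch using only the standard library; `TableParser`, `import_html_table`, and the sample markup are hypothetical names and data, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # current row, or None when outside <tr>
        self._cell = None  # text parts of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the index-th table (1-based, like the Sheets formula)."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables[index - 1]


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE = (
    "<table><tr><th>Name</th><th>Year</th></tr>"
    "<tr><td>Alpha</td><td>1990</td></tr></table>"
)
rows = import_html_table(SAMPLE, 1)
```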

https://devhints.io/xpath
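
A small sketch of how an XPath expression selects nodes, similar in spirit to IMPORTXML. It uses `xml.etree.ElementTree` from the Python standard library, which supports only a subset of XPath; the sample XML and the `select_text` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML_TEXT = """<catalog>
  <dataset source="national"><title>Population</title></dataset>
  <dataset source="global"><title>Life expectancy</title></dataset>
  <dataset source="national"><title>Employment</title></dataset>
</catalog>"""


def select_text(xml_text, path):
    """Return the text of every element matched by the path expression."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


# Titles of all <dataset> elements whose source attribute is "national".
national = select_text(XML_TEXT, ".//dataset[@source='national']/title")
```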

 

Google Chrome plugin “Scraper”:

https://chrome.google.com/.../mbigbapnjcgaffohmbkdlecacce...

https://www.imdb.com/chart/top/

 

Python scraper for IMDb
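
One possible shape for such a scraper, sketched with the Python standard library only. The tag and class name used to find titles (`h3` with class `ipc-title__text`) are an assumption about the page markup and may not match the live site; `TitleParser`, `extract_titles`, `fetch`, and the sample HTML are hypothetical. The download step is defined but not called here, so the example runs offline.

```python
import urllib.request
from html.parser import HTMLParser

CHART_URL = "https://www.imdb.com/chart/top/"


class TitleParser(HTMLParser):
    """Collect the text of every element with the given tag and CSS class."""

    def __init__(self, tag="h3", css_class="ipc-title__text"):
        # Default tag and class are assumptions about the page markup.
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.titles = []
        self._buffer = None  # text parts of the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def extract_titles(html):
    """Return the matched title strings from an HTML document."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def fetch(url=CHART_URL):
    """Download a page as text; a browser-like User-Agent is sent."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical sample markup with placeholder titles.
SAMPLE = (
    '<ul><li><h3 class="ipc-title__text">1. First Film</h3></li>'
    '<li><h3 class="ipc-title__text">2. Second Film</h3></li></ul>'
)
titles = extract_titles(SAMPLE)
```

With network access, `extract_titles(fetch())` would run the same parser on the downloaded page, provided the markup assumption holds.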

convert

 

Tabula: https://tabula.technology (CSV from PDF)

https://www.onlineocr.net/pdftoexcel (OCR, Optical Character Recognition: image-based PDF to tabular data)

Thank you for your attention!

szabo.krisztian96@gmail.com

METU6

By Szabó Krisztián
