6. Getting, making, and cleaning data sources
File formats
Tabular:
xls
xlsx
csv
json
xml
Geospatial:
kml
geojson
shp
geotiff
Image-based:
pdf
jpeg
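A minimal sketch of how the same small table can be stored in two of the tabular formats listed above, CSV and JSON, using only the Python standard library. The records and the helper names (`records_to_csv`, `csv_to_records`) are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import json

# Invented placeholder records; every value is kept as text,
# because CSV itself has no data types.
records = [
    {"id": "1", "name": "alpha"},
    {"id": "2", "name": "beta"},
]


def records_to_csv(rows):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_records(text):
    """Parse CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = records_to_csv(records)   # one line per record, comma-separated
json_text = json.dumps(records)      # a list of objects, one per record
```

Both `csv_to_records(csv_text)` and `json.loads(json_text)` give back the original list of records, so for flat tables the two formats are interchangeable.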
search
download
convert
store, prepare
analyse
present
search
Local databases (company register databases, Opten)
National databases (KSH)
Multi-national databases (Eurostat, OECD)
Global databases (Ourworldindata)
Special databases (Freedom House, Transparency International, RSF, indexes, Átlátszó)
download
Copy and paste (Ctrl+C, Ctrl+V):
http://www.johnstonsarchive.net/nuclear/tests/index.html
Google Sheets formulas (importHTML, importXML)
https://devhints.io/xpath
Google Chrome plugin “Scraper”:
https://chrome.google.com/.../mbigbapnjcgaffohmbkdlecacce...
https://www.imdb.com/chart/top/
Python scraper (IMDb)
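A minimal sketch of what a Python scraper for an HTML table could look like, using only the standard library. It is similar in spirit to pulling a table with importHTML, but it is not the code used in the course. The class `TableExtractor`, the function `extract_table`, and the sample HTML (with placeholder film titles) are all hypothetical. A real page would first have to be downloaded, and its markup may differ from this sample.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup; not copied from any real website.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Title</th><th>Year</th></tr>
  <tr><td>1</td><td>Example Film A</td><td>1994</td></tr>
  <tr><td>2</td><td>Example <b>Film</b> B</td><td>1972</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
```

Here `rows` is a list of lists with the header row first, which can be written straight to a CSV file with `csv.writer`.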
convert
Tabula: https://tabula.technology (CSV from PDF)
https://www.onlineocr.net/pdftoexcel (OCR, Optical Character Recognition: image-based PDF to tabular data)
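Tables converted from PDF often need cleaning before analysis. The sketch below assumes some typical problems (stray spaces, blank rows, numbers written with thousands separators). It is an illustration, not a description of what Tabula or any OCR tool actually outputs. The helper names `clean_rows` and `to_int` and the sample text are hypothetical.

```python
import csv
import io


def clean_rows(raw_csv):
    """Return rows from CSV text with trimmed cells and no blank rows."""
    cleaned = []
    for row in csv.reader(io.StringIO(raw_csv)):
        # Trim each cell and collapse runs of whitespace inside it.
        cells = [" ".join(cell.split()) for cell in row]
        if any(cells):  # skip rows in which every cell is empty
            cleaned.append(cells)
    return cleaned


def to_int(text):
    """Parse '1 234' or '1,234' as 1234; return None if not a whole number."""
    digits = text.replace(",", "").replace(" ", "").replace("\u00a0", "")
    return int(digits) if digits.isdigit() else None


# Invented sample standing in for a CSV file exported from a PDF table.
RAW = 'Region , Count\n\n Region A,"1 234"\n,\nRegion B ,"5,678"\n'

table = clean_rows(RAW)
counts = [to_int(row[1]) for row in table[1:]]  # skip the header row
```

Cells that cannot be parsed come back as `None` instead of raising an error, so they can be checked by hand against the original PDF.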
Thank you for your attention!
szabo.krisztian96@gmail.com