Open Data from command line

I'm a geomatics specialist.
I work on surveying and the digital processing
of data about the Earth and the environment.

Map: Stamen Design

Open Data Sicilia

A civic initiative that aims to raise awareness of, and spread, the culture of open government and open data practices in our region, and to open a participatory public discussion.

http://opendatasicilia.it

Open Data

Open data is publicly available data that can be universally and readily accessed, used, and redistributed free of charge. It is structured for usability and computability.

Principles

  • Public. Consistent with OMB’s Open Government Directive, agencies must adopt a presumption in favor of openness to the extent permitted by law and subject to privacy, confidentiality, security, or other valid restrictions.
  • Accessible. Open data are made available in convenient, modifiable, and open formats that can be retrieved, downloaded, indexed, and searched. Formats should be machine-readable. Open data structures do not discriminate against any person or group of persons and should be made available to the widest range of users for the widest range of purposes, often by providing the data in multiple formats for consumption. To the extent permitted by law, these formats should be non-proprietary, publicly available, and no restrictions should be placed upon their use.

Project Open Data: https://project-open-data.cio.gov/principles/

Principles

  • Described. Open data are described fully so that consumers of the data have sufficient information to understand their strengths, weaknesses, analytical limitations, security requirements, as well as how to process them. This involves the use of robust, granular metadata (i.e., fields or elements that describe data), thorough documentation of data elements, data dictionaries, and, if applicable, additional descriptions of the purpose of the collection, the population of interest, the characteristics of the sample, and the method of data collection.
  • Reusable. Open data are made available under an open license that places no restrictions on their use.

Project Open Data: https://project-open-data.cio.gov/principles/

Principles

  • Complete. Open data are published in primary forms (i.e., as collected at the source), with the finest possible level of granularity that is practicable and permitted by law and other requirements. Derived or aggregate open data should also be published but must reference the primary data.
  • Timely. Open data are made available as quickly as necessary to preserve the value of the data. Frequency of release should account for key audiences and downstream needs.
  • Managed Post-Release. A point of contact must be designated to assist with data use and to respond to complaints about adherence to these open data requirements.

Project Open Data: https://project-open-data.cio.gov/principles/

Let's take "command"

Doing new things

with tools from the 1960s?

Photo: Wikipedia

"Data science" is OSEMN

(pronounced "awesome")

OSEMN

  • Obtaining
  • Scrubbing
  • Exploring
  • Modeling
  • iNterpreting data

The command line

  • is agile
  • is scalable
  • is extensible
  • is ubiquitous

5 kinds of tools

  • A binary executable
  • A shell builtin
  • An interpreted script
  • A shell function
  • An alias
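To check which of the five kinds a given name is, bash's `type -t` builtin reports `file`, `builtin`, `function`, `alias` or `keyword`. The sketch below assumes bash; `csvhead`, `ll` and `peek.sh` are hypothetical names created only for this demo. Binaries and interpreted scripts are both reported as `file`, so a small check on the leading `#!` tells them apart.

```shell
#!/usr/bin/env bash
# How bash classifies each kind of tool: `type -t` prints one of
# file, builtin, function, alias or keyword.
# csvhead, ll and peek.sh are hypothetical names used only for this demo.

# A shell function, defined in the current shell.
csvhead() { head -n "${2:-3}" "$1"; }

# An alias; non-interactive bash only honours aliases with expand_aliases.
shopt -s expand_aliases
alias ll='ls -l'

# An interpreted script: an executable text file starting with a shebang.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '#!/bin/sh\nhead -n 1 "$1"\n' > "$tmpdir/peek.sh"
chmod +x "$tmpdir/peek.sh"
PATH="$tmpdir:$PATH"

# Binaries and scripts are both reported as "file";
# a script can be told apart by its first two bytes, "#!".
is_script() { [ "$(head -c 2 "$(command -v "$1")")" = '#!' ]; }

# ls is normally a binary, cd a builtin.
for name in ls cd csvhead ll peek.sh; do
  printf '%-8s %s\n' "$name" "$(type -t "$name")"
done
```

It prints one line per name, for example `cd` as `builtin`, `csvhead` as `function` and `ll` as `alias`.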

on the job

Download

curl -s "https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv" > artworks.csv

Check and display

head -n 3 artworks.csv
head -n 3 artworks.csv | csvlook
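A quick way to see which columns a file has, together with the 1-based indices that `csvcut -c 1,3` refers to, is to split the header line. This is a rough, csvkit-free sketch: `list_columns` is a hypothetical helper, the toy file with invented rows stands in for artworks.csv, and it assumes no header field contains a quoted comma. `csvcut -n` is the robust way to do the same.

```shell
#!/usr/bin/env bash
# Rough, csvkit-free look at a CSV header: one column name per line with its
# 1-based index (the numbers that `csvcut -c 1,3` expects).
# Assumption: header fields contain no quoted commas; csvcut -n handles those.
list_columns() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | awk '{ printf "%3d: %s\n", NR, $0 }'
}

# Hypothetical toy file with artworks.csv-like column names.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Title,Artist,ArtistBio,Date\nArtwork 1,Artist A,(bio A),1960\n' > "$sample"

list_columns "$sample"
```

On the toy file it lists `Title`, `Artist`, `ArtistBio` and `Date` as columns 1 to 4.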

"Horizontal" filters

head -n 3 artworks.csv | csvcut -c 1,3 | csvlook

csvcut -n artworks.csv

csvcut -c Artist,ArtistBio,Date artworks.csv | head -n 5 | csvlook
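For comparison, this is roughly what selecting columns by name boils down to, written in plain awk without csvkit. `cut_columns` is a hypothetical helper and the sample rows are invented. It splits on every comma, so unlike csvcut it breaks on quoted fields that contain commas or line breaks, which real-world files often have.

```shell
#!/usr/bin/env bash
# csvkit-free sketch of `csvcut -c NAME,NAME,...`: keep only the named
# columns, in the requested order. cut_columns is a hypothetical helper.
# Assumption: simple CSV, no quoted fields containing commas or newlines.
cut_columns() {  # usage: cut_columns NAME1,NAME2,... FILE
  awk -F',' -v OFS=',' -v want="$1" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {                               # map header names to positions
      n = split(want, names, ",")
      for (i = 1; i <= NF; i++) idx[$i] = i
      for (j = 1; j <= n; j++)
        if (!(names[j] in idx)) {
          print "unknown column: " names[j] > "/dev/stderr"
          exit 1
        }
    }
    {                                       # print the selected fields
      line = $(idx[names[1]])
      for (j = 2; j <= n; j++) line = line OFS $(idx[names[j]])
      print line
    }' "$2"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Title,Artist,ArtistBio,Date\nArtwork 1,Artist A,(bio A),1960\n' > "$sample"

cut_columns Artist,Date "$sample"
```

An unknown column name makes the function print an error on stderr and return a non-zero status instead of emitting empty columns.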

"Analysis"

csvstat artworks.csv
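csvstat infers each column's type and reports things like null counts, smallest and largest values, mean and most common values. A heavily reduced sketch of the idea for one column, in awk: count the values that look numeric, track their min, max and mean, and count everything else (empty cells, text such as "c. 1980"). `column_stats` and the toy rows are assumptions for illustration, and the naive comma splitting only suits simple CSV.

```shell
#!/usr/bin/env bash
# Tiny subset of csvstat for a single column (hypothetical helper).
# Assumption: simple CSV, no quoted fields containing commas or newlines.
column_stats() {  # usage: column_stats COLUMN_NAME FILE
  awk -F',' -v col="$1" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {                               # find the column by name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) exit 1                        # unknown column: no output
      next
    }
    $c ~ /^-?[0-9]+(\.[0-9]+)?$/ {          # numeric value: update stats
      v = $c + 0; n++; sum += v
      if (n == 1 || v < min) min = v
      if (n == 1 || v > max) max = v
      next
    }
    { other++ }                             # empty or non-numeric value
    END {
      if (n) printf "numeric=%d other=%d min=%g max=%g mean=%g\n", n, other, min, max, sum / n
    }' "$2"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Title,Artist,ArtistBio,Date\nArtwork 1,Artist A,(bio A),1960\nArtwork 2,Artist B,(bio B),1975\nArtwork 3,Artist C,(bio C),c. 1980\nArtwork 4,Artist D,(bio D),\n' > "$sample"

column_stats Date "$sample"
```

On the toy file it prints `numeric=2 other=2 min=1960 max=1975 mean=1967.5`.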

SQL

csvsql --query "select count(*) AS 'numero', Artist from '10000' where Artist like 'Bernard Tschumi' group by Artist" 10000.csv
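The query counts rows per artist. The same group-by count can be sketched with standard tools: awk extracts the column, then `sort | uniq -c` does the grouping. `count_by` is a hypothetical helper, the rows are toy data, the comma splitting assumes simple CSV, and the final `grep` plays the role of the `where` clause.

```shell
#!/usr/bin/env bash
# Plain-shell sketch of "select count(*), col ... group by col":
# count how many rows share each value of a column, most frequent first.
# Assumption: simple CSV, no quoted fields containing commas or newlines.
count_by() {  # usage: count_by COLUMN_NAME FILE  -> lines "count,value"
  awk -F',' -v col="$1" '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NR == 1 {                                   # find the column by name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) exit 1                            # unknown column: no output
      next
    }
    { print $c }' "$2" |
    sort | uniq -c |                            # "   N value" per group
    sort -k1,1nr -k2 |                          # largest groups first
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "," $0 }'
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Title,Artist,ArtistBio,Date\nArtwork 1,Artist A,(bio A),1960\nArtwork 2,Artist B,(bio B),1970\nArtwork 3,Artist A,(bio A),1980\n' > "$sample"

count_by Artist "$sample"
count_by Artist "$sample" | grep ',Artist A$'   # like: where Artist = 'Artist A'
```

On the toy file the first call prints `2,Artist A` and `1,Artist B`.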

Scraping (going out and getting the data)

Tools

  • Scrape
  • xml2json
  • jq
  • csvkit

Basics

  • XPath
  • Standard Input/Standard output
  • Shell
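Standard input and standard output are what make these tools composable: each filter reads stdin, writes stdout, and the shell's `|` connects them, while diagnostics go to stderr and stay out of the data stream. A minimal sketch, with `skip_header` and `lower` as hypothetical helper functions and invented input lines:

```shell
#!/usr/bin/env bash
# Filters read standard input and write standard output, so `|` chains them;
# messages meant for humans go to standard error and stay out of the pipe.
# skip_header and lower are hypothetical helpers, not standard commands.
skip_header() { tail -n +2; }                    # drop the first line
lower() { tr '[:upper:]' '[:lower:]'; }          # normalize case

echo "counting artists..." >&2                   # stderr: not piped
printf 'Artist\nArtist A\nARTIST A\nArtist B\n' | skip_header | lower | sort | uniq -c
```

The pipeline counts `artist a` twice and `artist b` once; the progress message shows up on the terminal but never reaches `sort`.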

Recommended reading


Andrea Borruso
@aborruso | aborruso@gmail.com

A talk given at Linux Day 2016, 22 October 2016, Partanna