Open Data from command line
I am a geomatics specialist
I work on the surveying and computer processing
of data about the earth and the environment.
map: Stamen Design
Open Data Sicilia
A civic initiative that aims to raise awareness of open government culture and open data practices in our region, and to open a participatory public discussion.
Open Data
Open data is publicly available data that can be universally and readily accessed, used, and redistributed free of charge. It is structured for usability and computability.
Principles
- Public. Consistent with OMB’s Open Government Directive, agencies must adopt a presumption in favor of openness to the extent permitted by law and subject to privacy, confidentiality, security, or other valid restrictions.
- Accessible. Open data are made available in convenient, modifiable, and open formats that can be retrieved, downloaded, indexed, and searched. Formats should be machine-readable. Open data structures do not discriminate against any person or group of persons and should be made available to the widest range of users for the widest range of purposes, often by providing the data in multiple formats for consumption. To the extent permitted by law, these formats should be non-proprietary, publicly available, and no restrictions should be placed upon their use.
Project Open Data: https://project-open-data.cio.gov/principles/
Principles
- Described. Open data are described fully so that consumers of the data have sufficient information to understand their strengths, weaknesses, analytical limitations, security requirements, as well as how to process them. This involves the use of robust, granular metadata (i.e., fields or elements that describe data), thorough documentation of data elements, data dictionaries, and, if applicable, additional descriptions of the purpose of the collection, the population of interest, the characteristics of the sample, and the method of data collection.
- Reusable. Open data are made available under an open license that places no restrictions on their use.
Project Open Data: https://project-open-data.cio.gov/principles/
Principles
- Complete. Open data are published in primary forms (i.e., as collected at the source), with the finest possible level of granularity that is practicable and permitted by law and other requirements. Derived or aggregate open data should also be published but must reference the primary data.
- Timely. Open data are made available as quickly as necessary to preserve the value of the data. Frequency of release should account for key audiences and downstream needs.
- Managed Post-Release. A point of contact must be designated to assist with data use and to respond to complaints about adherence to these open data requirements.
Project Open Data: https://project-open-data.cio.gov/principles/
Let's get "commanding"
Doing new things
with tools from the 1960s?
photo from Wikipedia
"Data science" is OSEMN
(pronounced "awesome")
OSEMN
- Obtaining
- Scrubbing
- Exploring
- Modeling
- iNterpreting data
The command line
- is agile
- is scalable
- is extensible
- is ubiquitous
5 types of tools
- A binary executable
- A shell builtin
- An interpreted script
- A shell function
- An alias
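In bash, `type -t` reports which of these categories a name belongs to. The sketch below is a minimal illustration, assuming bash and a standard `ls` on the PATH; the names `count_lines` and `first3` and the temporary script are hypothetical. Note that `type -t` reports both compiled binaries and interpreted scripts as `file`, so the script is recognised here by its `#!` first line instead.

```bash
#!/usr/bin/env bash
# In non-interactive shells, aliases are only visible once alias expansion is on.
shopt -s expand_aliases

# A shell function (hypothetical name).
count_lines() { wc -l; }

# An alias (hypothetical name).
alias first3='head -n 3'

# An interpreted script: a text file whose first line is a shebang.
script="$(mktemp)"
printf '#!/bin/sh\necho "hello"\n' > "$script"

kind_binary="$(type -t ls)"             # file
kind_builtin="$(type -t cd)"            # builtin
kind_function="$(type -t count_lines)"  # function
kind_alias="$(type -t first3)"          # alias
script_magic="$(head -c 2 "$script")"   # the two characters "#!"

printf 'ls: %s\ncd: %s\ncount_lines: %s\nfirst3: %s\nscript starts with: %s\n' \
  "$kind_binary" "$kind_builtin" "$kind_function" "$kind_alias" "$script_magic"

rm -f "$script"
```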
on the job
Download
curl -s "https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv" > artworks.csv
Check and display
head -n 3 artworks.csv
head -n 3 artworks.csv | csvlook
"Horizontal" filters
head -n 3 artworks.csv | csvcut -c 1,3 | csvlook
csvcut -n artworks.csv
csvcut -c Artist,ArtistBio,Date artworks.csv
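One reason to reach for a CSV-aware tool such as csvcut rather than plain `cut` is quoting: a comma inside a quoted field is data, not a separator. The sketch below shows the problem using only `cut`, on a made-up sample (the temporary file and its rows are assumptions, not taken from the MoMA dataset).

```bash
#!/usr/bin/env bash
# Hypothetical two-line CSV whose second field contains a quoted comma.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
id,name,year
1,"Tschumi, Bernard",1980
EOF

# Plain cut splits on every comma, so the quoted field is torn in two.
naive_name="$(tail -n 1 "$sample" | cut -d, -f2)"
naive_year="$(tail -n 1 "$sample" | cut -d, -f3)"

echo "field 2 according to cut: $naive_name"   # "Tschumi
echo "field 3 according to cut: $naive_year"   #  Bernard"

rm -f "$sample"
```

A parser that understands CSV quoting treats `"Tschumi, Bernard"` as a single field, which is why selecting columns by name with csvcut is the safer choice on real data.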
"Analysis"
csvstat artworks.csv
SQL
csvsql --query "select count(*) AS 'numero', Artist from '10000' where Artist like 'Bernard Tschumi' group by Artist" 10000.csv
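For intuition, the query above keeps the rows for one artist and counts them per group. A rough equivalent with awk is sketched below on a made-up sample. It is not how csvsql works internally, it uses an exact string match, it assumes the data contains no quoted commas, and the sample rows are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample that reuses the column name from the query.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Title,Artist
Project A,Bernard Tschumi
Project B,Otto Wagner
Project C,Bernard Tschumi
EOF

# Skip the header (NR > 1), keep matching rows, count them per artist.
result="$(awk -F, 'NR > 1 && $2 == "Bernard Tschumi" { n[$2]++ }
                   END { for (a in n) print n[a] "," a }' "$sample")"

echo "numero,Artist"
echo "$result"          # 2,Bernard Tschumi

rm -f "$sample"
```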
Scraping (going out and getting the data)
Tools
- Scrape
- xml2json
- jq
- csvkit
Basics
- XPath
- Standard Input/Standard output
- Shell
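These tools compose because each one reads standard input and writes standard output, so a pipe can connect them. A minimal sketch using standard utilities only (the input words are made up):

```bash
#!/usr/bin/env bash
# Each stage reads stdin and writes stdout; the pipe connects them.
top="$(printf 'jq\ncsvkit\njq\nscrape\njq\ncsvkit\n' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 1 \
  | awk '{ print $2 ": " $1 }')"

echo "$top"    # jq: 3

# Redirection: stdout goes to a file; stderr is a separate stream, discarded here.
out="$(mktemp)"
{ echo "data"; echo "warning" >&2; } > "$out" 2>/dev/null
saved="$(cat "$out")"
echo "$saved"  # data

rm -f "$out"
```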
Recommended reading
Andrea Borruso
@aborruso | aborruso@gmail.com
By andreaborruso
A talk for Linux Day 2016 - 22/10/2016, Partanna