Open Data
Cody Fullerton
Data & Social Science Librarian
16 February 2021
Open Data
Structured, machine-readable data that can be freely shared, used, and built on without restrictions.
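To make "structured" and "machine-readable" concrete, here is a minimal Python sketch of what working with such data can look like. The sample rows, column names, and helper functions are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded open dataset.
# The regions and wage values are made up for illustration only.
SAMPLE_CSV = (
    "region,year,minimum_wage\n"
    "Region A,2019,10.00\n"
    "Region A,2020,12.00\n"
)


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def average(rows, column):
    """Return the mean of a numeric column across all rows."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


rows = load_rows(SAMPLE_CSV)
print(average(rows, "minimum_wage"))
```

Because the file has named columns and one record per line, a program can load and summarize it without any manual retyping. The same figures locked inside a scanned image or a PDF table would not be machine-readable in this sense.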
Open Data Movement
- Similar to other "Open" movements, such as the Open Access movement, which is concerned with making scholarly publications freely available on the internet.
- Data as a public utility.
- Data should be, by default, open.
- Born from open science and open source movements.
Key Things to Remember
- Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution, including intermixing with other datasets.
- Universal participation: everyone must be able to use, re-use and redistribute. There should be no discrimination against fields of endeavour or against persons or groups. For example, 'non-commercial' restrictions that would prevent 'commercial' use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
Benefits of Open Data
Support for innovation - Access to data supports innovation in the private sector by reducing duplication and promoting reuse of existing resources.
Support for research - Access to federal research data supports evidence-based primary research in Canadian and international research communities.
Increasing government accountability - Increased access to government data and information provides the public with greater insight into government activities, service delivery, and use of tax dollars.
Topics of Interest to Canadians
Minimum Wage - Historical
Permanent Residency Rates
Temporary Residents - Study Permit Rates
Fuel Consumption Rates
National Hydro Network
Not all Data can be Open
Privacy: A dataset or information that contains personal information about an individual must not be released.
Security: Information or data that may pose security risks to the institution, to the government, or to vulnerable or targeted individuals or organizations must not be released.
Confidentiality: Information or data whose release would impair the government’s ability to make certain decisions cannot be released.
Legacy information or data: Sometimes there is a substantive cost to making the resource eligible for release.
Legal and contractual limitations: A dataset may be subject to legal or contractual agreements that prevent it from being released. Agreements may include:
- limitations in data sharing agreements and memoranda of understanding
- third-party data - data that other government organizations collect for federal use, where the federal organization may not have the rights to publish it on open.canada.ca
- commercial license - data purchased from third parties, which may have limited rights for distribution
- non-disclosure agreements
Ethical and cultural limitations: A dataset may hold religious or sentimental value to its owners, who may not wish to disclose it or may place limitations on its disclosure.
For example, consider a recording of an Indigenous song that is sung only during the fall harvest. The owners may make the song available to the public only during the time it is ritually sung. Platforms like Mukurtu can facilitate this type of timed disclosure.
Open Data Sources
Federal:
Provincial:
Municipal:
Other interesting sources:
Street Census - Gathers information about the extent and nature of homelessness in Winnipeg
Google Public Data Explorer - Groups together data sources for comparison and analysis. Easy to create visuals.
FiveThirtyEight - Great for data-driven journalism and storytelling. Provides data from politics, sports, science, economics, and beyond.
Dataverse
Useful Tools at UM
ODESI is a web-based data exploration, extraction, and analysis tool. Users can search for survey questions (variables) across thousands of datasets held in a growing number of collections, then tabulate and analyze results online.
NESSTAR is a web-based exploration, extraction, and analysis tool for social science data. The NESSTAR data portal consists of StatCan Public Use Microdata Files (PUMF) and StatCan metadata for Master Files (RDC).
Provides access to a variety of commonly requested Canadian Census data and documentation, produced by Statistics Canada.
Citing Data
- Just like research publications, data needs to be cited when it is used.
- Some citation styles do not have a specific method for citing data.
- More information on citing data.
In general, all data citations should include the following elements:
- Author or creator
- Year of publication and/or last update
- Title or description
- Publisher or distributor (database, repository, etc.)
- Unique identifier (URL or DOI)
- Any other elements the creator or publisher asks you to include
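The elements above can be assembled into a citation string in a consistent order. The following is a minimal Python sketch only: the `DataCitation` class, its field names, and the punctuation layout are assumptions made for illustration, not the rules of any particular citation style, and the example values are placeholders rather than a real dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the general elements of a data citation."""
    author: str                  # author or creator
    year: str                    # year of publication and/or last update
    title: str                   # title or description
    publisher: str               # publisher or distributor (repository, etc.)
    identifier: str              # unique identifier (URL or DOI)
    extra: Optional[str] = None  # anything else the creator or publisher asks for

    def format(self) -> str:
        """Join the elements in a fixed order (layout is an assumption)."""
        parts = [
            f"{self.author} ({self.year}).",
            f"{self.title}.",
            f"{self.publisher}.",
            self.identifier,
        ]
        if self.extra:
            parts.append(self.extra)
        return " ".join(parts)


# Placeholder values for illustration only; not a real dataset or DOI.
citation = DataCitation(
    author="Example Agency",
    year="2021",
    title="Example Household Survey",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.0000/example",
)
print(citation.format())
```

When a citation style does specify a format for data, follow that format instead; a sketch like this is mainly useful as a checklist that no element has been left out.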
Questions?
Open Data Workshop
By codyfullerton