Open Data

 

 

Cody Fullerton

Data & Social Science Librarian

16 February 2021

Open Data 

Structured, machine-readable data that can be freely shared, used, and built on without restrictions.
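To make "structured" and "machine-readable" concrete, here is a minimal Python sketch that parses a small CSV table using only the standard library. The column names and values are invented placeholders, not real statistics, and `total_by_region` is a hypothetical helper written for this illustration.

```python
import csv
import io

# Placeholder table: the columns and values are invented for illustration only.
SAMPLE_CSV = """region,year,value
Region A,2019,10
Region B,2019,15
Region A,2020,12
"""


def total_by_region(csv_text):
    """Sum the 'value' column for each region in a CSV string."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["value"])
    return totals


print(total_by_region(SAMPLE_CSV))
```

Because the table has named columns and one record per row, a program can total it without any manual retyping, which is much harder with data locked in a scanned image or a formatted PDF.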

Open Data Movement

  • Similar to other "Open" movements, such as the Open Access Movement, which is concerned with making scholarly publications freely available on the internet.
  • Data as a public utility.
  • Data should be, by default, open.
  • Born from the open science and open source movements.

Key Things to Remember

  • Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

  • Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
  • Universal participation: everyone must be able to use, re-use and redistribute. There should be no discrimination against fields of endeavour or against persons or groups. For example, 'non-commercial' restrictions that would prevent 'commercial' use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
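The criteria above can be read as a checklist. The following is a minimal sketch, assuming a dataset's terms of use have already been summarized by hand into four yes/no flags; the `UseTerms` class and `failed_criteria` function are hypothetical names introduced here, not part of any standard or library.

```python
from dataclasses import dataclass


@dataclass
class UseTerms:
    """Hypothetical hand-made summary of a dataset's terms of use."""
    allows_reuse: bool
    allows_redistribution: bool
    allows_commercial_use: bool
    limited_to_purposes: bool  # e.g. "only in education"


def failed_criteria(terms):
    """Return the names of the open-data criteria that the terms fail."""
    problems = []
    if not (terms.allows_reuse and terms.allows_redistribution):
        problems.append("re-use and redistribution")
    if not terms.allows_commercial_use or terms.limited_to_purposes:
        problems.append("universal participation")
    return problems


# A 'non-commercial' restriction fails the universal participation test.
non_commercial = UseTerms(True, True, False, False)
print(failed_criteria(non_commercial))
```

An empty list means the terms pass these two checks; availability and access still have to be assessed separately.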

Benefits of Open Data

Support for innovation - Access to data supports innovation in the private sector by reducing duplication and promoting reuse of existing resources.

Support for research - Access to federal research data supports evidence-based primary research in Canadian and international research communities.

Increasing government accountability - Increased access to government data and information provides the public with greater insight into government activities, service delivery, and use of tax dollars.

Topics of Interest to Canadians

Minimum Wage - Historical

Permanent Residency Rates

Temporary Residents - Study Permit Rates

Fuel Consumption Rates

National Hydro Network

National Broadband Data

Not all Data can be Open

Privacy: A dataset or information that contains personal information about an individual must not be released. 

Security: Information or data that may pose security risks to the institution, to the government, or to vulnerable or targeted individuals or organizations must not be released. 

Confidentiality: Information or data whose release would impair the government’s ability to make certain decisions cannot be released.

Legacy information or data: Sometimes there is a substantive cost to making the resource eligible for release.

Legal and contractual limitations: A dataset may be subject to legal or contractual agreements that prevent it from being released. Agreements may include:

  • limitations in data sharing agreements and memoranda of understanding
  • third-party data - data that government organizations collect for federal use, but that the federal organization may not have the rights to publish on open.canada.ca
  • commercial license - data purchased from third parties, which may come with limited distribution rights
  • non-disclosure agreements

Ethical and cultural limitations: A dataset may hold religious or sentimental value for its owners, who may not wish to disclose it or may place limitations on its disclosure.

For example, consider a recording of an Indigenous song that is sung only during the fall harvest. The owners may make this song available to the public only during the time it is ritually sung. Platforms like Mukurtu can facilitate this type of timed disclosure.

Open Data Sources

Federal:

Provincial:

Municipal:

Other interesting sources:

Street Census - Gathers information about the extent and nature of homelessness in Winnipeg.

Google Public Data Explorer - Groups together data sources for comparison and analysis. Easy to create visuals.

FiveThirtyEight - Great for data-driven journalism and storytelling. Provides data from politics, sports, science, economics, and beyond.

 

Dataverse

Open Source Research Data Repository Software

 

The Dataverse Project

 

University of Manitoba Dataverse

 

Useful Tools at UM

Odesi

ODESI is a web-based data exploration, extraction, and analysis tool. Users can search for survey questions (variables) across thousands of datasets held in a growing number of collections, then tabulate and analyze results online.

Nesstar

A web-based exploration, extraction, and analysis tool for social science data. The NESSTAR data portal contains Statistics Canada (StatCan) Public Use Microdata Files (PUMFs) and StatCan metadata for Master Files held in Research Data Centres (RDCs).

Canadian Census Analyzer

Provides access to a variety of commonly requested Canadian Census data and documentation, produced by Statistics Canada.

Citing Data

  • Just like research publications, data needs to be cited when it is used.
  • Some citation styles do not have a specific method for citing data.
  • More information on citing data.

In general, all data citations should include the following elements:

  • Author or creator
  • Year of publication and/or last update
  • Title or description
  • Publisher or distributor (database, repository, etc.)
  • Unique identifier (URL or DOI)
  • Any other elements the creator or publisher asks you to include
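As a rough illustration, the elements listed above can be assembled into a single citation line. This is a minimal sketch: `format_data_citation` is a hypothetical helper, the element order and punctuation are assumptions rather than the rules of any particular citation style, and every value in the example is a placeholder.

```python
def format_data_citation(author, year, title, publisher, identifier, extra=None):
    """Join the general data-citation elements into one line.

    The order and punctuation are assumptions for illustration; follow your
    citation style's own rules where it has them.
    """
    parts = [f"{author} ({year}).", f"{title}.", f"{publisher}.", identifier]
    if extra:
        # Anything else the creator or publisher asks you to include.
        parts.append(extra)
    return " ".join(parts)


# Placeholder values only; this is not a real dataset or DOI.
citation = format_data_citation(
    author="Example Agency",
    year=2020,
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="https://doi.org/10.0000/example",
)
print(citation)
```

Keeping the elements as separate fields makes it easier to reorder them later for whichever citation style a course or journal requires.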

Questions?

Open Data Workshop

By codyfullerton