Using Census Data
in Research
An Introduction to Statistics Canada
Census Data
Data & Social Science Librarian
University of Manitoba Libraries
Overview:
- Data Liberation Initiative
- Geography Terminology
- Statistics Canada Data Portal
- Microdata
- <odesi> & Nesstar
- Beyond 20/20 Demonstration
Data Liberation Initiative (DLI)
- The DLI is a partnership between post-secondary institutions and StatCan for improving access to Canadian data resources.
- The program began as a way for Canadian universities to collectively purchase StatCan files for use by researchers.
- It has evolved into training and support for university data librarians so they can best serve their communities.
- As members, we have free access to Public Use Microdata Files (PUMFs), specialized aggregate tables, and the Postal Code Conversion File (PCCF).
Geography Terminology
Census Metropolitan Area & Census Agglomeration (CMA/CA)
Area consisting of one or more neighbouring municipalities situated around a core. A census metropolitan area must have a total population of at least 100,000 of which 50,000 or more live in the core. A census agglomeration must have a core population of at least 10,000.
Census Tract
Census tracts (CTs) are small, relatively stable geographic areas that usually have a population between 2,500 and 8,000 persons. They are located in census metropolitan areas and in census agglomerations that had a core population of 50,000 or more in the previous census.
Census Subdivision (CSD)
Area that is a municipality or an area that is deemed to be equivalent to a municipality for statistical reporting purposes (e.g., an Indigenous reserve or an unorganized territory). Municipal status is defined by laws in effect in each province and territory in Canada.
Dissemination Area (DA)
Small area composed of one or more neighbouring blocks, with a population of 400 to 700 persons. All of Canada is divided into dissemination areas.
Dissemination Block (DB)
An area equivalent to a city block bounded by intersecting streets. These areas cover all of Canada.
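The population thresholds in the CMA/CA and Census Tract definitions above can be sketched as a small Python check. This is only an illustration of the quoted thresholds, not Statistics Canada's actual delineation method, which involves more than population counts. The function names and the sample figures are hypothetical.

```python
def classify_area(total_population: int, core_population: int) -> str:
    """Classify an area using only the population thresholds quoted above."""
    # CMA: at least 100,000 people in total, with 50,000 or more in the core.
    if total_population >= 100_000 and core_population >= 50_000:
        return "CMA"
    # CA: a core population of at least 10,000.
    if core_population >= 10_000:
        return "CA"
    return "neither"


def has_census_tracts(area_type: str, previous_core_population: int) -> bool:
    """Census tracts exist in CMAs, and in CAs whose core had 50,000 or
    more people in the previous census."""
    if area_type == "CMA":
        return True
    return area_type == "CA" and previous_core_population >= 50_000


# Hypothetical figures, for illustration only.
print(classify_area(total_population=250_000, core_population=180_000))
print(classify_area(total_population=60_000, core_population=45_000))
print(has_census_tracts("CA", previous_core_population=52_000))
```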
Postal Codes
A quick note on Postal Codes:
- They are not typically used as census geography
- Sometimes data is sorted by Forward Sortation Area (FSA), the first three characters of a postal code
- If a researcher needs data by postal code they need to use the Postal Code Conversion File (PCCF)
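As a rough illustration of the FSA point above, the following Python sketch pulls the FSA out of a postal code and counts records per FSA. The helper name and the sample postal codes are made up for the example. Linking postal codes to census geography would still require the PCCF, which this sketch does not attempt.

```python
from collections import Counter


def forward_sortation_area(postal_code: str) -> str:
    """Return the FSA: the first three characters of a Canadian postal code."""
    cleaned = postal_code.replace(" ", "").upper()
    if len(cleaned) != 6:
        raise ValueError(f"Expected a six-character postal code, got {postal_code!r}")
    return cleaned[:3]


# Hypothetical postal codes attached to survey responses.
postal_codes = ["R3T 2N2", "r3t5v6", "R2M 1A1", "R3T 0A1"]

# Count how many responses fall in each FSA.
responses_per_fsa = Counter(forward_sortation_area(code) for code in postal_codes)
print(responses_per_fsa)
```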
Census vs National Household Survey (NHS)
- In 2011 the mandatory long-form census was replaced by the voluntary National Household Survey (NHS); the short-form census remained mandatory
- Data from the 2011 Census is accurate, but the short form includes fewer variables and, as a result, is much less useful to the government and researchers.
Statistics Canada Data Portal
Find everything from federal census data, to provincial data, to local district data.
What is Microdata?
In the study of survey and census data, microdata is information at the level of individual respondents.
Advantages:
Census results are most commonly published as aggregates both for privacy reasons and because of the large quantities of data involved; microdata for one census can easily contain millions of records, each with several dozen data items.
Summarizing results to an aggregate level results in information loss. For instance, if statistics for education and employment are aggregated separately, they cannot be used to explore a relationship between them. Access to microdata allows researchers much more freedom to investigate such interactions and perform detailed analysis.
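The information loss described above can be shown with a short Python sketch. The records are invented for illustration and do not come from any Statistics Canada file. The separate totals for education and employment say nothing about how the two relate, while the record-level data supports a cross-tabulation.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent.
records = [
    {"education": "university", "employed": True},
    {"education": "university", "employed": True},
    {"education": "university", "employed": False},
    {"education": "high school", "employed": True},
    {"education": "high school", "employed": False},
    {"education": "high school", "employed": False},
]

# Aggregates published separately: each variable is counted on its own.
education_totals = Counter(r["education"] for r in records)
employment_totals = Counter(r["employed"] for r in records)

# With microdata, both variables can be counted together per respondent.
cross_tab = Counter((r["education"], r["employed"]) for r in records)


def employment_rate(level: str) -> float:
    """Share of respondents at an education level who are employed."""
    return cross_tab[(level, True)] / education_totals[level]


for level in education_totals:
    print(level, round(employment_rate(level), 2))
```

The two separate totals would look the same if the employed respondents were distributed differently across education levels, which is why the relationship can only be recovered from the individual records.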
Disadvantages:
Microdata analysis requires a well-developed understanding of statistics and of the software that you're using.
Common software choices for analyzing microdata:
- SAS
- Stata
- SPSS
- Excel
- Beyond 20/20
<odesi>
<odesi> (Ontario Data Documentation, Extraction Service and Infrastructure) is a digital repository for social science data, including polling data. It is a web-based data exploration, extraction and analysis tool. It provides researchers the ability to search for variables across thousands of datasets. There are both microdata and aggregate data available, in a range of formats.
Nesstar & RDS
Nesstar is a web-based exploration, extraction and analysis tool for social science data. The Nesstar data portal consists of Public Use Microdata Files (PUMFs) and Master Files (RDC).
Rich Data Services (RDS) is an analytical platform for PUMFs and their metadata. The RDS Explorer and Tabulation Engine's interfaces allow users to browse, interact, and download data and metadata for online or offline analysis.
Beyond 20/20
Demo & Activity
Questions?
Using Data in Census Research
By codyfullerton