Data Methods: Survey

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

What is survey?

American Statistical Association (ASA)

 

"Survey" is used most often to describe a method of gathering information from a sample of individuals. This "sample" is usually just a fraction of the population being studied.

What is survey?

A "survey" is a systematic method for gathering information from (a sample of) entities for the purposes of constructing quantitative descriptors of the attributes of the larger population of which the entities are members.  (Groves et al. 2009)

 

What is survey?

The word "systematic" is deliberate and meaningfully distinguishes surveys from other ways of gathering information.

The phrase "(a sample of)" appears in the definition because sometimes surveys attempt to measure everyone in a population and sometimes just a sample.

 

Historical moments of survey

Harry Truman displays a copy of the Chicago Daily Tribune newspaper that erroneously reported the election of Thomas Dewey in 1948. Truman's narrow victory embarrassed pollsters, members of his own party, and the press who had predicted a Dewey landslide.

What is survey?

The purpose of a survey is to produce statistics, that is, quantitative or numerical descriptions of some aspects of the population under study.

The main way of collecting information is by asking people questions; their answers constitute the data to be analyzed.


 

What is survey methodology?

Survey methodology seeks to identify principles about the design, collection, processing, and analysis of surveys that are linked to the cost and quality of survey estimates.

 

What is survey methodology?

This means that the field focuses on improving quality within cost constraints, or, alternatively, reducing costs for some fixed level of quality. “Quality” is defined within a framework labeled the total survey error paradigm. Survey methodology is both a scientific field and a profession. (Groves et al. 2009)

Goals of survey

  • Measurement of public opinion

    • Media

    • Government

  • Measurement of political perceptions and opinions market

    • Political candidates in elections

    • Political parties

  • Understand consumer preferences and interests

    • Marketing

Survey as demand and supply of data

  • Data-oriented information society

    • We need data on almost every aspect of decision making

  • Surveys systematically provide better data (not necessarily good).

  • One survey drives demand for a second one.

  • Survey as a career and also a mission: to provide better data.

 

New Survey Methods

  • Collect data for research (vs. propaganda/promotion)

  • Gauge variations and migration of public opinion

  • Test subtle differences in concepts otherwise impossible or difficult to detect

  • Perform comparative measurement

Survey modes

  1. Face-to-face interviews

  2. Mailing survey

  3. Telephone interviews

  4. Internet survey

Survey modes

  1. Face-to-face interviews

    1. The "gold standard" for decades

    2. Most costly: >$2,000 per response (e.g. ANES 2012)

    3. Formidable operations

  2. Mailing survey

    1. Lowest cost, but also the lowest response rate

Survey modes

3. Telephone interviews

  • Computer Assisted Telephone Interview (CATI)

  • Lower cost than F2F, but...

  • Response rate: 5%

  • Landline --> low representativeness

 

Survey modes

4. Internet survey

  • 50% response rate

  • Low cost

  • <$20 per response

  • Sustainable

  • Most importantly, the concept of “Opt-in” Surveys

  • Provide active data vs. passive data

  • Consistency

Survey modes

4. Internet survey

  • Critiques:

    • Computer literacy

    • Internet penetration

    • Demographic bias

 

 

Critiques of Internet Survey

 

  • Question: Not everyone has access to a computer
    • Answer: Internet penetration
  • Question: Abuse of online questionnaire
    • Answer: Use of metadata and IP address identification
  • Question: Internet survey findings questionable
    • Answer: Different modes generate very similar results (Sanders et al. 2007)
  • Question: Representativeness?
    • Answer: weighting & oversampling

Process of Survey

Budget factors

  • Staff time for planning and administration
  • Sample selection costs
    • Under/over-represented segments
  • Interview administration (if F2F)
  • Cost of “cleaning” the final data
  • Analyst costs
  • Reporting

To start a new survey:

  • Is it an ongoing survey or a one-time survey?
  • What is the target population (whom is it studying)?
  • What is the sampling frame (how do they identify the people who have a chance to be included in the survey)?
  • What is the sample design (how do they select the respondents)?
  • What is the mode of data collection (how do they collect data)?

Survey Design

  1. What is the population?
  2. How to sample?
  3. Anticipated data
  4. Single mode or mixed mode
  5. Single wave or multi-wave
  6. Pilot

Population

  1. Concept of Inference

  2. Sample and Population

  3. What is a panel?

Two Types of Survey Inference

Population

It is imperative to understand the concept of inference and what the respondents' answers can support:

  1. Answers accurately describe the characteristics of the individual respondent
  2. Answers are representative of the population

 

Survey Measurement Design

  1. Valid measure
  2. Reliable measure

Survey Measurement Design

  1. Valid measure
    • Validity is measuring what is supposed to be measured.
    • Example:   
      • Wealth vs. Income
      • Happiness vs. Satisfaction             

Survey Measurement Design

2. Reliable measure

  • Reliability is measuring well what is supposed to be measured.
  • Consistency
  • Methods
    • Cronbach's alpha
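
As an illustration of the reliability check named above, the following is a minimal sketch of Cronbach's alpha using only the Python standard library. The function name cronbach_alpha and the answer matrix are hypothetical; the sketch assumes every respondent answered every item on the same numeric scale.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # regroup scores item by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 5-point answers from five respondents to three items.
scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Values closer to 1 suggest that the items move together, which is the "consistency" the slide refers to.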

Survey Measurement Design

  1. Question wording
    1. Closed-ended questions
    2. Open-ended questions
  2. Question order
  3. Answer order

 

Questionnaire bias

  1. Avoid leading questions
    1. E.g.
      The government should force you to pay higher taxes.

Question wording

    Start designing by thinking of the answers
    1. Scale (Likert, Thermometer)
    2. Mutually exclusive choices
    3. Allow multiple selection?

Question wording

  1. Exhaust all possible answers
    • Add an open ended choice

Question wording

  1. Use an even number of choices
    1. To avoid middle choice inertia
    2. Allow nonresponse

Question wording

Keep the questions short

  • Try under 25 words

  • Short and easy to understand

  • Ensure there are no spelling errors

 

Question design

Avoid beginning questions with answers:

  • Do you very often, frequently, seldom or never.....

  • Use:

    How often do you ____?  Very often, frequently, seldom or never. 

 

Question design

Open-ended question(s)

Place open-ended questions at the end of a section.

Question design

Open-ended answer(s)

When the listed answers may not be exhaustive, provide an open-ended choice.

Question design

Likert scale: middle choice?

Some researchers prefer an even number of choices to avoid the tendency to choose the middle answer. If the question calls for a more definite positioning of the respondent, use an even number.

Some questions are harder for respondents to answer; allow an odd number in such cases.

Question design

Learn from the giants

 

Borrow from existing surveys such as GSS, ANES, TEDS since they are well tested and crafted by experienced researchers.  Alternatively, modify or adapt their question wording styles.

Question design

Aesthetic consideration

 

Allow space for respondents to feel comfortable.  Make the page look professional with elegant design.  Try multiple colors (but not too many).

Question design

"Ambiguity is the ghost most difficult to exorcise from survey questions."


–Czaja & Blair (2005)

Question design

Survey response process

Four major components

  1. Comprehension

  2. Retrieval

  3. Judgement

  4. Response

- Tourangeau, Rips, and Rasinski (2000), The Psychology of Survey Response

Survey response process

Comprehension

Respondents must use mental processes to read and understand a question (along with any relevant instructions), inferring the main idea of the question and identifying what the researcher is looking for regarding a response.

Survey response process

Comprehension

Problems with comprehension arise when respondents:

  1. do not notice, do not read, or misinterpret instructions;

  2. encounter unfamiliar vocabulary in a question stem or response options;

  3. interpret words or phrases differently than the way the researcher intended; or

  4. find a question worded in an overly complex or detailed way.

Survey response process

Retrieval

Retrieval “requires recalling relevant information from long-term memory. This component encompasses such processes as adopting a retrieval strategy, generating specific retrieval cues to trigger recall, recollecting individual memories, and filling in partial memories through inference” (Tourangeau et al., 2000, p. 9). 

Survey response process

Retrieval

A retrieval strategy might be the particular way in which we attempt to remember something. For example, if you were asked to think of as many parties as you can, you might think in terms of political leaders. 

Survey response process

Judgement

Tourangeau et al. (2000) divide judgement into three distinct types:

  1. factual questions

  2. dates and durations

  3. frequencies. 

Survey response process

Response

Response requires selecting and reporting an answer to a survey question. Responding to a survey question involves groups of processes around “mapping” the answer to available response options (as with a multiple-choice type question) and “editing” the answer to meet certain criteria.

Respondent Willingness and Ability to Participate in a Survey

  1. Willingness

    1. Question types

    2. Question wording

    3. Social desirability (Hawthorne effect)

  2. Ability

    1. Memory

    2. Comprehensibility

    3. Culture

Robinson, S.B. and Leonard, K.F., 2018. Designing Quality Survey Questions. SAGE Publications.

Sampling

How well a sample represents a population depends on the sample frame, the sample size, and the specific design of selection procedures. If probability sampling procedures are used, the precision of sample estimates can be calculated.  (Fowler 2009)
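
To make the point about precision concrete, here is a minimal sketch of the margin of error for an estimated proportion. It assumes simple random sampling from a large population and the normal approximation; the function name margin_of_error and the default z = 1.96 (roughly 95 percent confidence) are illustrative choices, not taken from Fowler.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from n respondents."""
    # Standard error of a proportion under simple random sampling, scaled by z.
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical case: 50 percent support observed among 400 respondents.
moe = margin_of_error(0.5, 400)
print(f"{moe:.3f}")  # 0.049
```

Under these assumptions a sample of 400 gives a precision of about ±5 percentage points in the least favorable case (p = 0.5).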

Sampling and Population

Source: Fricker, R.D., 2008. Sampling methods for web and e-mail surveys. The SAGE handbook of online research methods, pp.195-216.

Sample size

One general conservative formula:

n = 1/error^2

Example:

Use .05 as the acceptable error rate (± 5 percent):

n = 1/.05^2 = 1/.0025 = 400
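
The rule of thumb above can be sketched in a few lines of Python. The second function is an assumption added for comparison: the common textbook formula z^2 * p * (1 - p) / error^2 with p = 0.5 and z = 1.96, which shows why 1/error^2 is a slightly conservative shortcut. Both function names are hypothetical.

```python
import math


def conservative_sample_size(error):
    """Rule of thumb from the slide: sample size = 1 / error^2."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(1 / error ** 2, 6))


def proportion_sample_size(error, p=0.5, z=1.96):
    """Textbook sample size for a proportion (assumed here for comparison)."""
    return math.ceil(round(z ** 2 * p * (1 - p) / error ** 2, 6))


print(conservative_sample_size(0.05))  # 400
print(proportion_sample_size(0.05))    # 385
print(conservative_sample_size(0.03))  # 1112
```

Because 1.96^2 * 0.25 is about 0.96, the shortcut rounds that factor up to 1 and therefore asks for a few more respondents than the textbook formula.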

Terminology

  1. Target Population:
    The population to be studied/ to which the investigator wants to generalize his results

  2. Sampling Unit:
     smallest unit from which sample can be selected

  3. Sampling frame
    List of all the sampling units from which sample is drawn

  4. Sampling scheme
    Method of selecting sampling units from sampling frame

     

General types of Sampling

  1. Probability samples

  2. Non-probability samples

Probability samples

A probability-based sample is one in which the respondents are selected using some sort of probabilistic mechanism, and where the probability with which every member of the frame population could have been selected into the sample is known.

 

The sampling probabilities do not necessarily have to be equal for each member of the sampling frame

Types of probability sample

  1. Simple random sampling (SRS)

  2. Stratified random sampling

  3. Cluster sampling

  4. Systematic sampling

Types of probability sample

  1. Simple random sampling (SRS) is a method in which any two groups of equal size in the population are equally likely to be selected. Mathematically, simple random sampling selects n units out of a population of size N such that every sample of size n has an equal chance of being drawn.
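
A minimal sketch of simple random sampling, assuming the sampling frame is available as a Python list. The frame of 500 voter IDs and the function name simple_random_sample are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame of 500 voter IDs.
frame = [f"voter_{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 20, seed=42)
print(len(sample), len(set(sample)))
```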

Types of probability sample

  1. Stratified random sampling is useful when the population is comprised of a number of homogeneous groups. In these cases, it can be either practically or statistically advantageous (or both) to first stratify the population into the homogeneous groups and then use SRS to draw samples from each group.
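
The stratify-then-SRS procedure might look like the sketch below. It assumes proportionate allocation (the same sampling fraction in every stratum), which is only one of several possible allocation rules; the age-group frame and the function name stratified_sample are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Split the frame into strata, then draw an SRS of the same fraction from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):                      # fixed order for reproducibility
        members = strata[name]
        n = max(1, round(len(members) * fraction))   # at least one unit per stratum
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical frame: (age group, person id) pairs.
frame = (
    [("young", i) for i in range(300)]
    + [("middle", i) for i in range(500)]
    + [("old", i) for i in range(200)]
)
sample = stratified_sample(frame, stratum_of=lambda unit: unit[0], fraction=0.1, seed=1)
counts = Counter(group for group, _ in sample)
print(counts)
```

With proportionate allocation each stratum appears in the sample in the same proportion as in the frame, so no stratum is left out by chance.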

Types of probability sample

  1. Cluster sampling is applicable when the natural sampling unit is a group or cluster of individual units. For example, in surveys of Internet users it is sometimes useful or convenient to first sample by discussion groups or Internet domains, and then to sample individual users within the groups or domains.
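
The two-step idea described above (sample groups first, then users within the selected groups) can be sketched as follows. The discussion groups, user names, and the function name two_stage_cluster_sample are hypothetical, and equal-probability selection at both stages is an assumption.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        take = min(n_per_cluster, len(members))      # small clusters are taken whole
        sample.extend((name, member) for member in rng.sample(members, take))
    return sample


# Hypothetical frame: 10 discussion groups with 50 users each.
clusters = {f"group_{g}": [f"user_{g}_{i}" for i in range(50)] for g in range(10)}
sample = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
print(len(sample))  # 15
```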

Types of probability sample

  1. Systematic sampling is the selection of every kth element from a sampling frame or from a sequential stream of potential respondents. Systematic sampling has the advantage that a sampling frame does not need to be assembled beforehand. In terms of Internet surveying, for example, systematic sampling can be used to sample sequential visitors to a website. The resulting sample is considered to be a probability sample as long as the sampling interval does not coincide with a pattern in the sequence being sampled and a random starting point is chosen.
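
A minimal sketch of systematic sampling with a random starting point, assuming the sequence of potential respondents is already available as a list. The visitor list and the function name systematic_sample are hypothetical.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Select every kth element after a random start within the first k elements."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start in 0..k-1
    return frame[start::k]


# Hypothetical stream of 100 sequential website visitors.
visitors = list(range(1, 101))
sample = systematic_sample(visitors, k=10, seed=3)
print(sample)
```

The random start is what keeps the selection probabilistic; a fixed start would always pick the same visitors.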

Non-Probability samples

Non-probability samples, sometimes called convenience samples, occur when either the probability that every unit or respondent included in the sample cannot be determined, or it is left up to each individual to choose to participate in the survey. 

Types of non-probability sample

  1. Quota sampling

  2. Snowball sampling

  3. Judgement sampling  

Types of non-probability sample

Quota sampling requires the survey researcher only to specify quotas for the desired number of respondents with certain characteristics. The actual selection of respondents is then left up to the survey interviewers who must match the quotas. Because the choice of respondents is left up to the survey interviewers, subtle biases may creep into the selection of the sample.

Types of non-probability sample

Snowball sampling is often used when the desired sample characteristic is so rare that it is extremely difficult or prohibitively expensive to locate a sufficiently large number of respondents by other means (such as simple random sampling). Snowball sampling relies on referrals from initial respondents to generate additional respondents. While this technique can dramatically lower search costs, it comes at the expense of introducing bias because the technique itself substantially increases the likelihood that the sample will not be representative of the population.

Types of non-probability sample

Judgement sampling is a type of convenience sampling in which the researcher selects the sample based on his or her judgement. For example, a researcher may decide to draw the entire random sample from one ‘representative’ Internet-user community, even though the population of interest includes all Internet users. Judgement sampling can also be applied in even less structured ways without the application of any random sampling.

Probability vs. Non-probability samples

For probability samples, the surveyor selects the sample using some probabilistic mechanism and the individuals in the population have no control over this process. In contrast, for example, a web survey may simply be posted on a website where it is left up to those browsing through the site to decide to participate in the survey (‘opt in’) or not. As the name implies, such non-probability samples are often used because it is somehow convenient to do so.

Survey errors

  • Literary Digest 1936 Poll

  • Gallup Poll 1948

Survey errors

Literary Digest 1936 Poll

 

‘Literary Digest’ mailed 10 million straw-vote ballots, of which 2.3 million were returned, an impressively large number, although it represented less than a 25 percent response rate. Based on the poll data, ‘Literary Digest’ predicted that Alfred Landon would beat Franklin Roosevelt 55 percent to 41 percent. In fact, Roosevelt beat Landon by 61 percent to 37 percent.

Survey errors

Gallup 1948 Poll

Gallup used a quota sampling method in which each pollster was given a set of quotas of types of people to interview, based on demographics. While that seemed reasonable at the time, the survey interviewers, for whatever conscious or subconscious reason, were biased towards interviewing Republicans more often than Democrats. As a result, Gallup predicted a Dewey win of 49.5 percent to 44.5 percent: but almost the opposite occurred, with Truman beating Dewey with 49.5 percent of the popular vote to Dewey’s 45.1 percent (a difference of almost 2.2 million votes).

Survey errors

Types of error and their causes

  • Coverage: ‘...the failure to give any chance of sample selection to some persons in the population’.

  • Sampling: ‘...heterogeneity on the survey measure among persons in the population’.

  • Nonresponse: ‘...the failure to collect data on all persons in the sample’.

  • Measurement: ‘...inaccuracies in responses recorded on the survey instruments’.


“To err is human, to forgive divine – but to include errors in your design is statistical.”

Leslie Kish, 1977

Two most common approaches to reducing coverage error

  • obtaining as complete a sampling frame as possible (or employing a frameless sampling strategy in which most or all of the target population has a positive chance of being sampled)

  • post-stratifying to weight the survey sample to match the population of inference on some observed key characteristics.
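
The second approach, post-stratification on a single characteristic, can be sketched as below. The gender split in the sample and the population shares are hypothetical, as is the function name post_stratification_weights. The sketch assumes every population group has at least one respondent, since a group missing from the sample cannot be repaired by weighting.

```python
from collections import Counter


def post_stratification_weights(sample_groups, population_shares):
    """Weight each respondent by population share / sample share of their group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    factor = {g: population_shares[g] / (counts[g] / n) for g in counts}
    return [factor[g] for g in sample_groups]


# Hypothetical sample: 30 men and 70 women; assumed population split 50:50.
groups = ["M"] * 30 + ["F"] * 70
weights = post_stratification_weights(groups, {"M": 0.5, "F": 0.5})

male_share = sum(w for w, g in zip(weights, groups) if g == "M") / sum(weights)
print(round(weights[0], 3), round(weights[-1], 3), round(male_share, 3))
```

Under-represented groups receive weights above 1 and over-represented groups receive weights below 1, so the weighted sample matches the population on that characteristic.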

Advantages of Surveys

  • Economy: Sample to population inference

  • Comparison  between groups (e.g., SES, genders, countries)

  • Data can be aggregated over time (longitudinal/panel)

  • Multiple mode applications and comparisons

Limitations of Surveys

  • Discrepancies in prediction

  • Missing data/non-responses

  • Instrument validity over time

  • Sampling issues over time

  • Comparison across cultures/language groups

Illustration: HKES

YouGov conducted two waves of election surveys in 2016, before and after the Legislative election.  The company provided multiple weights created using rim weighting (also called raking) based on the following data:

  1. Registered voter gender

  2. Registered voter age

  3. Registered voter district

  4. Education based on Pre-election survey result

  5. Income based on Pre-election survey result
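
For readers unfamiliar with raking, the following is a minimal sketch of the general technique (iterative proportional fitting on marginal targets). It is not YouGov's actual procedure. The respondent counts and the age targets are hypothetical; the 49:51 gender target echoes the registered voter figure cited later in these slides. The sketch assumes every target category appears in the sample and that each set of target shares sums to 1.

```python
def rake(respondents, targets, max_iter=100, tol=1e-8):
    """Adjust weights until weighted marginals match the target shares.

    respondents: list of dicts such as {"gender": "M", "age": "18-39"}
    targets: {variable: {category: target share}}
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            current = {cat: 0.0 for cat in shares}
            for w, r in zip(weights, respondents):
                current[r[var]] += w               # weighted count per category
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * total / current[r[var]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:                        # all margins already match
            break
    return weights


def weighted_share(respondents, weights, var, cat):
    """Weighted share of respondents falling in one category."""
    total = sum(weights)
    return sum(w for w, r in zip(weights, respondents) if r[var] == cat) / total


# Hypothetical sample that over-represents younger respondents.
respondents = (
    [{"gender": "M", "age": "18-39"}] * 40
    + [{"gender": "F", "age": "18-39"}] * 30
    + [{"gender": "M", "age": "40+"}] * 10
    + [{"gender": "F", "age": "40+"}] * 20
)
targets = {
    "gender": {"M": 0.49, "F": 0.51},
    "age": {"18-39": 0.40, "40+": 0.60},           # assumed shares for illustration
}
weights = rake(respondents, targets)
print(round(min(weights), 3), round(max(weights), 3))
```

The printed minimum and maximum give the range of the weights, which is the kind of summary discussed in the slides that follow.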

Illustration: HKES

 

The pre- and post-wave weights have maximum values of up to 18.

 

The general weight value is under 5.

Illustration: HKES

Possible reasons were:
 

1.    Weights were created using different populations

2.    Panelists were more representative of the younger population

Illustration: HKES


 

For Point 1:

 

Hong Kong's population has a male-to-female ratio of 47:53 according to the Census.  The registered voter population, however, has a more even distribution of 49:51.

Illustration: HKES


 

The previous figures illustrate the large difference between the HKES sample (which has more panelists from the younger group) and the registered voter population.  The latter has a large proportion of elderly voters.  This can be attributed to some political parties’ concerted efforts to mobilize the elderly to register to vote.

Illustration: HKES

Source: SCMP http://www.scmp.com/news/hong-kong/article/1855887/hong-kong-elderly-sign-droves-vote-district-council-elections

Illustration: HKES


 

For Point 2, YouGov acknowledges that the company has more access to the younger population via its recruitment channel.  This may be due to the highly savvy and active internet user population in the younger age groups.

Another possible source of the problem is that the two other demographic variables, education and income, are drawn from a different population, which may be more representative of the general population or the online population but not necessarily of the registered voter population.

 

Illustration: HKES


 

Raking is employed to generate a weight using age, gender and district only.  The range of the weight for the pre wave is from .269 to 8.939.  These weights are slightly less varied than the original weights.

 



 
