Data Methods: Focus Group Research

Data Methods: Web Data

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

Web data

Accumulation of data begins with transactions and interactions between humans. The advent of new internet technologies has taken data accumulation far beyond this scale through high-speed computation, large storage, and caching systems.


Web data

How can we take advantage of web data?

  1. Purpose of web data

  2. Generation process of web data

  3. What is metadata (data about data)?

  4. Why do social scientists need to collect web data?

Web data: Technical side

Web scraping

- obtaining information directly from web pages
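As a minimal sketch of scraping, assume the page has already been downloaded (a hard-coded HTML string stands in for it here). Python's standard-library `html.parser` can then pull the text out of specific tags. The `HeadlineParser` class and the choice of `<h2>` tags are hypothetical illustrations, not part of any particular site or scraping package.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text inside every <h2> element of a page (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headlines.append("")  # start a new, empty headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside an <h2> element.
        if self.in_h2:
            self.headlines[-1] += data.strip()


# A hard-coded page stands in for HTML fetched from a real website.
page = """
<html><body>
  <h2>First headline</h2><p>Body text...</p>
  <h2>Second headline</h2><p>More body text...</p>
</body></html>
"""

parser = HeadlineParser()
parser.feed(page)
print(parser.headlines)  # ['First headline', 'Second headline']
```

In practice the HTML would come from an HTTP request, and the site's terms of use and robots.txt should be checked before scraping.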

APIs (application programming interfaces)
- web services that allow interaction with, and retrieval of, structured data
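The difference from scraping is that an API returns structured data, usually JSON, in response to a query URL. The sketch below shows only the two local steps: encoding query parameters and parsing a JSON response. It assumes a hypothetical endpoint `https://api.example.com/v1/search` with hypothetical `q` and `limit` parameters, and a canned response body stands in for a real network call. Real APIs such as Twitter's or Facebook's define their own URLs, parameters, and authentication.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.com/v1/search"


def build_query_url(keyword, limit):
    """Encode search parameters into a URL query string."""
    return BASE_URL + "?" + urlencode({"q": keyword, "limit": limit})


# A canned JSON payload stands in for the body a real API would return.
response_body = (
    '{"results": [{"id": 1, "text": "first post"},'
    ' {"id": 2, "text": "second post"}]}'
)

url = build_query_url("trade war", 2)
records = json.loads(response_body)["results"]  # already structured: a list of dicts
texts = [record["text"] for record in records]

print(url)    # https://api.example.com/v1/search?q=trade+war&limit=2
print(texts)  # ['first post', 'second post']
```

Because the response is already structured, no HTML parsing is needed. The fields can be read directly by name.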

Web data: Tools

  1. SAS

  2. R

  3. Python

  4. Tableau

  5. Data Mining Packages

Web data technologies

Source: Munzert, Simon, Christian Rubba, Peter Meißner, and Dominic Nyhuis. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Chichester, England: Wiley, 2015. Print.

Web data: APIs (data sources)

  1. Social Media

    1. Facebook

    2. Twitter

    3. Instagram

    4. YouTube

  2. News websites

  3. Government websites

  4. NGOs

Sentiment Analysis

When human readers approach a text, we use our understanding of the emotional intent of words to infer whether a section of text is positive or negative, or perhaps characterized by some other more nuanced emotion like surprise or disgust.


                          - Silge and Robinson 2017

Sentiment analysis: Volume

Sentiment analysis: Word counts

Sentiment analysis: Word sentiments

Social Media Impression of "Taiwan"
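Word counts are usually the first step before attaching sentiments. A minimal sketch, assuming a tiny hypothetical stop-word list (real analyses use a curated list) and toy input strings rather than real posts: lowercase each text, split it into words, drop stop words, and count what remains.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real analyses use a curated list.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "is", "to"}


def word_counts(texts):
    """Lowercase, tokenize on letters/apostrophes, drop stop words, and count."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Toy strings for illustration, not real social media posts.
posts = ["The tariff is on", "Tariff talks and the market"]
top = word_counts(posts).most_common(2)
print(top)  # [('tariff', 2), ('talks', 1)]
```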

Sentiment Analysis

We can use the tools of text mining to approach the emotional content of text programmatically.

Sentiment Analysis

One way to analyze the sentiment of a text is to consider the text as a combination of its individual words and the sentiment content of the whole text as the sum of the sentiment content of the individual words.
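The sum-of-words idea can be sketched in a few lines. The word scores below are hypothetical values chosen for illustration, not entries from any published lexicon. Words missing from the lexicon contribute 0.

```python
import re

# Hypothetical word scores for illustration only (not from AFINN, bing, or nrc).
LEXICON = {"good": 2, "happy": 3, "bad": -2, "terrible": -3}


def text_sentiment(text, lexicon):
    """Score a text as the sum of its words' scores; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


print(text_sentiment("A good day, not a terrible one", LEXICON))  # 2 + (-3) = -1
```

Because each word is scored on its own, a phrase like "not good" still adds +2. Negation and sarcasm are known limitations of this word-by-word approach.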


The three general-purpose lexicons are

  • AFINN from Finn Årup Nielsen,

  • bing from Bing Liu and collaborators, and

  • nrc from Saif Mohammad and Peter Turney.

Word list - Lexicons

The AFINN word list contains 2,477 words and phrases rated from -5 (very negative) to +5 (very positive). AFINN words are divided into four categories:

  • Very Negative (rating -5 or -4)

  • Negative (rating -3, -2, or -1)

  • Positive (rating 1, 2, or 3)

  • Very Positive (rating 4 or 5)
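The four categories above amount to binning an integer score. A minimal sketch follows. Returning `None` for a score of 0 is an assumption, because 0 falls outside the four listed categories.

```python
def afinn_category(score):
    """Map an AFINN-style integer score (-5..+5) to the four categories above."""
    if not -5 <= score <= 5:
        raise ValueError("AFINN scores range from -5 to +5")
    if score <= -4:
        return "Very Negative"
    if score <= -1:
        return "Negative"
    if score == 0:
        return None  # assumption: 0 is not covered by the four categories
    if score <= 3:
        return "Positive"
    return "Very Positive"


print([afinn_category(s) for s in (-5, -2, 2, 4)])
# ['Very Negative', 'Negative', 'Positive', 'Very Positive']
```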

Word list - Lexicons

The bing word list is named after Bing Liu, author of one of the most cited works on opinion mining. It classifies words in a binary fashion as positive or negative.


Liu, B., 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), pp. 1-167.
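With a binary lexicon like bing, a common summary is net sentiment: the number of positive words minus the number of negative words. The sketch below uses small hypothetical word sets in the bing style, not the actual bing list, and a toy sentence.

```python
import re

# Hypothetical positive/negative word sets in the bing style (not the real list).
POSITIVE = {"gain", "strong", "win"}
NEGATIVE = {"loss", "weak", "hurt"}


def net_sentiment(text):
    """Count positive and negative words and report positive minus negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {"positive": pos, "negative": neg, "sentiment": pos - neg}


print(net_sentiment("Farmers hurt by loss, but exporters win"))
# {'positive': 1, 'negative': 2, 'sentiment': -1}
```

Unlike AFINN, every matched word counts equally here, so "terrible" and "bad" would weigh the same.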

Word list - Lexicons

nrc - The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were done manually through crowdsourcing.
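Because an NRC word can carry several emotions at once, a text's emotional profile can be built by counting, for each emotion, how many of its words are associated with it. The entries below are hypothetical stand-ins, not actual NRC annotations.

```python
import re
from collections import Counter

# Hypothetical entries in the NRC style (not actual NRC annotations).
EMOTIONS = {
    "threat": {"fear", "anger", "negative"},
    "deal": {"anticipation", "trust", "positive"},
}


def emotion_profile(text):
    """Count how many words in the text are associated with each emotion or sentiment."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        counts.update(EMOTIONS.get(token, ()))  # words not in the lexicon add nothing
    return counts


profile = emotion_profile("A deal, then a threat, then another threat")
print(profile["fear"], profile["trust"])  # 2 1
```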

Illustration: Public Sentiments on Trade War

Research question:


How does the general public in the United States feel about President Trump's trade war with China?


Illustration: Public Sentiments on Trade War



Sentiment analysis using Twitter data


Keywords: trade war, Trump, China
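Before scoring, collected tweets have to be filtered to the topic. A minimal sketch, assuming case-insensitive substring matching. The slide does not say whether a tweet must contain all of the keywords or any one of them, so both options are exposed through a hypothetical `require_all` flag. The tweets are toy strings, not real data.

```python
# Keywords from the slide; whether ANY or ALL must match is an assumption.
KEYWORDS = ["trade war", "trump", "china"]


def matches_keywords(text, keywords=KEYWORDS, require_all=False):
    """Case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    hits = [kw in lowered for kw in keywords]
    return all(hits) if require_all else any(hits)


# Toy strings for illustration, not real tweets.
tweets = [
    "Talk of a trade war again",
    "Weather is nice today",
    "Trump and China resume talks",
]

any_match = [t for t in tweets if matches_keywords(t)]
all_match = [t for t in tweets if matches_keywords(t, require_all=True)]
print(any_match)  # ['Talk of a trade war again', 'Trump and China resume talks']
print(all_match)  # []
```

Plain substring matching is crude: "trump" would also match "trumpet", so a real pipeline would match on word boundaries or rely on the API's own search operators.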
