Social and Political Data Science: Introduction

Computational Social Science

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

Text Analytics

  • What is text data? Why does it matter?

  • Text Analytics Workflow

  • Unsupervised learning methods using text data including pattern detection and visualization

  • Supervised learning models using text data

  • Latest developments and applications using text data

  • Illustrations

Overview

What is Text Data?

Text data refer to any document or corpus in text form.

  • Structured data refer to text with a high degree of organization, so that it fits seamlessly into a relational database and is readily searchable.

  • Unstructured data do not have a pre-defined data model or are not organized in a pre-defined manner.
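To make the contrast concrete, here is a minimal Python sketch. The records and the sentence are hypothetical, invented only for illustration. Structured data can be queried by field name directly, while unstructured text needs a structure imposed on it first (here, simple lowercasing, tokenizing and word counting).

```python
import re
from collections import Counter

# Structured: every record has named fields, so a query is a direct lookup.
# (Hypothetical records, invented for illustration.)
structured = [
    {"speaker": "A", "party": "X", "year": 2020},
    {"speaker": "B", "party": "Y", "year": 2021},
]
party_x_speakers = [row["speaker"] for row in structured if row["party"] == "X"]

# Unstructured: free text has no fields, so structure must be imposed first.
# Here the text is lowercased, split into word tokens, and counted.
unstructured = "The party promised lower taxes. The party also promised new schools."
tokens = re.findall(r"[a-z]+", unstructured.lower())
word_counts = Counter(tokens)

print(party_x_speakers)
print(word_counts.most_common(3))
```

The regular-expression tokenizer is a deliberate simplification; real workflows usually rely on a dedicated text-processing library.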

The TEXT data type stores any kind of text data. It can contain both single-byte and multibyte characters that the locale supports.

- IBM
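The single-byte versus multibyte distinction can be seen in a short Python sketch. It assumes UTF-8 encoding and illustrates the general idea only, not the behavior of IBM's TEXT type. The helper name `byte_profile` is hypothetical.

```python
def byte_profile(text: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(text), len(text.encode("utf-8"))

# ASCII letters take one byte each; accented and CJK characters take more.
for sample in ["data", "donn\u00e9es", "\u6570\u636e"]:
    chars, nbytes = byte_profile(sample)
    print(f"{sample}: {chars} characters, {nbytes} bytes")
```

The character count and the byte count agree only for single-byte text, which is why text storage and processing tools need to know the encoding.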

Why Does Text Data Matter?

Text data can provide information on:

  • Individuals
    • Grouping (Topic modeling)
    • Preferences (Recommendation system)
    • Positions (Social network analysis, Ideological scaling)
    • Sentiments
  • Organizations
    • Organizational behaviors (e.g. party manifestos, company policies)
  • Governments
    • State strategy
    • Policy shifts
    • Administrations

Why Does Text Data Matter?

Data Generation: Text data are human records

  • Explosive growth of text data in size

Source: BusinessWire

Why Does Text Data Matter?

Data Generation: Text data are human records

  • Big data are increasingly generated by machines but driven by human activity.
  • Human language captured through Internet of Things (IoT) devices multiplies the volume of text data.
  • Human languages:
    • Dialogues in:
      • Text
      • Speeches
      • Gestures (body language)
      • Movements

Why Does Text Data Matter?

Data Generation: Text data are human records

  • Because of the size and growth of text data, computer-aided reading, comprehension, organization and modeling are necessary.
  • Imagine:
    • How much time does it take to read decades of government reports?
    • How much does it cost to misunderstand an important message from a terrorist?
    • How many lives can be put at risk by misreading a government's military action?

CSS: Text Analytics

By Karl Ho