Introduction

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

Data Visualization

SDAR Orientation

Date: 8/25/2023

Time: 10:30 - 12 noon

Location: San Antonio Room GR2.326

RSVP needed for lunch

Preamble

What you get from this class:

  • Data literacy

  • Visual thinking

  • Mastering data visualization

What is required:

  • Full attendance
  • Full participation
  • Original works                       

Budgeting considerations

  • Team work: like Hackathon weekly

  • Reading load: ~ one book weekly

  • Coding load: varies but data product weekly

  • No exams or quizzes

  • Assignments (post on GitHub website)
  • Projects (individual and team)                   

 

  • If you cannot make these commitments, consider dropping the course now and freeing the seat for those on the waiting list

Data visualization - a different Data Science training

This course is not:

  • an application-based course
    • No Excel, Tableau, Power BI
  • a short course for creating charts only

This course is:

  • a theory-based reading seminar
  • a programming-based coding camp

Why not teach an application?

  • This graduate course is not a two-day workshop teaching how to use an application to create charts.

  • It builds a strong foundation in data, design and graphic theory.

  • It trains "educated" perception and "thinking eyes".

Approaches in data visualization

  • Data journalism

  • Business Intelligence/Analytics

  • Computer science

  • Data Science

What is data visualization?

This is a course in the Social Data Analytics and Research (SDAR) series training students in:

  • Data Literacy

    • Understand data theory

    • Manage data

    • Analyze data

  • Data Skills

    • Programming

    • Tools

Data Story:

Source: Yau 2011

  • Color
  • Font 
    • Size
    • Family
  • Axis
    • Vertical
    • Slant
  • Canvas
    • Size
    • Theme

Analytics using Twitter data

Automated Machine Learning 

Replicating Rosling

What is data visualization?

Objectives:

  1. Understand data theory

  2. Be familiar with principles behind effective data visualization

  3. Read complex data through educated reviews

  4. Communicate message in data effectively using advanced visualization techniques

What is data visualization?

  • Data visualization delivers a message from your data.

  • It is like telling a story using charts or data applications.

  • Sometimes the data is huge or the story is too long to tell.

  • Visualization provides an ability to comprehend huge amounts of data. The important information from more than a million measurements is immediately available.

What to visualize in data?

  1. Data Generating Process

  2. Property

  3. Distribution

  4. Pattern

  5. Differences

  6. Relationship

  7. Dimensionality

Elements of a Chart

  1. Dimensionality

    1. How many dimensions are there?

  2. Relationships

    1. Strength

    2. Fit

    3. Error bands

    4. Panels
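
The relationship elements above (strength, fit, error bands) can be made concrete with a small calculation. The sketch below uses made-up (x, y) values and hypothetical function names. It computes the Pearson correlation as a measure of strength, an ordinary least-squares line as the fit, and a band of plus or minus two residual standard deviations as a rough error band. It is an illustration under those assumptions, not a prescribed method.

```python
import math

def _sums(xs, ys):
    """Return means and centered sums of squares/cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def pearson_r(xs, ys):
    """Strength: Pearson correlation coefficient."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def ols_fit(xs, ys):
    """Fit: slope and intercept of the least-squares line."""
    mx, my, sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def error_band(xs, ys, width=2.0):
    """Error band: fitted value +/- width * residual standard deviation."""
    slope, intercept = ols_fit(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    # Residual standard deviation with n - 2 degrees of freedom
    s = math.sqrt(sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (len(xs) - 2))
    return [(f - width * s, f + width * s) for f in fitted]

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r = pearson_r(xs, ys)
slope, intercept = ols_fit(xs, ys)
band = error_band(xs, ys)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A plotting library would draw the points, the fitted line and the band; the numbers above are what those visual elements encode.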

What is data visualization?

  • Learn to read your data

    • Visual thinking
    • Educated eyes

What is data visualization?

Data visualization communicates data patterns, findings and insights through visual representation of data. It goes well beyond creating a chart: it trains "Thinking Eyes" and builds data literacy.

How much information does this picture present?

Multibeam sonar backscatter data draped on bathymetry off Santa Monica Calif. Yellow is high backscatter. Santa Monica sewer pipe and diffuser is visible in upper part of image (y-shaped feature). Red-brown dots represent color-coded fish abundance as determined from trawl data.

Source: https://tinyurl.com/ydhqtr8f

What do we learn from the image?

  • Visualization provides an ability to comprehend huge amounts of data. The important information from more than a million measurements is immediately available.

 

What do we learn from the image?

  • Visualization often enables problems with the data to become immediately apparent.

  • Visualization commonly reveals things not only about the data itself but also about the way it is collected.

  • With an appropriate visualization, errors and artifacts in the data often jump out at you.

  • For this reason, visualizations can be invaluable in quality control.
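
As a small illustration of the quality-control point above, the sketch below draws a text histogram with the standard library only. The ages are invented, and 999 is an assumed missing-value code; both are hypothetical. In a summary statistic such as the mean the sentinel values would be easy to miss, while in the histogram they sit in a bin far from the rest.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one text line per non-empty bin: 'start-end  ####'."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bin_width - 1
        lines.append(f"{start:>5}-{end:<5} {'#' * counts[start]}")
    return lines

# Hypothetical ages; 999 is an assumed code for "missing"
ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 63, 999, 999]

lines = text_histogram(ages, 10)
for line in lines:
    print(line)
```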

What do we learn from the image?

  • Visualization facilitates understanding of both large-scale and small-scale features of the data.

  • It can be especially valuable in allowing the perception of patterns linking local features.

     

What do we learn from the image?

  • Visualization facilitates hypothesis formation, inviting further inquiries into building a theory.

Source: Colin Ware 2012, Ch. 1

Messages:

  • Gradual rise?

  • Abrupt drop in ranking?

Source: Chris Adolph, also Johnson, R.R. and Kuby, P.J., 2011. Elementary statistics. Cengage Learning.

Look closer

Message:

  • Age structure of college enrollment

  • How much data are presented in multiple colors?

Source: Edward R. Tufte. 2001. The Visual Display of Quantitative Information. Graphics Press. 2nd ed.

Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

One of the best data visualizations in history

Charles Joseph Minard, in mapping Napoleon's march on Moscow

Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

One of the best data visualizations in history

How much information?

  1. Latitude of army & features (Y-coordinate)
  2. Longitude of army & features (X-coordinate)
  3. Size of army (width of line, numerals)
  4. Advance vs. retreat (color of line)
  5. Division of army (splitting of line)
  6. Temperature (linked line plot)
  7. Time (linked line plot)

Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

One of the best data visualizations in history

Combines narrative & analysis:
a technique mostly lost until this century

  • May be a spurious relationship here: time and temperature
  • Note the deaths at river crossings: usually, these rivers would be frozen. Did Napoleon choose too warm a winter to invade Russia?

- Chris Adolph

Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

Know your data: data types

  1. Numeric data

    1. Scale

      1. Nominal 
      2. Ordinal
      3. Interval
      4. Ratio
  2. Categories
  3. Events
  4. Time series

Quantitative vs. Qualitative Data

  1. Numbers vs. Labels

  2. Quantity vs. Quality

  3. Ordinal, Interval, Ratio vs. Nominal

  4. e.g. Yes/No -> Qualitative

  5. e.g. How much -> Quantitative
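
The four measurement scales and the quantitative/qualitative split above can be sketched as a small lookup. The names below (`SCALE_ORDER`, `NEW_OPERATIONS`, `meaningful_operations`, `is_quantitative`) are hypothetical, and the list of summaries per level is a common textbook simplification rather than an exhaustive rule. The idea is that the levels are cumulative: each scale supports the summaries of the levels below it plus some of its own.

```python
# Scales ordered from least to most informative
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Summaries that become meaningful at each level (simplified)
NEW_OPERATIONS = {
    "nominal": ["count", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratios", "coefficient of variation"],
}

def meaningful_operations(scale):
    """Return all summaries meaningful at this scale and below."""
    idx = SCALE_ORDER.index(scale)  # raises ValueError for unknown scales
    ops = []
    for level in SCALE_ORDER[: idx + 1]:
        ops.extend(NEW_OPERATIONS[level])
    return ops

def is_quantitative(scale):
    """Follow the slide's split: ordinal, interval, ratio vs. nominal."""
    if scale not in SCALE_ORDER:
        raise ValueError(f"unknown scale: {scale}")
    return scale != "nominal"

print(meaningful_operations("ordinal"))
print(is_quantitative("nominal"))
```

For example, a Yes/No answer is nominal, so counts and the mode are meaningful but a mean is not.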

Quantitative vs. Qualitative Data

  1. Higher quantity means higher quality?

  2. Higher quality leads to higher quantity?

Time series data

  1. Nature

    1. Temporal dependency: non-stationarity, autocorrelation

    2. Periodicity: seasonality, cycle

  2. Zeros -> events?

  3. Scale linearity
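
Temporal dependency and periodicity, as listed above, can be checked numerically before plotting. The sketch below computes the sample autocorrelation at a given lag for a synthetic monthly series with a 12-month cycle. The series and the function name `autocorrelation` are assumptions made for illustration; real data would also carry trend and noise.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Hypothetical monthly series: 10 years with a pure 12-month cycle
series = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]

r12 = autocorrelation(series, 12)  # one full cycle apart
r6 = autocorrelation(series, 6)    # half a cycle apart
print(f"lag 12: {r12:.2f}, lag 6: {r6:.2f}")
```

A strongly positive value at lag 12 and a strongly negative value at lag 6 are the numeric signature of the seasonality that a line chart of the series would show.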

Event count data

  1. Nature

    1. Distribution

    2. Bounds

      1. No upper bounds

      2. One lower bound: zero

    3. Zeros

  2. Continuous vs. discrete

  3. Intervals vs. duration
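
The properties of event counts listed above (discrete values, a lower bound at zero, many zeros) suggest a few checks worth running before choosing a chart. The sketch below uses invented daily counts, not the COVID data shown on the next slide, and a hypothetical helper `summarize_counts`. It reports the share of zeros and the variance-to-mean ratio, which is well above 1 when counts are more spread out than a Poisson process would imply.

```python
from statistics import mean, variance

def summarize_counts(counts):
    """Summarize non-negative integer event counts."""
    if any(c < 0 or c != int(c) for c in counts):
        raise ValueError("event counts must be non-negative integers")
    m = mean(counts)
    v = variance(counts)  # sample variance
    return {
        "n": len(counts),
        "zero_share": sum(1 for c in counts if c == 0) / len(counts),
        "mean": m,
        "variance": v,
        "dispersion": v / m if m else float("nan"),
    }

# Hypothetical daily event counts, for illustration only
daily_events = [0, 0, 1, 0, 3, 0, 0, 7, 2, 0, 0, 12, 1, 0]

summary = summarize_counts(daily_events)
for key, value in summary.items():
    print(key, value)
```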

Data: Daily COVID deaths

Adelson's Checker-Shadow

Are the colors of boxes A and B different?

Coffer Illusion by Anthony Norcia

See any circles?  How many?

Hans Rosling

  • Swedish physician and statistician

  • Founded Gapminder Foundation

  • Visualize historical data on public health and poverty

 

Edward Tufte

  • Visual Display 1982

William Cleveland

  • 12 rules of data visualization

Data Visualization: Introduction 2023

By Karl Ho
