Data Visualization in Practice

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

Workshop prepared for National Chung Hsing University
December 19, 2020

Overview

Module 1:

  1. Know your data
  2. Grammar of graphics
  3. Science of Data Visualization

Module 2:

  1. Functionalist approach:            
    1. Distribution       
    2. Composition
    3. Comparison
    4. Relationship

Module 3:

  1. Dashboard                                   
    1. Reactive
    2. Interactive

What is data visualization?

  • Learn to read your data

    • Visual thinking
    • Educated eyes

How much information does this picture present?

Multibeam sonar backscatter data draped on bathymetry off Santa Monica, Calif. Yellow is high backscatter. The Santa Monica sewer pipe and diffuser are visible in the upper part of the image (y-shaped feature). Red-brown dots represent color-coded fish abundance as determined from trawl data.

Source: https://tinyurl.com/ydhqtr8f

What do we learn from the image?

  • Visualization lets us comprehend huge amounts of data. The important information from more than a million measurements is immediately available.

  • Visualization often makes problems with the data immediately apparent. A visualization commonly reveals things not only about the data itself but also about the way it was collected. With an appropriate visualization, errors and artifacts in the data often jump out at you. For this reason, visualizations can be invaluable in quality control.

  • Visualization facilitates understanding of both large-scale and small-scale features of the data. It can be especially valuable in allowing the perception of patterns linking local features.

  • Visualization facilitates hypothesis formation, inviting further inquiry and theory building.

(Colin Ware 2012, Ch. 1)

Data Story:

Source: Yau 2011

  • Color
  • Font 
    • Size
    • Family
  • Axis
    • Vertical
    • Slant
  • Canvas
    • Size
    • Theme

Data Story:

Data --> Messenger --> Message

(Mechanical process)

Know your data: data types

  1. Numeric data

    1. Scale

      1. Nominal 
      2. Ordinal
      3. Interval
      4. Ratio
  2. Categories
  3. Events
  4. Time series
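A minimal R sketch of how these scales are typically represented: nominal data as a factor, ordinal data as an ordered factor, and interval and ratio data as numeric vectors. The values are made-up toy data (hypothetical), used only to show which operations are meaningful at each scale.

```r
# Toy (hypothetical) values illustrating the four measurement scales in R

# Nominal: unordered categories -> factor
region <- factor(c("North", "South", "South", "East"))

# Ordinal: ordered categories -> ordered factor with an explicit level order
edu <- factor(c("High school", "College", "Graduate", "College"),
              levels = c("High school", "College", "Graduate"),
              ordered = TRUE)

# Interval: numeric with an arbitrary zero (differences meaningful, ratios not)
temp_c <- c(10, 20, 25, 15)

# Ratio: numeric with a true zero (ratios meaningful)
income <- c(0, 30000, 60000, 45000)

table(region)            # nominal: counts per category are the main summary
edu[3] > edu[2]          # ordinal: order comparisons work on ordered factors
diff(temp_c)             # interval: differences are meaningful
income[3] / income[2]    # ratio: "twice as much" makes sense
```

Comparing two elements of the unordered factor `region` with `>` would return `NA` with a warning, which is R's way of saying order is not defined on a nominal scale.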

Quantitative vs. Qualitative Data

  1. Numbers vs. Labels

  2. Quantity vs. Quality

  3. Ordinal, Interval, Ratio vs. Nominal

  4. e.g. Yes/No--> Qualitative

  5. e.g. How much--> Quantitative

Quantitative vs. Qualitative Data

  1. Higher quantity means higher quality?

  2. Higher quality leads to higher quantity?

What to visualize in data?

  1. Data Generating Process

  2. Property

  3. Distribution

  4. Pattern

  5. Differences

  6. Relationship

  7. Dimensionality

Elements of a Chart

  1. Dimensionality

    1. How many dimensions are there?

  2. Relationships

    1. Strength

    2. Fit

    3. Error bands

    4. Panels

Time series data

  1. Nature

    1. Temporal dependency: non-stationarity, autocorrelation

    2. Periodicity: seasonality, cycle

  2. Zeros -> events?

  3. Scale linearity
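As an illustration, the sketch below uses R's built-in AirPassengers series (an example dataset chosen here, not part of the original slides) to check temporal dependency with acf(), periodicity with decompose(), and scale linearity with a log transform.

```r
# AirPassengers: monthly international airline passengers, 1949-1960
# (ships with base R in the 'datasets' package)
ap <- AirPassengers
frequency(ap)                        # 12 observations per cycle (monthly data)

# Temporal dependency: autocorrelation at lag 1 (position 1 holds lag 0)
ap_acf <- acf(ap, lag.max = 24, plot = FALSE)
lag1 <- ap_acf$acf[2]
lag1

# Scale linearity: seasonal swings grow with the level of the series,
# so a log scale turns the multiplicative pattern into an additive one
ap_dec <- decompose(log(ap), type = "additive")

# Periodicity: the estimated seasonal effect repeats every 12 months
round(ap_dec$figure, 3)
plot(ap_dec)
```

A lag-1 autocorrelation close to 1 is a warning sign that the series is not stationary, so a plain scatterplot of values against each other would overstate any relationship.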


Event count data

  1. Nature

    1. Distribution

    2. Bounds

      1. No upper bounds

      2. One lower bound: zero

    3. Zeros

  2. Continuous vs. discrete

  3. Intervals vs. duration
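A small sketch of these properties, assuming simulated Poisson counts (the rate 1.2 and the sample size are arbitrary, hypothetical choices): counts are non-negative integers with a lower bound of zero and no fixed upper bound, and the observed share of zeros is worth checking against what the model implies.

```r
# Simulated (hypothetical) event counts per period
set.seed(2020)
counts <- rpois(n = 1000, lambda = 1.2)

range(counts)                  # lower bound is zero; no fixed upper bound
mean(counts == 0)              # observed share of zeros
dpois(0, lambda = 1.2)         # share of zeros a Poisson(1.2) model implies

# Discrete values: plot frequencies as bars rather than a smooth density
barplot(table(counts), xlab = "Events per period", ylab = "Frequency")
```

With real data, a share of zeros well above the Poisson value suggests excess zeros (for example, periods in which events could not occur at all).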

Anscombe example (1973)

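Anscombe's quartet ships with base R as the anscombe data frame. The sketch below computes summary statistics for each of the four sets and then plots them: the numbers are nearly identical, but the pictures are very different.

```r
# anscombe ships with base R: columns x1..x4 and y1..y4, 11 rows each
data(anscombe)

summarise_set <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    cor = cor(x, y),
    intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}

stats <- sapply(1:4, summarise_set)
colnames(stats) <- paste0("set", 1:4)
round(stats, 2)    # nearly identical summary statistics across the four sets

# ...yet the scatterplots look completely different
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i), pch = 19)
  abline(lm(y ~ x), col = "red")
}
par(op)
```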

Tufte: Same relationship? (2001)


Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

One of the best data visualizations in history

How much information?

  1. Latitude of army and features (y-coordinate)
  2. Longitude of army and features (x-coordinate)
  3. Size of army (width of line, numerals)
  4. Advance vs. retreat (color of line)
  5. Division of army (splitting of line)
  6. Temperature (linked line plot)
  7. Time (linked line plot)

Adelson's Checker-Shadow

Are the colors of squares A and B different?


Coffer Illusion by Anthony Norcia

See any circles?  How many?

Know your data: Data Literacy

  1. Data generating process

  2. Graphic grammar

  3. Statistical judgement

 

  1. Data generating process

    1. How data are generated

    2. Distribution

    3. Missing values

    4. Wrong data

  2. Graphic grammar

    1. Bad charts deliver incorrect messages

    2. Poor design

    3. Color

    4. Label

    5. Scale

    6. Dimensionality

  3. Statistical understanding

    1. Size does (not) matter

    2. Representativeness does

    3. Forecast/prediction minded

    4. Explanation


Functional Approach

  1. What to plot?

    1. Quantitative/Numeric data
    2. Qualitative/Categorical data
  2. One variable: Univariate

    1. Distribution
    2. Composition
  3. Two or multiple: Multivariate

    1. Comparison
    2. Relationship
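One way to map these functions onto basic chart types, sketched with base R graphics and the built-in mtcars data (the chart choices are illustrative, not the only options): a histogram for distribution, a bar chart of shares for composition, a box plot for comparison, and a scatterplot with a fitted line for relationship.

```r
# mtcars ships with base R; one chart per function in the list above
op <- par(mfrow = c(2, 2))

# Univariate, distribution: how fuel economy (mpg) is spread out
hist(mtcars$mpg, main = "Distribution", xlab = "Miles per gallon")

# Univariate, composition: share of cars by number of cylinders
cyl_share <- prop.table(table(mtcars$cyl))
barplot(cyl_share, main = "Composition", xlab = "Cylinders", ylab = "Share")

# Multivariate, comparison: mpg across cylinder groups
boxplot(mpg ~ cyl, data = mtcars, main = "Comparison",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Multivariate, relationship: mpg against weight, with a linear fit
plot(mpg ~ wt, data = mtcars, main = "Relationship",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon", pch = 19)
abline(lm(mpg ~ wt, data = mtcars), col = "red")

par(op)
```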

Thought starter


Financial Times: Visual vocabulary

  1. Deviation

  2. Correlation

  3. Ranking

  4. Distribution

  5. Change over time

  6. Magnitude

  7. Part-to-Whole

  8. Spatial

  9. Flow


Module 3

  • Creating a Dashboard using Shiny

  • Reactivity

  • Interactive charts using Plotly

Overview

  • What is Shiny?

  • Components of Shiny

  • Structure of Shiny app

  • Publishing a Shiny app on GitHub

What is Shiny?

A Shiny app is a web page (the UI) connected to a computer running a live R session (the server).

You design the UI, which provides an interactive interface for visualizing data (by running R code on the server).

Shiny app = R + Interactivity + Web hosting

Presenting interactive data and charts

Shiny deployment

  • shinyapps.io, RStudio's hosting service (http://www.shinyapps.io)

    • Requires a shinyapps.io account (sign up via GitHub or Google)

  • Pro Shiny account (commercial)

  • Install own Shiny server (https://github.com/rstudio/shiny-server)

    • Linux server
    • Free and open source

Shiny reference and resources

Components of Shiny

  1. User interface (ui.R): the frontend that accepts user input values.

  2. Server function (server.R): the backend that processes these input values and produces the output displayed on the web page.

  3. shinyApp function: combines the UI and server components into the app itself.

Layout and interface

  • Design & explore UI framework
    • Inputs within the UI framework
    • Outputs within the UI framework
  • Assemble UI with HTML/CSS/... widgets
  • Adjustment of the layout scheme
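A minimal sketch of one common layout scheme, assuming the standard shiny layout functions titlePanel(), sidebarLayout(), sidebarPanel(), and mainPanel(): inputs go in the sidebar and outputs in the main panel. The app title and widgets are hypothetical placeholders.

```r
library(shiny)

# Title on top, inputs in a sidebar, outputs in the main panel
ui <- fluidPage(
  titlePanel("Sample app"),
  sidebarLayout(
    sidebarPanel(
      # Inputs go in the sidebar
      numericInput(inputId = "n", label = "Sample size", value = 50)
    ),
    mainPanel(
      # Outputs go in the main panel
      plotOutput(outputId = "hist")
    ),
    position = "left"  # adjust the scheme: "right" moves the sidebar
  )
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n))
  })
}

# Build the app object; launch it only in an interactive session
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```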

Structure of Shiny program

# install.packages("shiny")
# install.packages("shinythemes")

library(shiny)
library(shinythemes)
# Create User Interface
ui <- fluidPage()

# Build R objects displayed in UI
server <- function(input, output) {}

# Create Shiny app
shinyApp(ui = ui, server = server)

  • ui: Nested R functions that assemble an HTML user interface for the app (some HTML knowledge needed)

  • server: A function with instructions on how to build and rebuild the R objects displayed in the UI

  • shinyApp: Combines ui and server into a functioning app

ui.R

  • Nested R functions that assemble an HTML user interface for the app

  • Example:

    • ui: creates the user interface object
      • fluidPage() creates the layout page, which includes:
        • inputs
        • outputs

library(shiny)
ui = fluidPage(
  numericInput(inputId = "n", label = "Sample size", value = 50),
  plotOutput(outputId = "hist"))

server.R

  • Composed of R code that processes the input and generates the output:

    • Example:

      • Read the sample size from the input (defined in ui.R)
      • Create the chart (here, a histogram of simulated data)

server = function(input, output) {
  output$hist = renderPlot({
    hist(rnorm(input$n))
  })
}

shinyApp()

  • Combine ui.R and server.R and run the app

    • Example:

      • Output appears in a viewer window, ready for publishing

shinyApp(ui = ui, server = server)

Reactivity

  • Reactive values (e.g., input$n) work together with reactive functions (e.g., renderPlot()). Call a reactive value from within the arguments of one of these functions to avoid the error "Operation not allowed without an active reactive context."
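A minimal sketch of reactivity, assuming a recent version of shiny: reactive() reads input$n inside a reactive context, so it re-runs whenever the sample size changes, and both outputs share the same simulated sample. Reading input$n directly in the server body instead triggers the reactive-context error (the exact wording differs across shiny versions). The output names and the summary output are illustrative additions.

```r
library(shiny)

ui <- fluidPage(
  numericInput(inputId = "n", label = "Sample size", value = 50),
  plotOutput(outputId = "hist"),
  verbatimTextOutput(outputId = "stats")
)

server <- function(input, output, session) {
  # Wrong: reading input$n here, outside any reactive function,
  # raises the reactive-context error when the app starts.
  # n <- input$n

  # Right: reactive() creates a reactive context; the expression re-runs
  # only when input$n changes, and its result is cached in between.
  sample_data <- reactive({
    rnorm(input$n)
  })

  # Both outputs call sample_data(), so they describe the same sample
  output$hist <- renderPlot({
    hist(sample_data(), main = "Simulated sample", xlab = "Value")
  })
  output$stats <- renderPrint({
    summary(sample_data())
  })
}

# Build the app object; launch it only in an interactive session
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```

Without reactive(), each render function would need its own rnorm() call, and the histogram and the summary would describe two different samples.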

fileInput(inputId, label, multiple, accept)
numericInput(inputId, label, value, min, max, step)
passwordInput(inputId, label, value)
radioButtons(inputId, label, choices, selected, inline)
selectInput(inputId, label, choices, selected, multiple, selectize, width, size) (also selectizeInput())
sliderInput(inputId, label, min, max, value, step, round, format, locale, ticks, animate, width, sep, pre, post)
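A sketch that combines several of the input widgets listed above, assuming the built-in mtcars data as a stand-in dataset; the variable choices, bin range, and colors are hypothetical.

```r
library(shiny)

ui <- fluidPage(
  # selectInput(): pick one variable from a fixed list
  selectInput(inputId = "var", label = "Variable",
              choices = c("mpg", "hp", "wt"), selected = "mpg"),
  # sliderInput(): numeric value within a range
  sliderInput(inputId = "bins", label = "Number of bins",
              min = 5, max = 30, value = 10, step = 1),
  # radioButtons(): one choice among a few options, laid out inline
  radioButtons(inputId = "color", label = "Bar color",
               choices = c("grey", "steelblue", "tomato"), inline = TRUE),
  plotOutput(outputId = "hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    # Each input is read inside renderPlot(), a reactive context
    x <- mtcars[[input$var]]
    hist(x, breaks = input$bins, col = input$color,
         main = input$var, xlab = input$var)
  })
}

# Build the app object; launch it only in an interactive session
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```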

Inputs