# Data Visualization in Practice

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

## Overview

Module 1:

2. Grammar of graphics
3. Statistical judgment

Module 2:

1. Functionalist approach:
   1. Distribution
   2. Composition
   3. Comparison
   4. Relationship

Module 3:

1. Interactive charts
   1. Reactive
   2. Interactive
   3. Online publication

## What is data visualization?

• Visual thinking
• Educated eyes

Source: Yau 2011

• Color
• Font
  • Size
  • Family
• Axis
  • Vertical
  • Slant
• Canvas
  • Size
• Theme

## Data Story:

### Data > Messenger > Message

## Know your data: data types

1. Scale
   1. Nominal
   2. Ordinal
   3. Interval
   4. Ratio
2. Categories
3. Events
4. Time series
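A minimal sketch of how the data types listed above might be represented in R. All variable names and values are made up for illustration and do not come from the course data.

```r
# Hypothetical example values; none of these come from the course data.
party  <- factor(c("Dem", "Rep", "Ind", "Dem"))      # nominal: unordered categories
agree  <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)                      # ordinal: ordered categories
temp_c <- c(21.5, 18.0, 25.3, 19.9)                   # interval: differences are meaningful, no true zero
income <- c(42000, 58000, 0, 73500)                   # ratio: true zero, ratios are meaningful
gdp_ts <- ts(c(1.2, 1.5, 1.7, 1.6), start = 2000)     # time series: yearly values starting in 2000

# Ordered factors support comparisons; unordered factors do not.
agree < "high"

# Counts are a sensible summary for nominal data.
table(party)

# Means are meaningful for interval and ratio data.
mean(temp_c)
```

Knowing the scale of a variable up front narrows the choice of chart: counts and bars for nominal data, ordered axes for ordinal data, and continuous axes for interval and ratio data.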

## Tufte: Same relationship? (2001)

Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

### One of the best data visualizations in history

How much information?

1. Latitude of army and features (Y-coordinate)
2. Longitude of army and features (X-coordinate)
3. Size of army (width of line, numerals)
4. Advance vs. retreat (color of line)
5. Division of army (splitting of line)
6. Temperature (linked line plot)

## Functional Approach

1. What to plot?
   1. Quantitative/Numeric data
   2. Qualitative/Categorical data
2. One variable: Univariate
   1. Distribution
   2. Composition
3. Two or multiple: Multivariate
   1. Comparison
   2. Relationship

## Components of ggplot2

1. data: R data frame
2. coordinate system:  2-D space plot
3. geoms: geometric objects representing data, e.g. points, lines, polygons, etc.
4. aesthetics: visual characteristics, e.g. position, size, color, shape, transparency, fill
5. scales: govern how visual characteristics are converted to display values, e.g. log scales, color scales, size scales, shape scales
6. stats: statistical data transformations, e.g. counts, means, medians, regression lines
7. facets: split data into subsets to display as multiple graphs
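The seven components above can be seen together in one plot. This is a minimal sketch, assuming the ggplot2 package is installed; the built-in `mtcars` data set is used only as a stand-in for any data frame, and the particular geoms, scales, and facets are illustrative choices.

```r
library(ggplot2)

# mtcars ships with R; it stands in for any data frame.
p <- ggplot(data = mtcars,                              # data
            mapping = aes(x = wt, y = mpg,              # aesthetics: position
                          color = factor(cyl))) +       # aesthetics: color
  geom_point(size = 2) +                                # geom: points
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                             # stat: regression line per group
  scale_x_log10() +                                     # scale: log scale on x
  scale_color_brewer(palette = "Dark2",
                     name = "Cylinders") +              # scale: color scale
  facet_wrap(~ am) +                                    # facets: one panel per transmission type
  coord_cartesian()                                     # coordinate system: 2-D Cartesian

print(p)
```

Each `+` adds one component, so any line after the first can be removed or swapped without touching the others.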

## Common geoms

Murrell, Paul. 2019. R Graphics.  CRC Press.

## Common scales

Murrell, Paul. 2019. R Graphics.  CRC Press.

## Common stats

Murrell, Paul. 2019. R Graphics.  CRC Press.

## ggplot2 coord

• For example, the coord_trans() function applies a transformation to the plotted coordinates explicitly, e.g.:

```r
coord_trans(x = "exp", y = "exp")
```

## Gapminder data set

### gdpPercap - GDP per capita (US$, inflation-adjusted)

## Plot type

• "p" for points
• "l" for lines
• "b" for both
• "c" for the lines part alone of "b"
• "o" for both 'overplotted'
• "h" for 'histogram' like (or 'high-density') vertical lines
• "s" for stair steps, moves first horizontal, then vertical
• "S" for other steps, contrary to "s"
• "n" for no plotting

## Plot symbols (plot characters)

• pch = 0: square
• pch = 1: circle
• pch = 2: triangle point up
• pch = 3: plus
• pch = 4: cross
• pch = 5: diamond
• pch = 6: triangle point down
• pch = 7: square cross
• pch = 8: star
• pch = 9: diamond plus
• pch = 10: circle plus
• pch = 11: triangles up and down
• pch = 12: square plus
• pch = 13: circle cross
• pch = 14: square and triangle down
• pch = 15: filled square
• pch = 16: filled circle
• pch = 17: filled triangle point up
• pch = 18: filled diamond
• pch = 19: solid circle
• pch = 20: bullet (smaller circle)
• pch = 21: filled circle (blue)
• pch = 22: filled square (blue)
• pch = 23: filled diamond (blue)
• pch = 24: filled triangle point up (blue)
• pch = 25: filled triangle point down (blue)

## Plot symbols: PCH

Additional: * . o O

Note: takes longer to plot

## More on symbols (programming)

• & - ampersand
• ' - apostrophe or single quote
• * - asterisk
• @ - at
• {} - braces or curly brackets
• [] - brackets
• ^ - caret
• <> - angle brackets or chevron
• ~ - tilde
• | - pipe
• # - pound
• - - hyphen

## Line type

Line types can be specified with:

• An integer or name: 0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash
• A string of 2, 4, 6, or 8 hexadecimal digits (1 - f) giving the lengths of consecutive on/off stretches of line. For example, the string "33" specifies three units on followed by three off, and "3313" specifies three units on followed by three off followed by one on and finally three off.
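The plot type codes, plot symbols, and line types listed above are all arguments to base R plotting functions. The sketch below combines them in one figure; the data values are made up for illustration.

```r
# Made-up data for illustration only.
x <- 1:10
y <- c(2, 4, 3, 6, 5, 8, 7, 9, 8, 10)

# type = "b" draws both points and lines; pch = 19 is a solid circle;
# lty = 2 is the built-in "dashed" line type.
plot(x, y, type = "b", pch = 19, lty = 2, ylim = c(0, 12),
     main = "Plot type, symbol and line type")

# A hexadecimal string sets custom on/off lengths: "3313" means
# 3 units on, 3 off, 1 on, 3 off.
lines(x, y + 1, lty = "3313")

# pch 21 to 25 take a border color (col) and a separate fill color (bg).
points(x, y - 1, pch = 21, col = "black", bg = "blue")

# type = "s" draws stair steps (horizontal first, then vertical).
lines(x, y - 2, type = "s", lty = "dotted")
```

The same `pch` and `lty` codes are also accepted by ggplot2 through the `shape` and `linetype` aesthetics.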
## Line type

Hexadecimal strings for the named line types:

• 44 (dashed)
• 13 (dotted)
• 1343 (dotdash)
• 73 (longdash)
• 2262 (twodash)

## ggplot()

• Creates a plot object, layer by layer
• A plot object p displays no data until at least one layer is added
• At this point, there is nothing to see!

```r
# install.packages("ggplot2")
library(ggplot2)

# Assumption: gm holds the Gapminder data, e.g. from the gapminder package
gm <- gapminder::gapminder

p <- ggplot(data = gm)    # data only: no aesthetics, no layers
p <- ggplot(data = gm, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(size = 2)  # adding a layer makes the data visible
```

## Financial Times: Visual vocabulary

1. Deviation
2. Correlation
3. Ranking
4. Distribution
5. Change over time
6. Magnitude
7. Part-to-Whole
8. Spatial
9. Flow

## Data import: vroom

• col_logical() 'l': containing only T, F, TRUE, FALSE, 1 or 0.
• col_integer() 'i': integer values.
• col_double() 'd': floating point values.
• col_number() 'n': numbers containing the grouping_mark.
• col_date(format = "") 'D': with the locale's date_format.
• col_time(format = "") 't': with the locale's time_format.
• col_datetime(format = "") 'T': ISO8601 date times.
• col_factor(levels, ordered) 'f': a fixed set of values.
• col_character() 'c': everything else.
• col_skip() '_' or '-': don't import this column.
• col_guess() '?': parse using the "best" type based on the input.

## Module 3

• Creating a Dashboard using Shiny
• Reactivity
• Interactive charts using Plotly

## Overview

• What is Shiny?
• Components of Shiny
• Structure of Shiny app
• Publishing Shiny app on GitHub

## What is Shiny?

• A Shiny app is a web page with a user interface (UI) connected to a computer running a live R session (server).
• Users can design the UI, which provides an interactive interface for visualizing data (by running R code).
• Shiny app = R + Interactivity + Web hosting
• Presenting interactive data and charts

## Shiny deployment

• RStudio Shiny server (http://www.shinyapps.io)
  • Needs Shiny account (connect via GitHub/Google)
  • Pro Shiny account (commercial)
• Install own Shiny server (https://github.com/rstudio/shiny-server)
  • Linux server
  • Free and open source

## Shiny reference and resources

• Shiny user guide online: (https://docs.rstudio.com/shinyapps.io/index.html)
• Shiny Cheatsheet (RStudio)

## Components of Shiny

1. User Interface (ui.R): the UI is the frontend that accepts user input values.
2. Server function (server.R): the server is the backend that processes these input values to produce the output results that are displayed on the website.
3. shinyApp function: the app itself, which combines the UI and server components.

## Layout and interface

• Design & explore UI framework
• Inputs within the UI framework
• Outputs within the UI framework
• Assemble UI with HTML/CSS/... widgets
• Adjustment of the layout scheme

## Structure of Shiny program

```r
# install.packages("shiny")
# install.packages("shinythemes")
library(shiny)
library(shinythemes)

# Create user interface
ui <- fluidPage()

# Build R objects displayed in UI
server <- function(input, output) {}

# Create Shiny app
shinyApp(ui = ui, server = server)
```

• ui: Nested R functions that assemble an HTML user interface for the app (some HTML knowledge needed)
• server: A function with instructions on how to build and rebuild the R objects displayed in the UI
• shinyApp: Combines ui and server into a functioning app

## ui.R

• Nested R functions that assemble an HTML user interface for the app
• Example:
  • ui = creates the user interface object
  • The fluidPage() function creates the layout page, which includes:
    • input
    • output

```r
library(shiny)

ui = fluidPage(
  numericInput(inputId = "n", "Sample size", value = 50),
  plotOutput(outputId = "hist")
)
```

## server.R

• Composed of R code to process input and generate output
• Example:
  • Read in the data from input (from ui.R)
  • Create the chart (i.e. histogram)

```r
server = function(input, output) {
  output$hist = renderPlot({
    hist(rnorm(input$n))
  })
}
```

## shinyApp()

• Combine ui.R and server.R and execute
• Example:
  • Output in plot window ready for publishing

```r
shinyApp(ui = ui, server = server)
```

## Reactivity

• Reactive values work together with reactive functions. Call a reactive value from within the arguments of one of these functions to avoid the error "Operation not allowed without an active reactive context."
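A minimal sketch of the rule above, assuming the shiny package is installed. The names `draws` and `avg` are hypothetical and chosen only for this example; the input `n` and output `hist` follow the earlier ui.R and server.R slides.

```r
library(shiny)

ui <- fluidPage(
  numericInput(inputId = "n", label = "Sample size", value = 50, min = 1),
  plotOutput(outputId = "hist"),
  textOutput(outputId = "avg")
)

server <- function(input, output) {
  # Reading input$n here, directly in the server body, would fail because
  # there is no active reactive context:
  # n <- input$n

  # reactive() creates a reactive context; the sample is redrawn
  # whenever input$n changes and is shared by both outputs below.
  draws <- reactive({
    req(input$n)  # wait until a valid value is available
    rnorm(input$n)
  })

  # render*() functions are reactive contexts too.
  output$hist <- renderPlot(hist(draws(), main = "Random normal sample"))
  output$avg  <- renderText(paste("Sample mean:", round(mean(draws()), 3)))
}

# Build the app object; runApp(app) would launch it in a browser.
app <- shinyApp(ui = ui, server = server)
```

Wrapping the sample in `reactive()` means the histogram and the mean are always computed from the same draw, instead of each output calling `rnorm()` separately.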

Input functions:

• fileInput(inputId, label, multiple, accept)
• numericInput(inputId, label, value, min, max, step)
• passwordInput(inputId, label, value)
• radioButtons(inputId, label, choices, selected, inline)
• selectInput(inputId, label, choices, selected, multiple, selectize, width, size) (also selectizeInput())
• sliderInput(inputId, label, min, max, value, step, round, ticks, animate, width, sep, pre, post)
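As an illustration of how a few of these input functions fit into an app, here is a hedged sketch, assuming the shiny package is installed. The input IDs, labels, and choices are hypothetical, and the built-in `mtcars` data set is used only as a stand-in.

```r
library(shiny)

# Input IDs, labels, and choices here are hypothetical.
ui <- fluidPage(
  sliderInput(inputId = "bins", label = "Number of bins",
              min = 5, max = 50, value = 20, step = 5),
  selectInput(inputId = "var", label = "Variable",
              choices = c("mpg", "wt", "hp"), selected = "mpg"),
  radioButtons(inputId = "col", label = "Bar color",
               choices = c("grey", "steelblue"), inline = TRUE),
  plotOutput(outputId = "hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    # Each input is read by its inputId inside a reactive context.
    hist(mtcars[[input$var]], breaks = input$bins, col = input$col,
         main = input$var, xlab = input$var)
  })
}

# Build the app object; runApp(app) would launch it in a browser.
app <- shinyApp(ui = ui, server = server)
```

Every widget follows the same pattern: the first argument is the `inputId` used to read the value on the server side, and the second is the label shown to the user.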

