The Changing Landscape of Data Visualization Tools in Large Organizations

http://slides.com/justingosses/DataVizToolsHistoryLargeOrg

Context

MY PERSPECTIVE

Relatively new to having a job title that sounds like I write code for a living

 

self-taught   taught by internet

 

My large organization experience

REASONS FOR TALK

Wanted an excuse to learn more about the topic

 

Curious to hear others' experiences & opinions

Oil & Gas

NASA

Justin Gosses

November 2016

https://github.com/JustinGOSSES/talk_HistDVTools

http://slides.com/justingosses/DataVizToolsHistoryLargeOrg

What is this talk about?

Changing landscape of data visualization and analytical tools:

Quick Overview of 1976-2016

Data visualization in large organizations:

  • value to org
  • deciding what tools to use?
  • reasons new tools might be adopted

 

Raise your hand if

 

you've spent time searching for the right visualization tool for yourself

 

 

 

Audience Question:

Raise your hand if

 

you've participated in selecting a data visualization tool for your organization

 

 

 

Audience Question:

Raise your hand if

 

your organization told you what tool to use for a data visualization task

 

 

 

Audience Question:

Audience Question:

Raise your hand if

 

you work in an organization of more than 500 people

Data Transfer

Data Cleaning 

Analytics

Analytics and other tasks sometimes get done by same software as data visualization

Everything done via code

Data Visualization

Analytics

Data Visualization

Data Transfer

Separate GUIs

code library

Tasks done by separate software & code

Hard to talk about just data visualization tools

Data Transfer

Data Cleaning 

Analytics

Data Visualization

R inside GUI for these

GUI used for these

Single Software

GUI used for these

Data Cleaning 

Changing Tool landscape

What drove changes?

1976-2016

Past Landscape

Small Data Analytics in Large Organizations Dominated By 3 types of tools

Originally, Not Much Between The Islands

Excel

Code

Industry

Specific

Desktop

GUIs

WHERE & HOW DATA VISUALIZATION HAPPENS

mainframe

pre-installed application on personal computer

Business Intelligence, expensive applications, if not excel

some data-viz specific code libraries but not pure JS, start of web-based 

pure JavaScript libraries, web default

Spreadsheets & Charts

1976-1993  Timeline

 

  • Spreadsheet analytics as way to sell computers
    • Pattern of tool development in small companies, then bought by larger companies to be provided to user for free with OS
  • Charting secondary to calculation

Early internet doldrums

 1994-2006   Timeline

  • Data Limit Implications
    • Data often stored on local machines or on-site servers
    • Data visualization too big for puny internet
    • Software not pre-installed must be physically delivered
      • require distribution, marketing, big scale or $ mark-up
  • Shared via print-outs and presentations
  • Early visualization libraries but not pure-JavaScript

download speeds were an early constraint

Many of the data visualizations that load in <1 second today would have taken tens of seconds to download years ago (plus additional time for your browser to display them)

(you can use this calculator to figure out how long your data visualizations would take to download in the past) 

Nielsen's "Law" of Bandwidth

Edholm's "Law" of Bandwidth

both of these use top-of-the-line speeds at the time and show more or less the same thing

The last D3.js visualization I made: 2 minutes in 1998, 5 seconds in 2003, <1 second in 2005
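The arithmetic behind numbers like these can be sketched in a few lines. This is a rough illustration, not the calculator linked from the slide: the 1 MB payload and the 56 kbps baseline for 1998 are assumptions chosen for the example, and the 50% yearly growth rate is the figure commonly quoted for Nielsen's Law for high-end connections.

```python
def download_seconds(size_bytes: float, bits_per_second: float) -> float:
    """Time to transfer a payload, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bits_per_second


def nielsen_speed(base_bps: float, base_year: int, year: int,
                  annual_growth: float = 0.5) -> float:
    """Projected connection speed, assuming a fixed yearly growth rate."""
    return base_bps * (1 + annual_growth) ** (year - base_year)


# Assumptions for illustration only (not taken from the talk):
PAYLOAD_BYTES = 1_000_000   # hypothetical size of one web visualization
BASE_YEAR = 1998
BASE_BPS = 56_000           # hypothetical dial-up baseline

for year in (1998, 2003, 2005, 2016):
    speed = nielsen_speed(BASE_BPS, BASE_YEAR, year)
    seconds = download_seconds(PAYLOAD_BYTES, speed)
    print(f"{year}: about {seconds:.1f} s to download")
```

Swapping in the real size of a visualization and a measured connection speed gives a quick feel for whether it would have been practical to serve over the web in a given year.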

Evolution of the web

a Google Chrome experiment from 2012

Browser ability is an influence on web-based data visualization tools

Data on Web is Easier

 2006-2016   Timeline

  • Browsers get more powerful
  • Important web standards
  • Internet speeds increase
  • Many JavaScript libraries for data visualization appear
  • Open-source increases rate of development
    • Pattern of finding ways to maximize use of web standards after multiple iterations
    • More things with more perspectives
  • Universities continue to house prolonged early development
  • More data = more need for data visualization tools
2000s

Raise your hand if

 

You have a university degree in computer science or similar field

Audience Question:

1996

2016

1. Everyone from 1996

 

2. A lot of people know a bit for work, often within another piece of software.

 

3. A bunch of students who were taught in a college class but aren't C.S. students.

 

4. code bootcamp students

 

5. Internet taught

Who writes code?

These groups are growing fast!

Stack Overflow 2016 Developer Survey found:

1. People with C.S. degrees

2. Hackers

3. People making things in their garages

More people are doing more advanced data visualization, because more people know how to code

majority don't have a traditional C.S. degree

What are recent trends

that are changing data visualization tools we use?

2016

 

 

~Arm waving~

 

Present Landscape

More Options

Excel

Code

Industry Specific Desktop GUIs

Salesforce

Tableau

D3.js

chart.js

Vega

QlikView

Domo

cloud-based

platforms as a service 

oil and gas data & analysis as online service

Spotfire

hundreds of BI options

Cost pressure?

More libraries

More GUIs have code as option

More Industry specific GUIs are being built with plug-in capability

bokeh

templates & add-ons purchased piecemeal

analytics software has R and Python as default instead of VBA

Altair

Microsoft BI

A MORE CROWDED LANDSCAPE WITH MORE HYBRIDS

 FASTER TO BUILD THINGS & MORE WAYS TO VISUALIZE DATA

open-source making its way into industry-specific software more

WebGL > SVG

More Hybrids

assumptions of better data flow  across systems

 Writing code is faster

Galleries Save Time

User or 3rd party generated examples, extensions, templates, and plug-ins

Instead of standing on the shoulders of long ago giants, you stand on the shoulders of anyone doing similar work, somewhere, right now.

 Speed of new things and diversity of things goes way up. Both open-source and $ license-based

Tableau

Petrel

D3.js

Spotfire

Tableau

Spotfire

D3.js

Ruths.ai template

Why does data visualization matter in a large organization?

What is the value to organizations of better & more data visualization?

faster understanding

more people impacted

better understanding

data visualization

better decisions

a data driven organization

staff more easily spots trends and evaluates data

Minimize scrolling, need for memorization, or manual gathering of right data for comparison

data visualization easily reached on web, out of silos, auto-updated, and fast intuitive navigation

What Data ?

in large organizations?

Financial

Marketing

Project Management

Science / Engineering

Logistics / Facilities

HR

For What Purpose ?

in large organizations?

Quick Glance Dashboards

story telling

means to shorten path to right data for that user

Enable understanding of complex data

Keep users' attention, so they use the data to make better decisions

Data Exploration

Different Tools for different situations

....a tool that includes certain data visualization types

...an interactive web-based app

...simple and fast GUI

...analytics and visualization combined

...something that costs nothing

....something with lots of other users and support for credibility to certain managers

...full control over design and function

....a tool that takes data from an API or a specific data format

your situation

another

situation

better tools when appropriate

&


changing to a more modern Data visualization method

A theoretical example

Excel

d3.js

scenario:

standardized high-level financial data that should be shared widely internally so that managers in different parts of the organization can see wider context

In addition to multiple tools, sometimes there is cause to change tools

Ideally

  • Easily found by anyone on intranet
  • Intuitive and easy to navigate to the small portion of the whole that concerns a user
  • accurate
  • updated frequently if not real-time
  • easy to compare parts of the data
  • Sustainable to maintain long-term
  • low cost

Want to avoid

scenario:

standardized high-level financial data that should be shared widely internally so that managers in different parts of the organization can see wider context

  • Visualization is hard to reach or find.
  • Large amounts of scrolling or clicking is needed to find subset of data.
  • Data is inaccurate
  • Data is out of date or of different vintages
  • Printing or memorization is required to compare data subsets.
  • dependent on skill choke-points

Excel vs. D3.js

D3.js PRO

  • build once
  • data input updates without rebuild
  • control every detail & looks sharp
  • no license cost, open source
  • can easily be tied into a data API
  • easy for 10,000 or only 1 person to access

D3.js Con

  • takes longer to build 1st time
  • requires someone who can write JavaScript and has permissions
  • requires developer (& subject expert?)
  • ideally, not changed frequently

Excel PRO

  • everyone can build easily
  • familiar
  • support of large company that won't go away
  • works with standard data types for tabular data

Excel Con

  • need to re-make as data changes
  • large amount of manual labor
  • not built for data API ingest
  • not real-time
  • hard to distribute widely or online
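The "build once, data input updates without rebuild" idea can be sketched in a few lines. This is a minimal illustration in Python rather than D3.js, using only the standard library: the chart logic is written once as a function, and new data just flows through it. The function name, the division labels, and the numbers are all hypothetical, and values are assumed to be non-negative.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(rows, width=400, bar_height=20, gap=5):
    """Render (label, value) pairs as a horizontal bar chart in SVG.

    Assumes values are non-negative. Returns the SVG document as a string.
    """
    if not rows:
        return '<svg xmlns="http://www.w3.org/2000/svg" width="0" height="0"></svg>'
    max_value = max(value for _, value in rows) or 1  # avoid dividing by zero
    height = len(rows) * (bar_height + gap)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, value) in enumerate(rows):
        y = i * (bar_height + gap)
        bar_width = width * value / max_value  # longest bar spans the full width
        parts.append(
            f'<rect x="0" y="{y}" width="{bar_width:.1f}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="4" y="{y + bar_height - 5}" font-size="12">'
            f'{escape(label)}: {value}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical data for two reporting periods; the chart code does not change.
q1 = [("Division A", 120), ("Division B", 80)]
q2 = [("Division A", 100), ("Division B", 150), ("Division C", 60)]

svg_q1 = bar_chart_svg(q1)
svg_q2 = bar_chart_svg(q2)
```

In a real deployment the rows would more likely come from a data API than from a literal list, which is the point of the comparison: the manual re-making step listed under Excel disappears, at the cost of needing someone who can write and maintain the code.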

key observations

Diversity of Data Visualization Tools: tool characteristics that affect adoption

costs

user skills

features

IT requirements to run

Does it play well with other tools

These can be thought of as both tool characteristics and possible organizational constraints

costs

user skills

features

IT requirements to run

Does it play well with other tools

Tool

characteristics

Org constraints

Who needs to sign off?  Is internal process slow? What about long-term vendor buy-in?

Will the people in charge of those other systems let them play?

How does it compare to other options your organization already has?

Is your organization set up to teach those new skills or is that too much to ask?

Does getting these require funding? Does group doing data viz have access to getting them?
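One way to picture the two-sided view above is to describe a tool and an organization with the same five categories and see where they collide. The sketch below is only an illustration: the class names, field names, tools, and values are all hypothetical and do not describe any real product or organization.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    cost: float              # yearly license cost, arbitrary units
    skills: set              # skills a user needs
    features: set            # what the tool can do
    it_requirements: set     # what IT must provide to run it
    integrates_with: set     # other systems it can exchange data with


@dataclass
class Org:
    budget: float
    skills: set              # skills staff already have
    required_features: set
    it_supported: set        # what IT is willing and able to provide
    existing_systems: set    # systems the tool must work with


def blockers(tool: Tool, org: Org) -> list:
    """Return the categories where a tool characteristic hits an org constraint."""
    found = []
    if tool.cost > org.budget:
        found.append("costs")
    if not tool.skills <= org.skills:
        found.append("user skills")
    if not org.required_features <= tool.features:
        found.append("features")
    if not tool.it_requirements <= org.it_supported:
        found.append("IT requirements to run")
    if not org.existing_systems <= tool.integrates_with:
        found.append("plays well with other tools")
    return found


# Entirely made-up examples.
gui_tool = Tool("hypothetical GUI tool", 5000, {"spreadsheets"}, {"bar chart"},
                {"desktop install"}, {"csv export"})
code_library = Tool("hypothetical code library", 0, {"javascript"},
                    {"bar chart", "dashboard"}, {"web server"},
                    {"csv export", "internal api"})
org = Org(1000, {"spreadsheets"}, {"bar chart"}, {"desktop install"}, {"csv export"})

gui_blockers = blockers(gui_tool, org)
library_blockers = blockers(code_library, org)
```

The same tool can pass or fail depending entirely on the organization it is dropped into, which is why each category shows up on both sides.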

now imagine this as a very long form & multi-step process

runs on web with server code

runs on web front-end only

runs on web with $ server license

only code

only GUI

GUI and code

GUI and only API code

Only on web as static image

Integrates easily with API data sources

Data munging included or not?

Integrates multiple languages

GUI and code for analytics part only

JS

R

Python

How long does it take to learn?

 Ease of UX

 # of clicks

What data formats can it ingest

Can it export visualization that is just html/CSS/JS?

vendor buy-in: If company goes away or license ends, does all previous work go up in smoke?

Need support

Does it integrate nicely with other software?

Can it create or update automatically?

# of data visualizations types built in

License costs

enterprise vs. individual

Does it play well with other tools

IT requirements to run

features

costs

users

Skill distribution within org

Where is data?

Is data clean?

How is data accessed?

How well does your data flow?

Changing these in large, old organizations is hard

but improves productivity as..

local data

Standardized Data Analysis 

Viewer is creator?

Viewer is <10 people

Viewer is large groups

non-standardized Data analysis

Data pre-cleaned

Data require cleaning

Data searching required many times

Data location standardized

data is from central location

API or plug-in data exchange

human data input

one off visualization

new data cycled into standard visualization often

result is fine local only

result on web

> man hours / viz

> eyeballs/viz

< eyeballs/viz

< man hours / viz

more people impacted for less work

Ways organizations affect adoption of data viz tools

Purchase Scale

Communication

Training

Support

Policy

Vagueness

Culture

Security

Centralized vs. not organizational model

Considerations,

when trying to push New tool adoption in a large organization

People already do it a certain way

It is likely the person you have to ask for approval was the person who approved the system you think needs to be replaced

If you're suggesting automating something, that excess human labor is attached to a human

Pitching new data visualization as augmentation rather than replacement may have better success.

The person with approval rights might not be operating on same time span or priorities as you

Systems for data storage and transfer may not be built with your new tool in mind or even built with any tool <7 years old in mind

Data silos can sometimes have data trolls

Sometimes fake data can be used initially to show proof of concept and provide momentum with management

People are impacted

Data is power

Process may not be built with you in mind

Getting systems to play nice

Questions?

https://github.com/JustinGOSSES/talk_HistDVTools

http://slides.com/justingosses/DataVizToolsHistoryLargeOrg

backup slides:

 

dragons be to the right (slides not used)

near-future (0-5yr) trends

that are changing data visualization tools we use?

~Even more Arm waving~

 

Future trends ?

  • VR
  • More 3D
  • more latency, due to two things above....
  • AI in data prep & chart style selection
    • the return of Clippy? but less annoying?
  • Continued focus on minimizing data prep through data architecture
  • Even more blending of BI & Data Science & IT?
    • through better BI (Tableau that does everything)
    • Or easier flow between different components?
  • More people do data visualization as a "part" of their job
  • More definition for what a 100% data visualization person does?
    • more "data visualization" jobs on LinkedIn right now than a year ago
  • Less grunt work (due to better data engineering, better data prep tools)
  • APIs that talk to APIs that talk to APIs (tools, IoT, storage, code, GUIs, etc.)

Recent & near Future Trends

  • Component over monolithic architecture
  • Platform as a service, cloud
  • Data, analysis, visualization, etc. as a service
  • Desktop software moving online
  • User-generated examples, templates, and plug-ins
    • open-source galleries
    • paid galleries in industries that don't share data/products easily
  • WebGL is becoming more common (> SVG?)
  • More 3D in maps (Mapbox, ArcGIS, Google Maps, etc.)

General Software Trends

trend to create on pixel instead of line basis

methods written by other users or 3rd party for-profits NOT JUST PRIMARY SOFTWARE AUTHORS

Audience Question

What Are you excited about that is almost here?

Future:

 Machines get friendly, talk to one another, very in-bred

and know a lot more about how to do things

Excel

Code

Industry Specific

Desktop

GUIs

?

More functions stacked on top of more functions accessible to more people and with fewer fences to hop

 Changes Currently Pushing new tool adoption 

IT Architecture is changing

more data and increasingly complex data require different tools

data interpretations increasingly need to be shared & not only presented

Internet is faster & cloud is normalized

more competition, more open-source, & prices are coming down

Infrastructure

New Tools

& New features

People

more people know how to code

new features might generate better understanding, faster understanding, or expose more people to the information

more real-time / mobile expectations

Data

Task

data visualization being applied in new ways

Recent history of tools for generic data visualization tasks in large organizations:

 

Olden days..... tools were either...

  A. pre-installed on your operating system

  B. written for a specific company as part of single monolithic system

  C.  distributed via a physical storage medium (floppy, CD) that was purchased at a store or sent via mail.

Nowadays......

A. Many people write data visualization code from scratch for a specific dataset using libraries.

B. Excel isn't the only game in town for generic data visualization GUI tools.

C. Cost has come down for industry specific tools.

D. Many start-ups are targeting the big guys in many industries with both general-purpose tools and tools specific to visualization or data cleaning.

Questions to answer:

What are changes that enabled us to get from 1995 to 2016 in terms of data visualization tools?

Why do data visualization tools matter to large organizations?

What do newer data visualization tools and IT data models offer large organizations?

What might the future hold?

Vendor Buy-In

Does all your work go away once you stop paying for licenses?

 

What are the risks & termination costs?

Who needs what tool? Who decides? What if that judgement changes?

How do you convince people to only use license-dependent tools for things that are okay to vanish? Is that reasonable?

many for-profit tools only produce visualizations in proprietary formats

Free & Open-source

Free at a lower-level of performance

Free for limited time / data usage

individual price

# of individuals price

enterprise size pricing

free or reduced for students or non-commercial 

What matters in cost is not just immediate cost but also:

   - structure of escalating costs

  - budget approval rights

  - long-term implications

Pricing Scheme

This talk is not

another 'big data' talk

 

well informed answers

 

1500s to now

 

a ranking of the "best" tools

 

how to do data viz

 

This talk is about

data sizes we all use

 

I'm still figuring it out

 

1983 - 2016+

 

why all these new tools?

 

what challenges do all these data visualization tools present to large organizations

Data_Visualization_Tools_Large_Organizations

By Justin Gosses
