http://slides.com/justingosses/DataVizToolsHistoryLargeOrg
Relatively new to having a job title that sounds like I write code for a living
Self-taught, taught by the internet
My large organization experience
Wanted an excuse to learn more about the topic
Curious to hear others' experiences & opinions
Oil & Gas
NASA
Justin Gosses
November 2016
https://github.com/JustinGOSSES/talk_HistDVTools
Changing landscape of data visualization and analytical tools:
Quick Overview of 1976-2016
Data visualization in large organizations:
Data Transfer
Data Cleaning
Analytics
Analytics and other tasks sometimes get done by the same software as data visualization
Everything done via code
Data Visualization
Analytics
Data Visualization
Data Transfer
Separate GUIs
code library
Tasks done by separate software & code
Data Transfer
Data Cleaning
Analytics
Data Visualization
R inside GUI for these
GUI used for these
Single Software
GUI used for these
Data Cleaning
Excel
Code
Industry
Specific
Desktop
GUIs
mainframe
pre-installed application on personal computer
Business Intelligence, expensive applications, if not Excel
some data-viz specific code libraries but not pure JS, start of web-based
pure JavaScript libraries, web default
Many of the data visualizations that load in <1 second today would have taken tens of seconds to download years ago (plus additional time for your browser to display them)
(you can use this calculator to figure out how long your data visualizations would take to download in the past)
Nielsen's "Law" of Bandwidth
Edholm's "Law" of Bandwidth
both of these use top-of-the-line speeds at the time and show more or less the same thing
The last D3.js visualization I made: 2 minutes in 1998, 5 seconds in 2003, <1 second in 2005
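A rough sketch of the kind of estimate such a calculator makes, assuming Nielsen's Law in its usual form (high-end connection speed grows about 50% per year). The 300 bits/s baseline for 1984, the function names, and the 500 kB example size are assumptions for illustration. They are not taken from the calculator or from the numbers on this slide, so the output will not necessarily match the figures above.

```python
def nielsen_speed_bps(year, base_year=1984, base_bps=300.0, growth=0.5):
    """Estimate high-end connection speed in bits/s for a given year.

    Assumes compound growth of `growth` per year from an assumed
    baseline of `base_bps` in `base_year` (hypothetical defaults).
    """
    return base_bps * (1.0 + growth) ** (year - base_year)


def download_seconds(size_bytes, year, **kwargs):
    """Estimate seconds needed to download `size_bytes` in `year`."""
    bits = size_bytes * 8
    return bits / nielsen_speed_bps(year, **kwargs)


if __name__ == "__main__":
    size = 500_000  # hypothetical page weight: a 500 kB visualization
    for year in (1998, 2003, 2005, 2016):
        print(year, round(download_seconds(size, year), 2), "seconds")
```

This only models download time for top-of-the-line connections. Browser rendering time and typical (slower) user connections are left out.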
Browser ability is an influence on web-based data visualization tools
2000s
2000s
1. Everyone from 1996
2. A lot of people know a bit for work, often within another piece of software.
3. A bunch of students who were taught in a college class but aren't C.S. students.
4. code bootcamp students
5. Internet taught
These groups are growing fast!
Stack Overflow 2016 Developer Survey found =
1. People with C.S. degrees
2. Hackers
3. People making things in their garages
More people are doing more advanced data visualization, because more people know how to code
majority don't have a traditional C.S. degree
that are changing data visualization tools we use?
2016
Excel
Code
Industry Specific Desktop GUIs
Salesforce
Tableau
D3.js
chart.js
Vega
QlikView
Domo
cloud-based
platforms as a service
oil and gas data & analysis as online service
Spotfire
hundreds of BI options
bokeh
templates & add-ons purchased piecemeal
Altair
Microsoft BI
A MORE CROWDED LANDSCAPE WITH MORE HYBRIDS
FASTER TO BUILD THINGS & MORE WAYS TO VISUALIZE DATA
Speed of new things and diversity of things goes way up. Both open-source and $ license-based
Tableau
Petrel
D3.js
Spotfire
Tableau
Spotfire
D3.js
Ruths.ai template
faster understanding
more people impacted
better understanding
faster understanding
more people impacted
better understanding
a data driven organization
staff more easily spots trends and evaluates data
Minimize scrolling, need for memorization, or manual gathering of right data for comparison
data visualization easily reached on web, out of silos, auto-updated, and fast intuitive navigation
Financial
Marketing
Project Management
Science / Engineering
Logistics / Facilities
HR
Quick Glance Dashboards
story telling
means to shorten path to right data for that user
Enable understanding of complex data
Keep users' attention, so they use the data to make better decisions
Data Exploration
...a tool that includes certain data visualization types
...an interactive web-based app
...simple and fast GUI
...analytics and visualization combined
...something that costs nothing
...something with lots of other users and support, for credibility with certain managers
...full control over design and function
...a tool that takes data from an API or a specific data format
your situation
another
situation
&
scenario:
standardized high-level financial data that should be shared widely internally so that managers in different parts of the organization can see wider context
In addition to multiple tools, sometimes there is cause to change tools
scenario:
standardized high-level financial data that should be shared widely internally so that managers in different parts of the organization can see wider context
key observations
costs
user skills
features
IT requirements to run
Does it play well with other tools
costs
user skills
features
IT requirements to run
Does it play well with other tools
Tool
characteristics
Org constraints
Who needs to sign off? Is internal process slow? What about long-term vendor buy-in?
Will the people in charge of those other systems let them play?
How does it compare to other options your organization already has?
Is your organization set up to teach those new skills or is that too much to ask?
Does getting these require funding? Does group doing data viz have access to getting them?
runs on web with server code
runs on web front-end only
runs on web with $ server license
only code
only GUI
GUI and code
GUI and only API code
Only on web as static image
Integrates easily with API data sources
Data munging included or not?
Integrates multiple languages
GUI and code for analytics part only
JS
R
Python
How long does it take to learn?
Ease of UX
# of clicks
What data formats can it ingest
Can it export visualization that is just html/CSS/JS?
vendor buy-in: If company goes away or license ends, does all previous work go up in smoke?
Need support
Does it integrate nicely with other software?
Can it create or update automatically?
# of data visualizations types built in
License costs
enterprise vs. individual
Does it play well with other tools
IT requirements to run
features
costs
users
Skill distribution within org
Where is data?
Is data clean?
How is data accessed?
local data
Standardized Data Analysis
Viewer is creator?
Viewer is <10 people
Viewer is large groups
non-standardized Data analysis
Data pre-cleaned
Data require cleaning
Data searching required many times
Data location standardized
data is from central location
API or plug-in data exchange
human data input
one off visualization
new data cycled into standard visualization often
result is fine local only
result on web
> man hours / viz
> eyeballs/viz
< eyeballs/viz
< man hours / viz
Purchase Scale
Communication
Training
Support
Policy
Vagueness
Culture
Security
Centralized vs. not organizational model
People already do it a certain way
It is likely the person you have to ask for approval was the person who approved the system you think needs to be replaced
If you're suggesting automating something, that excess human labor is attached to a human
Pitching new data visualization as augmentation rather than replacement may have better success.
The person with approval rights might not be operating on the same time span or priorities as you
Systems for data storage and transfer may not be built with your new tool in mind or even built with any tool <7 years old in mind
Data silos can sometimes have data trolls
Sometimes fake data can be used initially to show proof of concept and provide momentum with management
People are impacted
Data is power
Process may not be built with you in mind
Getting systems to play nice
dragons be to the right (slides not used)
that are changing data visualization tools we use?
General Software Trends
trend to create on pixel instead of line basis
methods written by other users or 3rd party for-profits NOT JUST PRIMARY SOFTWARE AUTHORS
and know a lot more about how to do things
Excel
Code
Industry Specific
Desktop
GUIs
IT Architecture is changing
more data and increasingly complex data require different tools
data interpretations increasingly need to be shared & not only presented
Internet is faster & cloud is normalized
more competition, more open-source, & prices are coming down
Infrastructure
New Tools
& New features
People
more people know how to code
new features might generate better understanding, faster understanding, or expose more people to the information
more real-time / mobile expectations
Data
Task
data visualization being applied in new ways
Recent history of tools for generic data visualization tasks in large organizations:
Olden days..... tools were either...
A. pre-installed on your operating system
B. written for a specific company as part of single monolithic system
C. distributed via a physical storage medium (floppy, CD) that was purchased at a store or sent via mail.
Nowadays......
A. Many people write data visualization code from scratch for a specific dataset using libraries.
B. Excel isn't the only game in town for generic data visualization GUI tools.
C. Cost has come down for industry specific tools.
D. Many start-ups are targeting the big guys in many industries with both general-purpose tools and tools specific to visualization or data cleaning.
What changes enabled us to get from 1995 to 2016 in terms of data visualization tools?
Why do data visualization tools matter to large organizations?
What do newer data visualization tools and IT data models offer large organizations?
What might the future hold?
Does all your work go away once you stop paying for licenses?
What are the risks & termination costs?
Who needs what tool? Who decides? What if that judgement changes?
How do you convince people to only use license-dependent tools for things that are okay to vanish? Is that reasonable?
many for-profit tools only produce visualizations in proprietary formats
Free & Open-source
Free at a lower-level of performance
Free for limited time / data usage
individual price
# of individuals price
enterprise size pricing
free or reduced for students or non-commercial
What matters in cost is not just immediate cost but also:
- structure of escalating costs
- budget approval rights
- long-term implications
another 'big data' talk
well informed answers
1500s to now
a ranking of the "best" tools
how to do data viz
data sizes we all use
I'm still figuring it out
1983 - 2016+
why all these new tools?
what challenges do all these data visualization tools present to large organizations?