The Changing Landscape of Data Visualization Tools in Large Organizations
http://slides.com/justingosses/DataVizToolsHistoryLargeOrg
Context
MY PERSPECTIVE
Relatively new to having a job title that sounds like I write code for a living
Self-taught: taught by the internet
My large organization experience: Oil & Gas, NASA
REASONS FOR TALK
Wanted an excuse to learn more about the topic
Curious to hear others' experiences & opinions
Justin Gosses
November 2016
https://github.com/JustinGOSSES/talk_HistDVTools
What is this talk about?
Changing landscape of data visualization and analytical tools:
Quick Overview of 1976-2016
Data visualization in large organizations:
- value to org
- deciding what tools to use?
- reasons new tools might be adopted
Raise your hand if
you've spent time searching for the right visualization tool for yourself
Audience Question:
Raise your hand if
you've participated in selecting a data visualization tool for your organization
Audience Question:
Raise your hand if
your organization told you what tool to use for a data visualization task
Audience Question:
Audience Question:
Raise your hand if
you work in an organization of more than 500 people
Data Transfer
Data Cleaning
Analytics
Analytics and other tasks sometimes get done by same software as data visualization
Everything done via code
Data Visualization
Analytics
Data Visualization
Data Transfer
Separate GUIs
code library
Tasks done by separate software & code
Hard to talk about just data visualization tools
Data Transfer
Data Cleaning
Analytics
Data Visualization
R inside GUI for these
GUI used for these
Single Software
GUI used for these
Data Cleaning
Changing Tool landscape
What drove changes?
1976-2016
Past Landscape
Small Data Analytics in Large Organizations Dominated By 3 types of tools
Originally, Not Much Between The Islands
Excel
Code
Industry Specific Desktop GUIs
WHERE & HOW DATA VISUALIZATION HAPPENS
mainframe
pre-installed application on personal computer
Business Intelligence, expensive applications, if not Excel
some data-viz specific code libraries but not pure JS, start of web-based
pure JavaScript libraries, web default
Spreadsheets & Charts
1976-1993 Timeline
- Spreadsheet analytics as way to sell computers
- Pattern of tool development in small companies, then bought by larger companies to be provided to user for free with OS
- Charting secondary to calculation
Early internet doldrums
1994-2006 Timeline
- Data Limit Implications
- Data often stored on local machines or on-site servers
- Data visualization too big for puny internet
- Software not pre-installed must be physically delivered
- require distribution, marketing, big scale or $ mark-up
- Shared via print-outs and presentations
- Early visualization libraries but not pure-JavaScript
download speeds were an early constraint
Many data visualizations that load in <1 second today would have taken tens of seconds to download years ago (plus additional time for your browser to display them)
(you can use this calculator to figure out how long your data visualizations would take to download in the past)
Nielsen's "Law" of Bandwidth
Edholm's "Law" of Bandwidth
both of these use top-of-the-line speeds at the time and show more or less the same thing
The last D3.js visualization I made: 2 minutes in 1998, 5 seconds in 2003, <1 second in 2005
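The arithmetic behind this kind of estimate can be sketched in a few lines of Python. This is a minimal sketch, assuming Nielsen's Law (high-end connection speed grows by roughly 50% per year). The function names, the 100 Mbit/s reference speed for 2016, and the 1 MB file size are all hypothetical values chosen for illustration; they are not the inputs behind the download times quoted above and will not reproduce them.

```python
def bandwidth_bps(year, ref_year=2016, ref_bps=100e6, annual_growth=0.5):
    """Estimate high-end connection speed (bits/second) for a given year.

    Assumes Nielsen's Law: speed grows by about 50% per year.
    ref_year and ref_bps are hypothetical anchor values, not measured data.
    """
    return ref_bps * (1 + annual_growth) ** (year - ref_year)


def download_seconds(size_bytes, year, **law_params):
    """Seconds needed to download size_bytes at the estimated speed for year."""
    return size_bytes * 8 / bandwidth_bps(year, **law_params)


if __name__ == "__main__":
    page_bytes = 1_000_000  # hypothetical 1 MB visualization (code + data)
    for year in (1998, 2003, 2005, 2016):
        print(f"{year}: {download_seconds(page_bytes, year):.2f} s")
```

Swapping in a different reference speed or growth rate (for example, to follow Edholm's Law instead) only means changing the keyword arguments.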
Evolution of the web
a Google Chrome experiment from 2012
Browser ability is an influence on web-based data visualization tools
Data on Web is Easier
2006-2016 Timeline
- Browsers get more powerful
- Important web standards
- Internet speeds increase
- Many JavaScript libraries for data visualization appear
- Open-source increases rate of development
- Pattern of finding ways to maximize use of web standards after multiple iterations
- More things with more perspectives
- Universities continue to house prolonged early development
- More data = more need for data visualization tools
2000s
2000s
Raise your hand if
You have a university degree in computer science or similar field
Audience Question:
Who writes code?
1996
1. People with C.S. degrees
2. Hackers
3. People making things in their garages
2016
1. Everyone from 1996
2. A lot of people know a bit for work, often within another piece of software.
3. A bunch of students who were taught in a college class but aren't C.S. students.
4. code bootcamp students
5. Internet taught
These groups are growing fast!
Stack Overflow 2016 Developer Survey found = majority don't have a traditional C.S. degree
More people are doing more advanced data visualization, because more people know how to code
What are recent trends
that are changing data visualization tools we use?
2016
~Arm waving~
Present Landscape
More Options
Excel
Code
Industry Specific Desktop GUIs
Salesforce
Tableau
D3.js
chart.js
Vega
QlikView
Domo
cloud-based
platforms as a service
oil and gas data & analysis as online service
Spotfire
hundreds of BI options
Cost pressure?
More libraries
More GUIs have code as option
More Industry specific GUIs are being built with plug-in capability
bokeh
templates & add-ons purchased piecemeal
analytics software has R and Python as default instead of VBA
Altair
Microsoft BI
A MORE CROWDED LANDSCAPE WITH MORE HYBRIDS
FASTER TO BUILD THINGS & MORE WAYS TO VISUALIZE DATA
open-source making its way into industry-specific software more
WebGL > SVG
More Hybrids
assumptions of better data flow across systems
Writing code is faster
Galleries Save Time
User or 3rd party generated examples, extensions, templates, and plug-ins
Instead of standing on the shoulders of long ago giants, you stand on the shoulders of anyone doing similar work, somewhere, right now.
Speed of new things and diversity of things goes way up. Both open-source and $ license-based
Tableau
Petrel
D3.js
Spotfire
Tableau
Spotfire
D3.js
Ruths.ai template
Why does data visualization matter in a large organization?
What is the value to organizations of better & more data visualization?
faster understanding
more people impacted
better understanding
data visualization
better decisions
a data driven organization
staff more easily spots trends and evaluates data
Minimize scrolling, need for memorization, or manual gathering of right data for comparison
data visualization easily reached on web, out of silos, auto-updated, and fast intuitive navigation
What Data ?
in large organizations?
Financial
Marketing
Project Management
Science / Engineering
Logistics / Facilities
HR
For What Purpose ?
in large organizations?
Quick Glance Dashboards
story telling
means to shorten path to right data for that user
Enable understanding of complex data
Keep users' attention, so they use the data to make better decisions
Data Exploration
Different Tools for different situations
...a tool that includes certain data visualization types
...an interactive web-based app
...simple and fast GUI
...analytics and visualization combined
...something that costs nothing
...something with lots of other users and support for credibility to certain managers
...full control over design and function
...a tool that takes data from an API or a specific data format
your situation
another situation
better tools when appropriate
&
changing to a more modern Data visualization method
A theoretical example
Excel
d3.js
scenario:
standardized high-level financial data that should be shared widely internally so that managers in different parts of the organization can see wider context
In addition to multiple tools, sometimes there is cause to change tools
Ideally
- Easily found by anyone on intranet
- Intuitive and easy to navigate to the small portion of the whole that concerns a user
- accurate
- updated frequently if not real-time
- easy to compare parts of the data
- Sustainable to maintain long-term
- low cost
Want to avoid
- Visualization is hard to reach or find.
- Large amounts of scrolling or clicking is needed to find subset of data.
- Data is inaccurate
- Data is out of date or of different vintages
- Printing or memorization is required to compare data subsets.
- dependent on skill choke-points
D3.js
PRO
- build once
- data input updates without rebuild
- control every detail & looks sharp
- no license cost, open source
- can easily be tied into a data API
- easy for 10,000 people or only 1 person to access
Con
- takes longer to build the 1st time
- requires someone who can write JavaScript and has permissions
- requires a developer (& subject expert?)
- ideally, not changed frequently
Excel
PRO
- everyone can build easily
- familiar
- support of a large company that won't go away
- works with standard data types for tabular data
Con
- need to re-make as data changes
- large amount of manual labor
- not built for data API ingest
- not real-time
- hard to distribute widely or online
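The "build once, data input updates without rebuild" point can be illustrated with a small sketch. This is not D3.js itself: it is a Python stand-in, using only the standard library, that writes an SVG bar chart from whatever rows it is given. The function name `render_bar_chart_svg`, the layout parameters, and the sample department figures are hypothetical. In practice the rows might come from a data API, which is not shown here.

```python
from xml.sax.saxutils import escape


def render_bar_chart_svg(rows, bar_height=20, max_bar_width=300, label_width=120):
    """Return an SVG horizontal bar chart for (label, value) rows.

    The chart layout is written once; calling this again with fresh rows
    regenerates the picture without any manual rework.
    """
    # Scale bars against the largest value (fall back to 1 to avoid dividing by zero).
    largest = max((value for _, value in rows), default=0) or 1
    parts = []
    for i, (label, value) in enumerate(rows):
        y = i * (bar_height + 5)
        width = max_bar_width * value / largest
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}">{escape(str(label))}</text>'
            f'<rect x="{label_width}" y="{y}" width="{width:.1f}" height="{bar_height}"/>'
        )
    total_height = len(rows) * (bar_height + 5)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{label_width + max_bar_width}" height="{total_height}">'
        + "".join(parts)
        + "</svg>"
    )


if __name__ == "__main__":
    # Hypothetical figures: re-running with next month's rows needs no chart rework.
    print(render_bar_chart_svg([("Operations", 10), ("R&D", 5)]))
```

The contrast with the spreadsheet workflow is that the chart definition lives in code, so refreshing the data is a re-run rather than a manual rebuild.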
key observations
Diversity of Data Visualization Tools
Tool characteristics that affect adoption
costs
user skills
features
IT requirements to run
Does it play well with other tools
These can be thought of as both tool characteristics and possible organizational constraints
Tool characteristic: Org constraint
costs: Who needs to sign off? Is the internal process slow? What about long-term vendor buy-in?
user skills: Is your organization set up to teach those new skills or is that too much to ask?
features: How does it compare to other options your organization already has?
IT requirements to run: Does getting these require funding? Does the group doing data viz have access to getting them?
Does it play well with other tools: Will the people in charge of those other systems let them play?
now imagine this as a very long form & multi-step process
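One common way to make a comparison across these characteristics explicit is a weighted scoring matrix. The sketch below is an illustration of that general technique, not a method from this talk. The criteria names come from the list above; the function names, the 1-5 scoring scale, the weights, and the tools "Tool A" and "Tool B" are all hypothetical.

```python
CRITERIA = (
    "costs",
    "user skills",
    "features",
    "IT requirements",
    "plays well with other tools",
)


def weighted_score(scores, weights):
    """Weighted average of per-criterion scores; higher means a better fit."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


def rank_tools(tool_scores, weights):
    """Return (tool, score) pairs sorted from best to worst fit."""
    ranked = [(tool, weighted_score(s, weights)) for tool, s in tool_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights and 1-5 scores, for illustration only.
    weights = {c: 1 for c in CRITERIA}
    weights["costs"] = 3  # this imaginary organization cares most about cost
    tools = {
        "Tool A": dict.fromkeys(CRITERIA, 3) | {"costs": 5},
        "Tool B": dict.fromkeys(CRITERIA, 4) | {"costs": 1},
    }
    for name, score in rank_tools(tools, weights):
        print(f"{name}: {score:.2f}")
```

The weights are where organizational constraints enter: the same tool scores can rank differently once sign-off, training, or IT access is weighted heavily.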
runs on web with server code
runs on web front-end only
runs on web with $ server license
only code
only GUI
GUI and code
GUI and only API code
Only on web as static image
Integrates easily with API data sources
Data munging included or not?
Integrates multiple languages
GUI and code for analytics part only
JS
R
Python
How long does it take to learn?
Ease of UX
# of clicks
What data formats can it ingest
Can it export visualization that is just html/CSS/JS?
vendor buy-in: If company goes away or license ends, does all previous work go up in smoke?
Need support
Does it integrate nicely with other software?
Can it create or update automatically?
# of data visualization types built in
License costs
enterprise vs. individual
Does it play well with other tools
IT requirements to run
features
costs
users
Skill distribution within org
Where is data?
Is data clean?
How is data accessed?
How well does your data flow?
Changing these in large, old organizations is hard
but improves productivity as..
local data
Standardized Data Analysis
Viewer is creator?
Viewer is <10 people
Viewer is large groups
non-standardized Data analysis
Data pre-cleaned
Data require cleaning
Data searching required many times
Data location standardized
data is from central location
API or plug-in data exchange
human data input
one off visualization
new data cycled into standard visualization often
result is fine local only
result on web
> man hours / viz
> eyeballs/viz
< eyeballs/viz
< man hours / viz
more people impacted for less work
Ways organizations affect adoption of data viz tools
Purchase Scale
Communication
Training
Support
Policy
Vagueness
Culture
Security
Centralized vs. not organizational model
Considerations,
when trying to push New tool adoption in a large organization
People already do it a certain way
It is likely the person you have to ask for approval was the person who approved the system you think needs to be replaced
If you're suggesting automating something, that excess human labor is attached to a human
Pitching new data visualization as augmentation rather than replacement may have better success.
The person with approval rights might not be operating on same time span or priorities as you
Systems for data storage and transfer may not be built with your new tool in mind or even built with any tool <7 years old in mind
Data silos can sometimes have data trolls
Sometimes fake data can be used initially to show proof of concept and provide momentum with management
People are impacted
Data is power
Process may not be built with you in mind
Getting systems to play nice
Questions?
https://github.com/JustinGOSSES/talk_HistDVTools
http://slides.com/justingosses/DataVizToolsHistoryLargeOrg
backup slides:
dragons be to the right (slides not used)
near-future (0-5yr) trends
that are changing data visualization tools we use?
~Even more Arm waving~
Future trends ?
- VR
- More 3D
- more latency, due to two things above....
- AI in data prep & chart style selection
- the return of clippy? but less annoying?
- Continued focus on minimizing data prep through data architecture
- Even more blending of BI & Data Science & IT?
- through better BI (Tableau that does everything)
- Or easier flow between different components?
- More people do data visualization as a "part" of their job
- More definition for what a 100% data visualization person does?
- more "data visualization" jobs on LinkedIn right now than a year ago
- Less grunt work (due to better data engineering, better data prep tools)
- APIs that talk to APIs that talk to APIs (tools, IoT, storage, code, GUIs, etc.)
Recent & near Future Trends
- Component over monolithic architecture
- Platform as a service, cloud
- Data, analysis, visualization, etc. as a service
- Desktop software moving online
- User-generated examples, templates, and plug-ins
- open-source galleries
- paid galleries in industries that don't share data/products easily
- WebGL is becoming more common (> SVG?)
- More 3D in maps (Mapbox, ArcGIS, Google Maps, etc.)
General Software Trends
trend to create on pixel instead of line basis
methods written by other users or 3rd party for-profits NOT JUST PRIMARY SOFTWARE AUTHORS
Audience Question
What Are you excited about that is almost here?
Future:
Machines get friendly, talk to one another, very in-bred
and know a lot more about how to do things
Excel
Code
Industry Specific Desktop GUIs
?
More functions stacked on top of more functions, accessible to more people and with fewer fences to hop
Changes Currently Pushing new tool adoption
IT Architecture is changing
more data and increasingly complex data require different tools
data interpretations increasingly need to be shared & not only presented
Internet is faster & cloud is normalized
more competition, more open-source, & prices are coming down
Infrastructure
New Tools
& New features
People
more people know how to code
new features might generate better understanding, faster understanding, or expose more people to the information
more real-time / mobile expectations
Data
Task
data visualization being applied in new ways
Recent history of tools for generic data visualization tasks in large organizations:
Olden days..... tools were either...
A. pre-installed on your operating system
B. written for a specific company as part of single monolithic system
C. distributed via a physical storage medium (floppy, CD) that was purchased at a store or sent via mail.
Nowadays......
A. Many people write data visualization code from scratch for a specific dataset using libraries.
B. Excel isn't the only game in town for generic data visualization GUI tools.
C. Cost has come down for industry specific tools.
D. Many start-ups are targeting the big guys in many industries with both general-purpose tools and tools specific to visualization or data cleaning.
Questions to answer:
What are changes that enabled us to get from 1995 to 2016 in terms of data visualization tools?
Why do data visualization tools matter to large organizations?
What do newer data visualization tools and IT data models offer large organizations?
What might the future hold?
Vendor Buy-In
Does all your work go away once you stop paying for licenses?
What are the risks & termination costs?
Who needs what tool? Who decides? What if that judgement changes?
How do you convince people to only use license-dependent tools for things that are okay to vanish? Is that reasonable?
many for-profit tools only produce visualizations in proprietary formats
Free & Open-source
Free at a lower-level of performance
Free for limited time / data usage
individual price
# of individuals price
enterprise size pricing
free or reduced for students or non-commercial
What matters in cost is not just immediate cost but also:
- structure of escalating costs
- budget approval rights
- long-term implications
Pricing Scheme
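The "structure of escalating costs" point can be made concrete with a little arithmetic. The following is a minimal sketch comparing per-individual pricing with a flat enterprise license. The function names and every price are hypothetical and chosen only for illustration; real vendor pricing schemes are usually more complicated than this.

```python
import math


def per_seat_cost(users, price_per_user, years=1):
    """Total license cost when every user needs a paid seat."""
    return users * price_per_user * years


def enterprise_cost(flat_fee, years=1):
    """Total license cost for a flat, organization-wide license."""
    return flat_fee * years


def break_even_users(price_per_user, flat_fee):
    """Smallest user count at which per-seat pricing costs at least the flat fee."""
    return math.ceil(flat_fee / price_per_user)


if __name__ == "__main__":
    # Hypothetical yearly prices, for illustration only.
    seat, flat = 500, 20_000
    for users in (10, 40, 200):
        print(users, per_seat_cost(users, seat), enterprise_cost(flat))
    print("break-even users:", break_even_users(seat, flat))
```

A tool that is cheap for a small pilot group can become the more expensive option once adoption spreads, which is why the cost structure matters as much as the starting price.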
This talk is not
another 'big data' talk
well informed answers
1500s to now
a ranking of the "best" tools
how to do data viz
This talk is about
data sizes we all use
I'm still figuring it out
1983 - 2016+
why all these new tools?
what challenges do all these data visualization tools present to large organizations