A Blagger's Guide To Data
"Data literacy spans a broad set of technologies, skills and processes. It is the ability to read, work with, analyse and argue with data."
Unstructured
Structured



JSON

Tweet

Lists

RDF

Spreadsheets
Relational SQL Databases


HTML pages

Text documents

Folders


NoSQL Databases


Unstructured data


UNIX 1969





Perl
Unstructured data
Files and Folders
Apple Lisa 1983



Perl



1950s - Stephen Cole Kleene (regular expressions)



1979 - VisiCalc invented by Dan Bricklin
Spreadsheets
Why Spreadsheets?
Fluid and adaptable
Grid-like, structured
Organised
Calculations
Aggregations, totals, averages, tax
Spreadsheets in 1 min
Headers
Clean data!
Validation (Lookup)
Formulas
Pivot tables
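A pivot table is essentially "group by a header, then total". A minimal sketch in Python, assuming a made-up sales sheet (the column names `region`, `product`, `amount` and the helper `pivot_totals` are hypothetical, not from the talk):

```python
import csv
import io
from collections import defaultdict

# Hypothetical sheet: the first row holds the headers, every other row is data.
SHEET = """region,product,amount
North,Apples,10
North,Pears,5
South,Apples,7
South,Apples,3
"""


def pivot_totals(csv_text, row_field, value_field):
    """Sum value_field for each distinct value found in row_field."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[row_field]] += float(row[value_field])
    return dict(totals)


totals_by_region = pivot_totals(SHEET, "region", "amount")
totals_by_product = pivot_totals(SHEET, "product", "amount")
print(totals_by_region)
print(totals_by_product)
```

This only works because the sheet has one header row and clean numeric values, which is why "Headers" and "Clean data!" come first.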
Common Spreadsheet Errors
vs
Originally a card-based index system
Visualised and used
...becomes...


DATABASES TO THE RESCUE?
1960s -> 1970s -> 1980s - Relational Databases


Why Databases?
Server
Schemas
Constraints
Queries
ACID. Transactions
DRY TABLES!
SQL in 1974
Indexing
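Schemas, constraints and transactions can be seen working together in a few lines. A rough sketch using Python's built-in `sqlite3` module, assuming a hypothetical `accounts` table and `transfer` helper (neither is from the talk):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: column types plus constraints the database enforces for us.
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        owner   TEXT    NOT NULL UNIQUE,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts (owner, balance) VALUES ('Tom', 50)")
conn.execute("INSERT INTO accounts (owner, balance) VALUES ('Ann', 0)")
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance.
        return False


def balances(conn):
    """Return {owner: balance} for every account."""
    return dict(conn.execute("SELECT owner, balance FROM accounts"))


too_much = transfer(conn, "Tom", "Ann", 80)   # would overdraw Tom
after_failed = balances(conn)
just_right = transfer(conn, "Tom", "Ann", 30)
after_ok = balances(conn)
print(too_much, after_failed, just_right, after_ok)
```

The failed transfer leaves both balances untouched, even though the first update had already run. That is the "A" (atomicity) in ACID.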





Databases in 1 min
Tables
Ids
Each row has a unique Id/key
Foreign keys

SELECT image, owner, date
FROM pictures
WHERE owner = 'Tom'
ORDER BY date DESC
LIMIT 20;
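To try tables, ids and foreign keys end to end, here is a small sketch using Python's built-in `sqlite3`. It is a variant of the slide's query, not the same one: the owner lives in a separate, hypothetical `owners` table and `pictures` points at it through an `owner_id` foreign key. All sample rows and the `add_picture` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
    CREATE TABLE owners (
        id   INTEGER PRIMARY KEY,          -- each row has a unique id
        name TEXT NOT NULL
    );
    CREATE TABLE pictures (
        id       INTEGER PRIMARY KEY,
        image    TEXT NOT NULL,
        date     TEXT NOT NULL,            -- ISO dates sort correctly as text
        owner_id INTEGER NOT NULL REFERENCES owners(id)  -- foreign key
    );
    INSERT INTO owners (id, name) VALUES (1, 'Tom'), (2, 'Ann');
    INSERT INTO pictures (image, date, owner_id) VALUES
        ('cat.jpg',  '2023-01-05', 1),
        ('dog.jpg',  '2023-03-10', 1),
        ('tree.jpg', '2023-02-01', 2);
""")

QUERY = """
    SELECT p.image, o.name AS owner, p.date
    FROM pictures AS p
    JOIN owners AS o ON o.id = p.owner_id
    WHERE o.name = ?
    ORDER BY p.date DESC
    LIMIT 20
"""
rows = conn.execute(QUERY, ("Tom",)).fetchall()
print(rows)


def add_picture(conn, image, date, owner_id):
    """Insert a picture; return False if owner_id matches no owner."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO pictures (image, date, owner_id) VALUES (?, ?, ?)",
                (image, date, owner_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = add_picture(conn, "ghost.jpg", "2023-04-01", 99)
print(orphan_accepted)
```

Storing the owner once and referring to it by id is the "DRY tables" idea: renaming Tom means changing one row, not every picture.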
Why Not Databases?
People often find them difficult
The Schema is (deliberately) restrictive
Your data is very complex / unstructured
Your data WILL become more complex
The Olden Days
Every single letter used mattered.
WELCOME TO THE FUTURE OF DATA
What's new?
Not much tbh
Better, faster, more reliable
Work with more data
Visualisation
Better tools
Online
Collaborative
They all "know about each other"

Surely there's better news than that?
It's all in the cloud now

NoSQL databases



Big Data

NoCode tools are becoming "a thing"
APIs
Application Programming Interface
JSON or XML or TEXT or CSV
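Whatever format an API answers in, the goal is usually the same: turn the text into records you can work with. A minimal sketch with Python's standard library, assuming two hypothetical responses that carry the same data (no real API is called here):

```python
import csv
import io
import json

# Hypothetical API responses: the same two records, once as JSON, once as CSV.
JSON_RESPONSE = (
    '[{"image": "cat.jpg", "owner": "Tom"},'
    ' {"image": "tree.jpg", "owner": "Ann"}]'
)
CSV_RESPONSE = "image,owner\ncat.jpg,Tom\ntree.jpg,Ann\n"


def parse_json(text):
    """Decode a JSON array of objects into a list of dicts."""
    return json.loads(text)


def parse_csv(text):
    """Decode CSV with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


records_from_json = parse_json(JSON_RESPONSE)
records_from_csv = parse_csv(CSV_RESPONSE)
same_records = records_from_json == records_from_csv
print(same_records)
```

One difference worth knowing: JSON keeps numbers, booleans and nesting, while CSV hands everything back as flat text.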
More data than ever
is available


Graph Databases




Professional vs Beginner
Visualisations

Google Data Studio
You can work with Google Data Studio too (if your data is clean)

Even more visualisation tools here
The Coding Elephant in the room
What area? Swiss-army knife vs statistical vs fun vs web vs speed vs £££ vs reliability vs specialised?

Python, JavaScript
Core data skills
- Organise and Design Concepts
- Re-shape and clean data
...then...
- Interrogate or integrate data (i.e. use)
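"Re-shape and clean" often means small, boring fixes: trimming spaces, agreeing on capitalisation, picking one date format, dropping duplicates. A sketch in plain Python, assuming some invented messy rows and two date formats (`RAW_ROWS`, `DATE_FORMATS`, `parse_date` and `clean` are all hypothetical names):

```python
from datetime import datetime

# Hypothetical messy input: stray spaces, mixed case, two date styles, a duplicate.
RAW_ROWS = [
    {"owner": " tom ", "date": "05/01/2023"},
    {"owner": "Tom", "date": "2023-01-05"},
    {"owner": "ANN", "date": "2023-02-01"},
]

# Assumed formats: ISO (year-month-day) and UK style (day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return the date as an ISO string, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def clean(rows):
    """Normalise names and dates, then drop rows that become identical."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (row["owner"].strip().title(), parse_date(row["date"]))
        if record not in seen:
            seen.add(record)
            cleaned.append({"owner": record[0], "date": record[1]})
    return cleaned


clean_rows = clean(RAW_ROWS)
print(clean_rows)
```

Tools such as OpenRefine do this kind of thing interactively; the point here is only that the rules have to be decided before the data can be used.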
UML
Notation for databases, business processes, services etc

OpenRefine

Orange3
Thank you
Questions?
A Blagger's Guide To Data
By Tom Smith