A Blagger's Guide To Data

"Data literacy spans a broad set of technologies, skills and processes. It is the ability to read, work with, analyse and argue with data."

Unstructured

Structured

JSON

Tweet

Lists

RDF

Spreadsheets

Relational SQL Databases

HTML pages

Text documents

Folders

Email

NoSQL Databases

Unstructured data

UNIX 1969

Perl

Unstructured data

Files and Folders

Apple Lisa 1983

Perl

1950s - Stephen Cole Kleene (regular expressions)
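Regular expressions are still the usual way to pull structure out of unstructured text. A minimal sketch, in Python rather than Perl; the sample notes and the pattern are made up for illustration:

```python
import re

# Hypothetical unstructured text: free-form notes with amounts and dates.
notes = """
Paid 12.50 to Alice on 2021-03-04
Refund of 3.99 from Bob on 2021-03-09
Lunch, no receipt
"""

# An amount with two decimals, then anything on the same line, then an ISO-style date.
pattern = re.compile(r"(\d+\.\d{2}).*?(\d{4}-\d{2}-\d{2})")

# Each match becomes one structured (amount, date) row; lines that don't fit are skipped.
rows = [(float(m.group(1)), m.group(2)) for m in pattern.finditer(notes)]
print(rows)
```

The third line has no amount or date, so it simply drops out: the pattern decides what counts as data.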

1979 - VisiCalc invented by Dan Bricklin

Spreadsheets

Why Spreadsheets?

Fluid and adaptable

Grid-like, structured

Organised

Calculations

Aggregations, totals, averages, tax

 

Spreadsheets in 1 min

Headers

Clean data!

Validation (Lookup)

Formulas

Pivot tables
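What a pivot table does can be sketched in a few lines of code. This is only an illustration of the idea, not how any particular spreadsheet implements it; the rows and column meanings (region, product, sales) are invented:

```python
from collections import defaultdict

# Hypothetical rows, as they might sit under a header row in a sheet:
# region, product, sales
rows = [
    ["North", "Tea", 10],
    ["North", "Coffee", 5],
    ["South", "Tea", 7],
    ["North", "Tea", 3],
]

# Pivot: regions down the side, products across the top, summed sales in the cells.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, sales in rows:
    pivot[region][product] += sales

for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

This only works because every row has the same columns in the same order, which is why headers and clean data come first.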

Common Spreadsheet Errors

vs

Originally a card-based index system

Visualised and used

...becomes...

DATABASES TO THE RESCUE?

1960s -> 1970s -> 1980s - Relational Databases

Why Databases?

Server

Schemas

Constraints

Queries

ACID transactions

DRY TABLES! 

SQL in 1974

Indexing

Databases in 1 min

Tables

Ids

Each row has a unique Id/key

Foreign keys

SELECT image, owner, date
FROM pictures
WHERE owner = 'Tom'
ORDER BY date DESC
LIMIT 20;
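Tables, ids, foreign keys and a query can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the owners table, the owner_id column and the sample rows are invented here, and the slide's owner column is replaced by a foreign key to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Two tables; each row has a unique id, and pictures point at owners by id.
conn.executescript("""
CREATE TABLE owners (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE pictures (
    id       INTEGER PRIMARY KEY,
    image    TEXT NOT NULL,
    date     TEXT NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES owners(id)
);
""")

conn.execute("INSERT INTO owners (id, name) VALUES (1, 'Tom'), (2, 'Ann')")
conn.executemany(
    "INSERT INTO pictures (image, date, owner_id) VALUES (?, ?, ?)",
    [
        ("cat.png", "2021-01-05", 1),
        ("dog.png", "2021-02-01", 1),
        ("sea.png", "2021-01-20", 2),
    ],
)

# The same shape of query as on the slide, joined through the foreign key.
query = """
SELECT p.image, o.name, p.date
FROM pictures AS p
JOIN owners AS o ON o.id = p.owner_id
WHERE o.name = ?
ORDER BY p.date DESC
LIMIT 20
"""
results = conn.execute(query, ("Tom",)).fetchall()
print(results)

# The constraint refuses a picture whose owner does not exist.
rejected = False
try:
    conn.execute(
        "INSERT INTO pictures (image, date, owner_id) VALUES ('x.png', '2021-03-01', 99)"
    )
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```

Storing the owner's name once and referring to it by id is the "DRY tables" idea: renaming Tom means changing one row, not every picture.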

Why Not Databases?

People often find them difficult?

The Schema is (deliberately) restrictive

Your data is very complex / unstructured

Your data WILL become more complex

 

The Olden Days

Every single letter used mattered.

 

WELCOME TO THE FUTURE OF DATA

What's new?

Not much tbh

Better, faster, more reliable

Work with more data

Visualisation

Better tools

Online

Collaborative

They all "know about each other"

Surely there's better news than that?

It's all in the cloud now

NoSQL databases

Big Data

NoCode tools are becoming "a thing"

APIs

Application Programming Interface

JSON or XML or TEXT or CSV
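Whatever format an API returns, the first job is usually to turn it into the same in-memory shape. A small sketch using only the standard library; the two payloads are hypothetical and no real API is being called:

```python
import csv
import io
import json

# Hypothetical API payloads: the same two records, once as JSON and once as CSV.
json_body = '[{"id": 1, "name": "Tom"}, {"id": 2, "name": "Ann"}]'
csv_body = "id,name\n1,Tom\n2,Ann\n"

# JSON keeps its types: numbers arrive as numbers.
from_json = json.loads(json_body)

# CSV has no types, so every value arrives as a string and needs converting.
from_csv = [
    {"id": int(row["id"]), "name": row["name"]}
    for row in csv.DictReader(io.StringIO(csv_body))
]

print(from_json == from_csv)
```

Once both are lists of dictionaries, the rest of the code does not need to care which format the data came in.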

More data than ever is available

Graph Databases

Professional vs Beginner

Visualisations

Google Data Studio

You can work with Google Data Studio too (if your data is clean)

The Coding Elephant in the room 

What area? Swiss-army knife vs statistical vs fun vs web vs speed vs £££ vs reliability vs specialised?

Python, JavaScript

Core data skills

  • Organise and Design Concepts
  • Re-shape and clean data

...then...

  • Interrogate or integrate data (i.e. use)
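The clean, re-shape, then use sequence above can be sketched in plain Python. The messy rows and the clean helper are hypothetical, and real data will need rules of its own:

```python
# Hypothetical messy rows, as typed by hand into a sheet.
raw = [
    {"name": "  tom smith ", "city": "YORK", "visits": "3"},
    {"name": "Ann Lee", "city": "york ", "visits": " 12"},
    {"name": "", "city": "Leeds", "visits": "n/a"},
]


def clean(row):
    """Return a tidied copy of row, or None if it has no usable name."""
    name = " ".join(row["name"].split()).title()
    if not name:
        return None
    visits = row["visits"].strip()
    return {
        "name": name,
        "city": row["city"].strip().title(),
        "visits": int(visits) if visits.isdigit() else None,
    }


# Clean: fix spacing and case, convert numbers, drop rows with no name.
cleaned = [c for c in map(clean, raw) if c is not None]

# Re-shape: from one row per person to one total per city.
visits_by_city = {}
for row in cleaned:
    visits_by_city[row["city"]] = visits_by_city.get(row["city"], 0) + (row["visits"] or 0)

# Use: the totals are only trustworthy because "YORK" and "york " were merged first.
print(visits_by_city)
```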

 

 

UML

Notation for databases, business processes, services etc

OpenRefine

Orange3

Thank you

Questions?

By Tom Smith
