High performance MySQL (draft)

By James Harvey

Database performance is an issue that creeps up on you

Early-stage startup problems

- Finding customers

- Rapid feature development

- Iterating to find product-market fit

- Fixing bugs and bad product decisions

Scaling startup problems

- All that stuff from before

- Heaps of customers

- A bunch of really big customers

- Even bigger customers coming down the pipeline

- Your database servers go to 100% CPU randomly

- You start getting weird errors from the database

How did this emerge for us?

Errors, slow queries and app performance

And helpful informative bug reports

Okay let's take a step back

You will make mistakes in your database design

You will think you know how MySQL will behave at scale and you will be wrong

It is a good thing when you start hitting scaling issues in your app because it means people are actually using it!

Back to basics

Optimization vs scaling
Database design
How indexes work
How transactions work
F***ing deadlocks
Explain

Optimization

Making maximum use of each MySQL node

Fast queries
Quick transactions
No f***ing deadlocks
Avoiding N+1 queries

Scaling

Adding more MySQL nodes

Adding read replicas
Splitting out databases across clusters
Sharding

Database design - Integers

Column	Bytes	Signed	Unsigned
TINYINT	1	-128 to 128	0 to 256
SMALLINT	2	-32,768 to 32,767	0 to 65,535
MEDIUMINT	3	-8,388,608 to 8,388,607	0 to 16,777,215
INT	4	-2,147,483,648 to 2,147,483,647	0 to 4,294,967,295
BIGINT	8	REALLY BIG	REALLY BIG

Unsign if you don't need negative numbers
Don't use char/var_char for numbers if you can use an integer
Pick the smallest size that makes sense but give yourself some headroom if you think you need it
Int(11) is for MySQL tooling, it does not affect storage or performance

Database design - Floats and Decimal

TODO

Database design -
Varchar vs Char

TODO

Database design -
Text and Blobs

TODO

Database design -
Bits and Flags

TODO

Database design -
Ideas to try

Smaller is usually better
Move text fields to alternative storage (eg. AWS S3)
JSON storage is likely an anti-pattern
Move out optional data into sub tables to reduce storage waste
Checksum data as a key
Keep the maximum joins in queries under ~15

Indexes and Keys

Clustered Indexes

A clustered index is how the data is stored on disk
MySQL/InnoDB uses clustered indexes for primary keys
If a primary key does not exist, MySQL will create one behind the scenes
Never make a primary key nullable
InnoDB uses a B+ Tree not a B Tree
It is best to use ordered data as PK

Disk Blocks

Disk Blocks are the Leaf Nodes of the B+ Tree
Data records are stored in any order
A small list of pointers are maintained
If an update is too big, a new record is added and the pointer is updated
Storage is mostly handled by the Operating System

Secondary Indexes

Secondary indexes reference the primary index
If the data exists in the secondary index MySQL doesn't need to fetch data from the data block
The bigger the primary key, the bigger the index size
Data modifications will update the indexes as well

Multiple Secondary Indexes

Index fields are concatenated together to form the index value
Order of the indexed fields is important
Part of the index can be used depending on the order

MySQL Explain

Field	Meaning
type	Worst to best - ALL, index, range, ref, eq_ref, const, system
possible keys	The keys that could be used
key	The key that was used
rows	Number of rows that had to scanned
extra	More info

Query optimisation tips

Select only the data you need. If it's all in the index, MySQL does not need to go to the Disk Block
Avoid function calls in queries on indexed columns. Eg. use date_field > 'date time'
instead of date(date_field) > 'date'
When performing joins, identify the best entry table that filters the most data
Be considered when adding indexes, they are easy to add and hard to take away
Continuously check the slow query log for new queries that have become problems

Transactions and Deadlocks

Used to ensure a change completely works or completely fails
All inserts, updates and deletes use transactions
InnoDB has a solid transaction system
Table locking and row locking

ACID compliance

Atomicity - A single logical unit of work
Consistency - DB moves from one consistent state to the next
Isolation - The results of a transaction are usually invisible to other transactions until the transaction is complete
Durability - Once committed, a transaction’s changes are permanent

Isolation Levels

Read uncommitted - Bad
Read committed - Non-repeatable read
Repeatable Read - Can have phantom reads. (InnoDB uses this but tries to avoid phantom reads)
Serializable - The highest level of isolation, locks reads though

How transactions work

TODO

Lock timeouts

A transaction was held open too long

F***ing deadlocks

Two transactions competed for changes where InnoDB cannot resolve them both

Tips

Keep transactions as fast as possible
Ensure update/delete queries are optimized
Don't mix storage engines and transactions

N+1 Queries

TODO

References

An in-depth look at Database indexing

High-Performance MySQL 3rd & 4th Edition

Storing UUID and Generated Columns

(Draftt) High performance MySQL

By James Harvey

(Draftt) High performance MySQL

High performance MySQL (draft)

Database performance is an issue that creeps up on you

How did this emerge for us?

Errors, slow queries and app performance

And helpful informative bug reports

Okay let's take a step back

Back to basics

Optimization

Scaling

Database design - Integers

Database design - Floats and Decimal

Database design - Varchar vs Char

Database design - Text and Blobs

Database design - Bits and Flags

Database design - Ideas to try

Indexes and Keys

Clustered Indexes

Disk Blocks

Secondary Indexes

Secondary Indexes

Multiple Secondary Indexes

MySQL Explain

Query optimisation tips

Transactions and Deadlocks

ACID compliance

Isolation Levels

How transactions work

Lock timeouts

F***ing deadlocks

Tips

N+1 Queries

References

(Draftt) High performance MySQL

More from James Harvey

Database design -
Varchar vs Char

Database design -
Text and Blobs

Database design -
Bits and Flags

Database design -
Ideas to try