High performance MySQL (draft)

By James Harvey

Database performance is an issue that creeps up on you

Early-stage startup problems

- Finding customers

- Rapid feature development

- Iterating to find product-market fit

- Fixing bugs and bad product decisions

 

Scaling startup problems

- All that stuff from before

- Heaps of customers

- A bunch of really big customers

- Even bigger customers coming down the pipeline

- Your database servers go to 100% CPU randomly

- You start getting weird errors from the database

 

How did this emerge for us?

Errors, slow queries and app performance

And helpful informative bug reports

Okay let's take a step back

You will make mistakes in your database design

You will think you know how MySQL will behave at scale and you will be wrong

It is a good thing when you start hitting scaling issues in your app because it means people are actually using it!

Back to basics

  • Optimization vs scaling
  • Database design
  • How indexes work
  • How transactions work
  • F***ing deadlocks
  • Explain

Optimization

Making maximum use of each MySQL node

  • Fast queries
  • Quick transactions
  • No f***ing deadlocks
  • Avoiding N+1 queries

Scaling

Adding more MySQL nodes
 

  • Adding read replicas
  • Splitting out databases across clusters
  • Sharding

Database design - Integers

Column Bytes Signed Unsigned
TINYINT 1 -128 to 128 0 to 256
SMALLINT 2 -32,768 to 32,767 0 to 65,535
MEDIUMINT 3 -8,388,608 to 8,388,607 0 to 16,777,215
INT 4 -2,147,483,648 to 2,147,483,647 0 to 4,294,967,295
BIGINT 8 REALLY BIG REALLY BIG
  • Unsign if you don't need negative numbers
  • Don't use char/var_char for numbers if you can use an integer
  • Pick the smallest size that makes sense but give yourself some headroom if you think you need it
  • Int(11) is for MySQL tooling, it does not affect storage or performance

Database design - Floats and Decimal

TODO

Database design -
Varchar vs Char

TODO

Database design -
Text and Blobs

TODO

Database design -
Bits and Flags

TODO

Database design -
Ideas to try

  • Smaller is usually better
  • Move text fields to alternative storage (eg. AWS S3)
  • JSON storage is likely an anti-pattern
  • Move out optional data into sub tables to reduce storage waste
  • Checksum data as a key
  • Keep the maximum joins in queries under ~15

Indexes and Keys

Clustered Indexes

  • A clustered index is how the data is stored on disk
  • MySQL/InnoDB uses clustered indexes for primary keys
  • If a primary key does not exist, MySQL will create one behind the scenes
  • Never make a primary key nullable
  • InnoDB uses a B+ Tree not a B Tree
  • It is best to use ordered data as PK

Disk Blocks

  • Disk Blocks are the Leaf Nodes of the B+ Tree
  • Data records are stored in any order
  • A small list of pointers are maintained
  • If an update is too big, a new record is added and the pointer is updated
  • Storage is mostly handled by the Operating System

Secondary Indexes

Secondary Indexes

  • Secondary indexes reference the primary index
  • If the data exists in the secondary index MySQL doesn't need to fetch data from the data block
  • The bigger the primary key, the bigger the index size
  • Data modifications will update the indexes as well

Multiple Secondary Indexes

  • Index fields are concatenated together to form the index value
  • Order of the indexed fields is important
  • Part of the index can be used depending on the order

MySQL Explain

Field Meaning
type Worst to best - ALL, index, range, ref, eq_ref, const, system
possible keys The keys that could be used
key The key that was used
rows Number of rows that had to scanned
extra More info

Query optimisation tips

  • Select only the data you need. If it's all in the index, MySQL does not need to go to the Disk Block
  • Avoid function calls in queries on indexed columns. Eg. use          date_field > 'date time'
    instead of     date(date_field) > 'date'
  • When performing joins, identify the best entry table that filters the most data
  • Be considered when adding indexes, they are easy to add and hard to take away
  • Continuously check the slow query log for new queries that have become problems

Transactions and Deadlocks

  • Used to ensure a change completely works or completely fails
  • All inserts, updates and deletes use transactions
  • InnoDB has a solid transaction system
  • Table locking and row locking

ACID compliance

  • Atomicity - A single logical unit of work

  • Consistency - DB moves from one consistent state to the next

  • Isolation - The results of a transaction are usually invisible to other transactions until the transaction is complete

  • Durability - Once committed, a transaction’s changes are permanent

Isolation Levels

  • Read uncommitted - Bad
  • Read committed - Non-repeatable read
  • Repeatable Read - Can have phantom reads. (InnoDB uses this but tries to avoid phantom reads)
  • Serializable - The highest level of isolation, locks reads though

How transactions work

  • TODO

Lock timeouts

  • A transaction was held open too long

F***ing deadlocks

  • Two transactions competed for changes where InnoDB cannot resolve them both

Tips

  • Keep transactions as fast as possible
  • Ensure update/delete queries are optimized
  • Don't mix storage engines and transactions

N+1 Queries

  • TODO

References

(Draftt) High performance MySQL

By James Harvey

(Draftt) High performance MySQL

  • 310