The Database
--; DROP TABLE swsUser
But first, what is it?
The source of truth for all company data
The main storage of all user data
Required by pretty much every service SWS has
It's kind of a big deal
The Database through time
What is this? A database for ants?
CIQ Data
Database server
Rackspace
We heard you like credits
Let's put it all on one big server lol
Things are getting slow...
Let's make lots of little baby servers
Why is everything still going down?
Hello darkness my old friend
CIQ, we need to break up
We started with SWS data and CIQ data living together
This meant the combined database size was 1.9TB
The total disk space was 2.0TB
My daily chore was clearing space before we hit 0 bytes free
Backing up SWS data required hacky methods
What a database shouldn't do:
Be at constant high CPU usage
Have <50GB of free disk space
Be almost impossible to back up
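The first two items on that list are easy to check automatically. Here is a minimal Python sketch of such a health check. It is hypothetical and not SWS's actual monitoring. `ServerStats`, `health_problems` and the 90% CPU limit are assumptions. Only the 50GB free-disk floor comes from the slide, and 1TB is taken as 1,000GB.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the 50GB floor is from the slide, the CPU limit is an assumption.
MIN_FREE_DISK_GB = 50
MAX_CPU_PERCENT = 90


@dataclass
class ServerStats:
    """Hypothetical snapshot of a database server's health metrics."""
    cpu_percent: float
    disk_total_gb: float
    disk_used_gb: float


def health_problems(stats: ServerStats) -> list[str]:
    """Return human-readable problems; an empty list means nothing is flagged."""
    problems = []
    free_gb = stats.disk_total_gb - stats.disk_used_gb
    if stats.cpu_percent >= MAX_CPU_PERCENT:
        problems.append(f"CPU at {stats.cpu_percent:.0f}%")
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_gb:.0f}GB of free disk space")
    return problems


# 1.9TB of data on a 2.0TB disk leaves 100GB free, so only the CPU check fires here.
print(health_problems(ServerStats(cpu_percent=95, disk_total_gb=2000, disk_used_gb=1900)))
```

Another 60GB of growth on that disk would trip the free-space check as well.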
Also have an uptime like this
What we need from the database:
High availability
High performance
Scalable
Frequent & reliable backups
SQL Availability Groups
Three or more SQL servers sharing the same databases
Consists of one Primary and two+ secondaries
Writes happen on the Primary, which then synchronises the secondaries
Reads can optionally go to the secondaries for read-heavy apps like the Batch
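From an app's point of view, the Primary/secondary split looks roughly like this. It is a minimal sketch that assumes routing is done in client code. `AvailabilityGroupRouter` and the server names `sql-1` to `sql-3` are hypothetical.

```python
import itertools


class AvailabilityGroupRouter:
    """Hypothetical client-side router: writes go to the Primary, reads spread over secondaries."""

    def __init__(self, primary: str, secondaries: list[str]):
        if not secondaries:
            raise ValueError("an availability group needs at least one secondary")
        self.primary = primary
        self.secondaries = list(secondaries)
        # Round-robin over the secondaries so read load is shared evenly.
        self._next_secondary = itertools.cycle(self.secondaries)

    def route(self, is_write: bool, read_only_ok: bool = False) -> str:
        """Pick the server a query should be sent to."""
        if is_write or not read_only_ok:
            # Writes, and reads that must see the very latest data, always go to the Primary.
            return self.primary
        return next(self._next_secondary)


router = AvailabilityGroupRouter("sql-1", ["sql-2", "sql-3"])
print(router.route(is_write=True))                      # sql-1
print(router.route(is_write=False, read_only_ok=True))  # sql-2
print(router.route(is_write=False, read_only_ok=True))  # sql-3
```

In SQL Server itself, this split is usually handled by the availability group listener's read-only routing rather than app code. Clients opt in with `ApplicationIntent=ReadOnly` in the connection string. Reads from a secondary can lag slightly behind the Primary, so only read-heavy work that tolerates that belongs there.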
SQL Availability Groups
How is this better than an even bigger server?
One server is one big point of failure
Maintenance requires outages
There's a cap on database performance
Disaster recovery can potentially take days
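Some back-of-the-envelope maths shows why one big server is the weaker option. Assume each server is independently up with probability a, and that the group keeps serving while a majority of its nodes is up. A single server is then up with probability a, and a three-node group with probability 3a²(1−a) + a³. The sketch below computes that. The 99% figure is purely hypothetical, and real failures are rarely fully independent.

```python
from math import comb


def group_availability(a: float, nodes: int) -> float:
    """Probability that a majority of `nodes` independent servers are up.

    Assumes each node is up with probability `a`, failures are independent,
    and the group keeps serving while a majority of nodes is healthy.
    """
    majority = nodes // 2 + 1
    return sum(
        comb(nodes, up) * a**up * (1 - a) ** (nodes - up)
        for up in range(majority, nodes + 1)
    )


a = 0.99  # hypothetical per-server availability
print(f"one server:       {a:.4%}")
print(f"three-node group: {group_availability(a, 3):.4%}")
```

With a = 0.99, this gives about 99.97% for three nodes against 99% for the single server. The short failover outage is not counted.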
A Failure scenario
Primary node blows up
The cluster picks a secondary to become the new Primary
The new Primary takes over the primary IPs
Apps reconnect after a short outage (60-90s)
Old Primary is fixed, comes back online
Rejoins the group
Becomes a secondary and starts synchronising
Availability Group returns to Healthy
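Here is the failure scenario above as a toy state machine. It is a simplification for illustration, not how SQL Server implements failover. `Node`, `AvailabilityGroup` and its methods are hypothetical. Promotion just picks the first synchronized secondary instead of going through cluster quorum, and app reconnection is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    role: str = "secondary"  # "primary", "secondary" or "offline"
    synchronized: bool = True


@dataclass
class AvailabilityGroup:
    nodes: list[Node] = field(default_factory=list)

    def primary(self) -> Optional[Node]:
        return next((n for n in self.nodes if n.role == "primary"), None)

    def fail(self, name: str) -> None:
        """A node blows up; if it was the Primary, promote a synchronized secondary."""
        node = self._get(name)
        was_primary = node.role == "primary"
        node.role = "offline"
        node.synchronized = False
        if was_primary:
            candidates = [n for n in self.nodes if n.role == "secondary" and n.synchronized]
            if candidates:
                # Simplification: a real cluster decides via quorum, not list order.
                candidates[0].role = "primary"

    def rejoin(self, name: str) -> None:
        """A repaired node comes back as a secondary and must catch up before it counts."""
        node = self._get(name)
        node.role = "secondary"
        node.synchronized = False

    def finish_sync(self, name: str) -> None:
        self._get(name).synchronized = True

    def healthy(self) -> bool:
        """Healthy means a Primary exists and every node is online and synchronized."""
        return self.primary() is not None and all(
            n.role != "offline" and n.synchronized for n in self.nodes
        )

    def _get(self, name: str) -> Node:
        return next(n for n in self.nodes if n.name == name)


ag = AvailabilityGroup([Node("sql-1", role="primary"), Node("sql-2"), Node("sql-3")])
ag.fail("sql-1")          # Primary node blows up
print(ag.primary().name)  # sql-2 takes over
print(ag.healthy())       # False: sql-1 is still offline
ag.rejoin("sql-1")        # old Primary is fixed and rejoins as a secondary
ag.finish_sync("sql-1")   # ...and finishes synchronising
print(ag.healthy())       # True
```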
Backups
Now we can do them properly
Restoring to staging is a lot easier
Backups are stored at 15-minute resolution
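Here is what 15-minute backups mean when restoring. You pick the newest backup taken at or before the moment you want to go back to, and anything after it is lost, so at most 15 minutes of changes. The schedule helper, function names and dates are hypothetical. Real SQL Server point-in-time restores can also replay transaction log backups to reach an exact moment, which this ignores.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(minutes=15)  # interval from the slide


def backup_times(start: datetime, end: datetime, interval: timedelta = BACKUP_INTERVAL) -> list[datetime]:
    """Hypothetical schedule: one backup every `interval` from `start` up to and including `end`."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += interval
    return times


def restore_point(backups: list[datetime], target: datetime) -> datetime:
    """Newest backup taken at or before `target`, i.e. the point we can restore to."""
    eligible = [b for b in backups if b <= target]
    if not eligible:
        raise ValueError("no backup exists before the requested time")
    return max(eligible)


day = datetime(2024, 1, 1)  # arbitrary example date
backups = backup_times(day, day + timedelta(hours=1))
target = day + timedelta(minutes=44)  # e.g. just before someone ran DROP TABLE
point = restore_point(backups, target)
print(point.time(), "->", target - point, "of changes lost")
```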
Live demo
This could go terribly wrong