The Database
--; DROP TABLE swsUser
But first, what is it?
- The source of truth for all company data
- The main storage of all user data
- Required by pretty much every service SWS has
- It's kind of a big deal
The Database through time
What is this? A database for ants?
CIQ Data
Database server
Rackspace
We heard you like credits
Let's put it all on one big server lol
Things are getting slow...
Let's make lots of little baby servers
Why is everything still going down?
Hello darkness my old friend
CIQ, we need to break up
- We started with SWS data and CIQ data living together
- This meant the combined database size was 1.9TB
- The total disk space was 2.0TB
- My daily chore was clearing space before we hit 0 bytes free
- Backing up SWS data required hacky methods
What a database shouldn't do:
- Be at constant high CPU usage
- Have <50GB of free disk space
- Be almost impossible to back up
- Also have an uptime like this
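The 50GB floor above is easy to watch for automatically instead of by hand. A minimal sketch, assuming Python on the database host; the function names and the default path are hypothetical, and only the 50GB threshold comes from the slide:

```python
import shutil

GB = 10 ** 9  # decimal gigabytes, to match the "50GB" figure on the slide


def is_low_on_space(free_bytes: int, threshold_gb: float = 50) -> bool:
    """Return True when free space is below the threshold (in GB)."""
    return free_bytes < threshold_gb * GB


def check_volume(path: str, threshold_gb: float = 50) -> str:
    """Report free space for the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    status = "LOW" if is_low_on_space(usage.free, threshold_gb) else "ok"
    return f"{path}: {usage.free / GB:.1f} GB free [{status}]"


if __name__ == "__main__":
    # "." is a placeholder; point this at the data volume in practice.
    print(check_volume("."))
```

Run on a schedule, a check like this turns "clear space before we hit 0 bytes" into an alert rather than a daily chore.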
What we need from the database:
- High availability
- High performance
- Scalable
- Frequent & reliable backups
SQL Availability Groups
- Three or more SQL servers sharing the same databases
- Consists of one Primary and two or more Secondaries
- Writes happen on the Primary, which then synchronises the Secondaries
- Reads can optionally go to the Secondaries for read-heavy apps like the Batch
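From the app side, the read/write split usually comes down to one connection-string keyword. A rough sketch, assuming the apps connect through an availability group listener with the Microsoft ODBC driver and that read-only routing is configured on the group; the listener and database names are made up, and `ApplicationIntent` and `MultiSubnetFailover` are the driver's standard keywords:

```python
def build_connection_string(listener: str, database: str,
                            read_only: bool = False) -> str:
    """Build an ODBC connection string for an availability group listener.

    read_only=True asks to be routed to a readable Secondary;
    otherwise the connection lands on the Primary.
    """
    parts = {
        "Driver": "{ODBC Driver 17 for SQL Server}",
        "Server": f"tcp:{listener},1433",
        "Database": database,  # must be named for read-only routing to apply
        "Trusted_Connection": "yes",
        "MultiSubnetFailover": "Yes",  # faster reconnect after a failover
        "ApplicationIntent": "ReadOnly" if read_only else "ReadWrite",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    listener = "sql-ag-listener.example.internal"
    print(build_connection_string(listener, "ExampleDb"))                  # web app
    print(build_connection_string(listener, "ExampleDb", read_only=True))  # Batch
```

A read-heavy job like the Batch would use the `read_only=True` string, everything else keeps the default.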
How is this better than an even bigger server?
- One server is one big point of failure
- Maintenance requires outages
- There's a cap on database performance
- Disaster recovery can take days
A failure scenario
- Primary node blows up
- Secondaries decide on who becomes the new Primary
- New primary takes over the primary IPs
- Apps reconnect after short downtime (60-90s)
- Old Primary is fixed, comes back online
- Rejoins the group
- Becomes a secondary and starts synchronising
- Availability Group returns to Healthy
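For apps to ride out the 60-90s gap, they need to keep retrying the connection instead of giving up on the first error. A minimal sketch of that, assuming a Python app; `connect_with_retry` and its parameters are hypothetical, and `connect` stands in for whatever the real driver call is:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       max_wait_s: float = 120.0,
                       initial_delay_s: float = 1.0,
                       max_delay_s: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds or the wait budget is used up.

    The delay doubles after each failure, capped at max_delay_s. The
    default 120s budget is an assumption chosen to cover a 60-90s failover.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            if waited >= max_wait_s:
                raise  # budget spent: surface the last error
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay_s)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_connect() -> str:
        # Simulated failover: the first three attempts fail.
        attempts["count"] += 1
        if attempts["count"] <= 3:
            raise ConnectionError("primary unavailable")
        return "connected"

    print(connect_with_retry(flaky_connect, sleep=lambda s: None))
```

Real drivers raise their own exception types, so the `except` clause would need to match those rather than the built-in `ConnectionError` used here.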
Backups
- Now we can do them properly
- Restoring to staging is a lot easier
- Backups are stored at 15-minute resolution
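What 15-minute resolution means for a restore can be worked out directly. A small sketch, assuming backups land on quarter-hour boundaries (an assumption for illustration; the slide only gives the interval) and using a hypothetical helper name:

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(minutes=15)


def latest_restore_point(target: datetime,
                         interval: timedelta = BACKUP_INTERVAL) -> datetime:
    """Return the most recent backup boundary at or before `target`."""
    midnight = target.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = target - midnight
    # timedelta // timedelta gives the number of whole intervals elapsed.
    return midnight + (elapsed // interval) * interval


if __name__ == "__main__":
    incident = datetime(2024, 1, 1, 10, 37, 12)  # made-up incident time
    restore_point = latest_restore_point(incident)
    print("restore to:", restore_point)
    print("data at risk:", incident - restore_point)
```

Under that assumption the worst case is just under 15 minutes of lost changes, which is the number to quote when someone asks how much a restore would cost.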
Live demo
This could go terribly wrong
SWL Availability Group
By Jabin Bastian