Outline
- Goals
- Solutions available
- How and why we chose our solution
- Overview of Redis Sentinel
- Customized tunings for partition tolerance
Old Redis HA Setup
- We used master-slave replication for HA.
- If the master went down, failover was manual (a minimal sketch follows).
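For reference, the old setup is just plain Redis replication with a hand-run promotion; a minimal sketch (the addresses are hypothetical):

```
# redis.conf on the slave: replicate from the master (hypothetical address)
slaveof 10.0.0.1 6379

# manual failover when the master dies: promote the slave by hand,
# e.g. from a shell against the surviving slave:
#   redis-cli -h 10.0.0.2 -p 6379 SLAVEOF NO ONE
```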
Problems we wanted to solve
- Automated failover of the Redis master (master HA)
- Partition tolerance
Things to keep in mind for Redis HA
- Redis replication is asynchronous - no strong consistency
- Automated failover is not built into plain replication
Solutions explored
- Vanilla Redis Cluster
- Vanilla Redis Sentinel
- HAProxy + Twemproxy + Smitty + Redis Sentinel
- Redis Sentinel with HAProxy
Vanilla Redis Cluster
- Supports automatic failover, with built-in sharding
- Built into Redis itself
- But it requires a smart, cluster-aware client (illustrated below).
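To see why the client must be smart: Redis Cluster shards keys across slots, and a node answers with a MOVED redirection when it doesn't own a key's slot, which the client is expected to follow. A hypothetical transcript (addresses and slot number are made up):

```
$ redis-cli -p 7000 GET user:1000
(error) MOVED 1649 127.0.0.1:7002    # naive client gets a redirection it must handle
$ redis-cli -c -p 7000 GET user:1000 # -c makes redis-cli follow redirections itself
"bar"
```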
Vanilla Redis Sentinel
- Supports automatic failover
- An external monitoring service that watches Redis state
- But it requires a smart, sentinel-aware client (see the sketch below).
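"Smart" here means every client must first ask a sentinel who the current master is before connecting, e.g. (the master group name and address are hypothetical):

```
$ redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
1) "10.0.0.1"
2) "6379"
```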
HAProxy + Twemproxy + Smitty + Redis Sentinel
- Too many moving parts
Redis Sentinel with HAProxy
- Monitoring with Redis Sentinel
- Switch master/slaves in HAProxy by watching sentinel state
- No client code changes required
A closer look at Redis Sentinel Service
- A monitoring service that relies on a quorum of sentinel processes
- Each sentinel marks the master subjectively down when it stops responding; once a quorum agrees, the master is objectively down and a failover is triggered
A closer look at Redis Sentinel Service
- In a failover, a sentinel leader is elected to promote a slave to master
- Failover can't happen in a minority partition, because leader election requires a majority of sentinels
But how would clients detect master failover?
- Bring HAProxy into the mix
HAProxy master external-check script
- Check state from sentinel:
  - Is the sentinel in quorum?
  - Who is the current master?
  - Is that master TCP reachable?
- If all checks pass, mark the node up (a sketch of such a script follows)
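A minimal sketch of such a check script in bash; the master group name `mymaster`, the sentinel port 26379, and the script layout are assumptions, not the exact script from this setup. HAProxy passes the checked server's address and port as the third and fourth arguments and treats exit code 0 as "up":

```bash
#!/bin/bash
# External check for the Redis master backend: succeed (exit 0) only if
# the local sentinel can reach quorum, this node is the current master,
# and the master answers on TCP.
# HAProxy invokes the script as: <vip-addr> <vip-port> <server-addr> <server-port>

SERVER_ADDR="$3"                  # address of the backend server being checked
SERVER_PORT="$4"                  # port of the backend server being checked
SENTINEL="redis-cli -p 26379"     # assumed local sentinel port

# 1. Is the sentinel in quorum for this master group?
$SENTINEL SENTINEL ckquorum mymaster 2>/dev/null | grep -q OK || exit 1

# 2. Does sentinel agree this node is the current master?
MASTER_ADDR=$($SENTINEL SENTINEL get-master-addr-by-name mymaster | head -1)
[ "$MASTER_ADDR" = "$SERVER_ADDR" ] || exit 1

# 3. Is the master actually TCP reachable?
redis-cli -h "$SERVER_ADDR" -p "$SERVER_PORT" PING | grep -q PONG || exit 1

exit 0
```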
What we achieved with this setup
- Clients can stay dumb; all HA logic lives in HAProxy.
- No code changes required. Any project in any language can use this setup.
Do we have partition tolerance?
- In the minority partition, HAProxy marks the master down because the local sentinels can't form a quorum
- There is still a small window during failover where two Redis masters can be up (split-brain)
- We can shrink that window by tuning the configs of the services involved
HAProxy tunables
| Parameter | Value | Description |
| --- | --- | --- |
| external-check command | <check-script> | External script used to perform the health check |
| inter | 1s | Interval between two consecutive health checks |
| rise | 7 | HAProxy marks a node up after 7 consecutive passing checks |
| fall | 3 | HAProxy marks a node down after 3 consecutive failing checks |
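Put together in haproxy.cfg, the backend might look like the sketch below (server names, addresses, and the script path are assumptions; newer HAProxy versions also require enabling `external-check` in the global section):

```
backend redis_master
    option external-check
    external-check command /usr/local/bin/redis_master_check.sh
    # every node runs the same master check, so only the current
    # master passes and receives traffic
    server redis-01 10.0.0.1:6379 check inter 1s rise 7 fall 3
    server redis-02 10.0.0.2:6379 check inter 1s rise 7 fall 3
    server redis-03 10.0.0.3:6379 check inter 1s rise 7 fall 3
```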
Redis tunables
| Parameter | Value | Description |
| --- | --- | --- |
| min-slaves-to-write | 1 | The master stops accepting writes if fewer than `min-slaves-to-write` healthy slaves are connected |
| min-slaves-max-lag | 10 | A slave counts as healthy only if its replication lag is at most `min-slaves-max-lag` seconds |
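In redis.conf these two tunables are set together; a sketch:

```
# stop accepting writes on the master unless at least 1 slave is
# connected and lagging by no more than 10 seconds
min-slaves-to-write 1
min-slaves-max-lag 10
```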
Redis Sentinel tunables
| Parameter | Value | Description |
| --- | --- | --- |
| down-after-milliseconds | 5000 | A sentinel marks the master subjectively down if it is unreachable for this many milliseconds |
| quorum | 2 | The master is marked objectively down, allowing failover, once `quorum` sentinels agree it is down |
| failover-timeout | 180000 | No new failover is attempted for the same master before this timeout (in milliseconds) expires |
| parallel-syncs | 1 | Number of slaves that can resync with the new master in parallel after a failover |
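And the matching sentinel.conf; a sketch (the master group name and address are assumptions; note the quorum of 2 is the last argument of the monitor line):

```
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
```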
Key takeaways
- You can build a reasonably consistent, partition-tolerant Redis setup with HAProxy + Sentinel
- HAProxy is a versatile piece of software; it's a devops Swiss Army knife.
Questions?
Partition Tolerant Redis Cluster with HAProxy
By Ayush Goyal