Partition tolerant, Highly available redis cluster using haproxy

By Ayush Goyal

ayush@helpshift.com

Outline

Goals
Solutions available
How and why we chose our solution
Overview of Redis Sentinel
Customized tunings for partition tolerance

Old Redis HA Setup

We use master-slave replication for HA.
Manual failover for redis master, if master goes down.

Problem we wanted to solve

Automated failover of redis master (Redis master HA)
Partition tolerance

Things to keep in mind for Redis HA

Redis async replication - No strong consistency
Automated switching is not built-in

Solutions explored

Vanilla Redis cluster
Vanilla Redis sentinel
Haproxy + Smitty + Twemproxy + Redis Sentinel
Redis sentinel with haproxy

Vanilla Redis cluster

Supports automatic failover with builtin sharding
Builtin redis support
But requires client to be smart.

Vanilla Redis sentinel

Supports automatic failover
An external monitoring service to monitor redis state
But requires client to be smart.

Haproxy + Smitty + Twemproxy + Redis Sentinel

Too many moving parts

Redis Sentinel with haproxy

Monitor service with redis sentinel
Switch master/slaves in haproxy by monitoring sentinel
No client code changes required

A closer look at Redis Sentinel Service

Monitoring service, utilizes quorum
Triggers a failover election once a quorum of sentinels agrees that the master is down

A closer look at Redis Sentinel Service

In failover scenario, a sentinel leader is selected to promote a slave to master
Failover can't happen in minority partition, because majority is required for leader selection
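
The majority rule above can be sketched as a small predicate. This is an illustrative model, not Sentinel's own code: the function name and parameters are assumptions, and it only captures the two conditions mentioned on these slides (quorum to agree the master is down, majority to elect a leader).

```python
def can_failover(reachable: int, total: int, quorum: int) -> bool:
    """Return True if the sentinels in one partition could complete a failover.

    reachable: sentinels that can still talk to each other in this partition
    total:     all sentinels configured for the master
    quorum:    sentinels that must agree the master is down
    """
    majority = total // 2 + 1
    # Quorum is needed to agree the master is down; a majority of all
    # sentinels is needed to elect the leader that performs the failover.
    return reachable >= quorum and reachable >= majority


# Three sentinels split 2 / 1 by a network partition, quorum of 2.
print(can_failover(reachable=2, total=3, quorum=2))
print(can_failover(reachable=1, total=3, quorum=2))
```

With three sentinels, only the side holding two of them can promote a slave; the isolated sentinel cannot.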

But how would clients detect master failover?

Bring haproxy to the mix

Haproxy master external check script

Check state from sentinel
Sentinel is in quorum?
Who is the current master?
Is it tcp reachable?
If all checks pass, mark the node up
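
A minimal sketch of the check order listed above, assuming the script is asked about one redis node at a time. The names `node_is_up`, `sentinel_has_quorum` and `current_master` are hypothetical; in a real script the two callables would wrap sentinel queries such as `SENTINEL ckquorum` and `SENTINEL get-master-addr-by-name`, and the script would exit with status 0 for "up" and non-zero for "down".

```python
import socket
from typing import Callable, Optional, Tuple

Addr = Tuple[str, int]


def tcp_reachable(addr: Addr, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to addr succeeds within timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def node_is_up(
    node: Addr,
    sentinel_has_quorum: Callable[[], bool],
    current_master: Callable[[], Optional[Addr]],
    reachable: Callable[[Addr], bool] = tcp_reachable,
) -> bool:
    """Run the checks in order; the first failing check marks the node down."""
    if not sentinel_has_quorum():      # sentinel is in quorum?
        return False
    if current_master() != node:       # is this node the current master?
        return False
    return reachable(node)             # is it tcp reachable?


# Stubbed example: sentinel has quorum and reports 10.0.0.1 as master.
master = ("10.0.0.1", 6379)
print(node_is_up(master, lambda: True, lambda: master, lambda addr: True))
print(node_is_up(("10.0.0.2", 6379), lambda: True, lambda: master, lambda addr: True))
```

Keeping the sentinel queries behind callables makes the decision logic testable without a running cluster.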

What we achieved with this setup

Clients can be dumb. All HA logic resides with haproxy.
No code changes required. Any project in any language can use this setup.

Do we have partition tolerance?

In minority partition, haproxy will mark master down as sentinel won't be in quorum
There is still a small window during failover where two redis masters can be up
We can reduce that window by tuning configs of various services involved

Haproxy tunables

Parameter	Value	Description
external-check command	<check-script>	external script used to perform check
inter	1s	Frequency of performing checks
rise	7	Haproxy will mark a node up after 7 consecutive check passes
fall	3	Haproxy will mark a node down after 3 consecutive check failures
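
As a rough worked example of what these values imply, the sketch below multiplies the check interval by `rise` and `fall`. It is an approximation that ignores how long each check takes to run; the function name is made up for illustration.

```python
def state_change_times(inter_s: float, rise: int, fall: int) -> dict:
    """Approximate seconds haproxy needs to flip a server's state."""
    return {
        "mark_down_after_s": fall * inter_s,  # consecutive failed checks
        "mark_up_after_s": rise * inter_s,    # consecutive passed checks
    }


print(state_change_times(inter_s=1, rise=7, fall=3))
```

With the values in the table, a failing master is taken out after about 3 seconds, while a newly promoted master is only put in after about 7 seconds of passing checks, so haproxy is quicker to stop sending traffic than to start.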

Redis tunables

Parameter	Value	Description
min-slaves-to-write	1	Redis stops accepting writes when fewer than this many slaves are connected with an acceptable lag
min-slaves-max-lag	10	Maximum lag, in seconds, for a slave to count towards `min-slaves-to-write`
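
The interaction of the two settings can be modelled in a few lines. This is a simplified sketch of the rule, not Redis source code: `replica_lags_s` is an assumed input holding, for each connected slave, the seconds since the master last heard from it.

```python
def master_accepts_writes(replica_lags_s, min_slaves_to_write=1, min_slaves_max_lag=10):
    """Return True if enough slaves are within the allowed lag."""
    if min_slaves_to_write == 0:
        return True  # a value of 0 disables the guard
    good = sum(1 for lag in replica_lags_s if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write


print(master_accepts_writes([2]))      # one healthy slave
print(master_accepts_writes([15]))     # the only slave is lagging too far
print(master_accepts_writes([]))       # master isolated from all slaves
```

This is what limits the damage in a minority partition: a master cut off from its slaves refuses writes once the lag limit is exceeded.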

Redis sentinel tunables

Parameter	Value	Description
down-after-milliseconds	5000	A sentinel marks a server subjectively down after it has been unreachable for this long
quorum	2	Set on the `sentinel monitor` line; a server is marked objectively down once this many sentinels agree it is down
failover-timeout	180000	Controls how long sentinel waits before retrying a failover of the same master
parallel-syncs	1	Number of redis slaves that can resync with the new master in parallel
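
To show how the table maps onto `sentinel.conf` directives, here is a small helper that renders them. The helper itself, the master name `mymaster` and the address `10.0.0.1` are placeholders for illustration; note that the quorum is the last argument of `sentinel monitor` rather than a directive of its own.

```python
def sentinel_conf(name, host, port, quorum, down_after_ms, failover_timeout_ms, parallel_syncs):
    """Render the sentinel.conf directives for one monitored master."""
    return "\n".join([
        f"sentinel monitor {name} {host} {port} {quorum}",
        f"sentinel down-after-milliseconds {name} {down_after_ms}",
        f"sentinel failover-timeout {name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {name} {parallel_syncs}",
    ])


# Values from the table; name and address are placeholders.
print(sentinel_conf("mymaster", "10.0.0.1", 6379, 2, 5000, 180000, 1))
```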

Key takeaways

You can achieve a reasonably consistent, partition tolerant redis cluster with Haproxy + Sentinel
Haproxy is a versatile piece of software: a devops swiss army knife.

Questions?
