Partition tolerant, Highly available redis cluster using haproxy

By Ayush Goyal

Outline

  • Goals
  • Solutions available
  • How and why we chose our solution
  • Overview of Redis Sentinel
  • Customized tunings for partition tolerance

Old Redis HA Setup

  • We use master-slave replication for HA.
  • Manual failover for redis master, if master goes down.

Problem we wanted to solve

  • Automated failover of redis master (Redis master HA)
  • Partition tolerance

Things to keep in mind for Redis HA

  • Redis async replication - No strong consistency
  • Automated switching is not built-in

Solutions explored

  • Vanilla Redis cluster
  • Vanilla Redis sentinel
  • Haproxy + Smitty + Twemproxy + Redis Sentinel
  • Redis sentinel with haproxy

Vanilla Redis cluster

  • Supports automatic failover with builtin sharding
  • Builtin redis support
  • But requires client to be smart.

Vanilla Redis sentinel

  • Supports automatic failover
  • An external monitoring service to monitor redis state
  • But requires client to be smart.

Haproxy + Smitty + Twemproxy + Redis Sentinel

  • Too many moving parts

Redis Sentinel with haproxy

  • Monitor service with redis sentinel
  • Switch master/slaves in haproxy by monitoring sentinel
  • No client code changes required

A closer look at Redis Sentinel Service

  • Monitoring service, utilizes quorum
  • Each sentinel marks an unreachable master subjectively down; once a quorum of sentinels agrees, the master is objectively down and a failover election is triggered

A closer look at Redis Sentinel Service

  • In failover scenario, a sentinel leader is selected to promote a slave to master
  • Failover can't happen in minority partition, because majority is required for leader selection
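
The quorum and majority rules above can be sketched in a few lines. This is only an illustration of the decision logic, not Sentinel's actual code; the function names and the example cluster size (3 sentinels, quorum 2) are assumptions made for the example.

```python
def is_objectively_down(sdown_votes: int, quorum: int) -> bool:
    """A master is objectively down once `quorum` sentinels see it as down."""
    return sdown_votes >= quorum


def can_authorize_failover(reachable_sentinels: int,
                           total_sentinels: int,
                           quorum: int) -> bool:
    """A leader needs votes from a majority of all known sentinels
    (and from at least `quorum` of them) before it may promote a slave."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= max(majority, quorum)


# Hypothetical 3-sentinel deployment with quorum 2, split 2/1 by a partition.
majority_side = can_authorize_failover(2, total_sentinels=3, quorum=2)
minority_side = can_authorize_failover(1, total_sentinels=3, quorum=2)
```

With these assumed numbers only the side holding two sentinels can elect a leader, which is why the minority partition never promotes a second master.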

But how would clients detect master failover?

  • Bring haproxy into the mix

Haproxy master external check script

  • Check state from sentinel
  • Sentinel is in quorum?
  • Who is the current master?
  • Is it tcp reachable?
  • If all checks pass, mark the node up
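
A minimal sketch of the decision logic behind such a check script, assuming the script has already asked every sentinel for the current master (for example with `SENTINEL get-master-addr-by-name`) and collected the answers in a list. The function names, the shape of `sentinel_answers`, and the interpretation of "in quorum" as "at least `quorum` sentinels report the same master" are assumptions for illustration, not the speaker's actual script. Haproxy treats exit status 0 from an external check as a pass.

```python
import socket
from collections import Counter


def tcp_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_server(server, sentinel_answers, quorum, reachable=tcp_reachable):
    """Return 0 if `server` should be marked up, 1 otherwise.

    server           -- (host, port) of the node haproxy is checking
    sentinel_answers -- one entry per sentinel: the (host, port) it reports
                        as master, or None if that sentinel did not answer
    quorum           -- how many sentinels must agree on the master
    """
    votes = Counter(a for a in sentinel_answers if a is not None)
    if not votes:
        return 1                      # no sentinel answered
    master, count = votes.most_common(1)[0]
    if count < quorum:
        return 1                      # sentinels are not in quorum
    if master != server:
        return 1                      # this node is not the current master
    return 0 if reachable(*server) else 1
```

A wrapper would call `sys.exit(check_server(...))` with the server address that haproxy passes to the external check command.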

What we achieved with this setup

  • Clients can be dumb. All HA logic resides with haproxy.
  • No code changes required. Any project in any language can use this setup.

Do we have partition tolerance?

  • In minority partition, haproxy will mark master down as sentinel won't be in quorum
  • There is still a small window during failover in which two redis masters can be up
  • We can reduce that window by tuning configs of various services involved

Haproxy tunables

Parameter               Value           Description
external-check command  <check-script>  External script used to perform the check
inter                   1s              Interval between checks
rise                    7               Haproxy marks a node up after 7 consecutive passing checks
fall                    3               Haproxy marks a node down after 3 consecutive failing checks
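
As a rough worked example, the values above translate into the following approximate reaction times. The helper name is hypothetical, and the figures ignore check duration and timeouts, so treat them as estimates only.

```python
def transition_seconds(inter_s: float, consecutive_checks: int) -> float:
    """Approximate time haproxy needs to flip a server's state:
    one check every `inter_s` seconds, `consecutive_checks` in a row."""
    return inter_s * consecutive_checks


time_to_mark_down = transition_seconds(1, 3)  # fall 3 -> about 3 seconds
time_to_mark_up = transition_seconds(1, 7)    # rise 7 -> about 7 seconds
```

The asymmetry makes haproxy quick to stop sending traffic to a failed master and slower to trust a newly reported one, which helps shrink the window in which two masters receive writes.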

Redis tunables

Parameter            Value  Description
min-slaves-to-write  1      Minimum number of slaves with acceptable lag; with fewer, the master stops accepting writes
min-slaves-max-lag   10     Maximum lag, in seconds, for a slave to count toward min-slaves-to-write
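
The interaction of the two settings can be sketched as a small predicate. This is an illustration of the rule rather than Redis source code; `replica_lags` is a hypothetical list holding, for each connected slave, the seconds since its last acknowledgement.

```python
def accepts_writes(replica_lags,
                   min_slaves_to_write=1,
                   min_slaves_max_lag=10):
    """Return True if the master would keep accepting writes."""
    # Setting either option to 0 disables the feature.
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True
    good = sum(1 for lag in replica_lags if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write
```

With the values from the table, a master cut off from all of its slaves stops taking writes once every slave's lag exceeds 10 seconds, which limits how much data an isolated old master can accept.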

Redis sentinel tunables

Parameter                Value   Description
down-after-milliseconds  5000    A sentinel marks a server subjectively down after it has been unreachable for this many milliseconds
quorum                   2       Number of sentinels that must see the master down before it is marked objectively down (set in `sentinel monitor`)
failover-timeout         180000  Time window, in milliseconds, that limits how soon a failover can be retried for the same master
parallel-syncs           1       Number of redis slaves that can resync with the new master in parallel

Key takeaways

  • You can achieve a reasonably consistent, partition tolerant redis cluster with Haproxy + Sentinel
  • Haproxy is a versatile piece of software, it's a devops swiss army knife.

Questions?
