Dima Petruk, RCS-SIS-OA
Supervisor: David Caro
Starting from v3.0, Celery supports specifying multiple brokers, which are switched between in a round-robin fashion if one goes down:
BROKER_URL = [
    'transport://userid:password@hostname:port//',
    'transport://userid:password@hostname:port//'
]
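Conceptually, the round-robin strategy just cycles through the configured URLs on each connection failure. A stdlib-only sketch (the hostnames are placeholders, not the real cluster):

```python
from itertools import cycle

# Placeholder broker URLs standing in for the real cluster nodes.
BROKER_URLS = [
    'pyamqp://guest:guest@rabbit1:5672//',
    'pyamqp://guest:guest@rabbit2:5672//',
]

# cycle() yields the URLs in order and wraps around after the last one,
# which is the round-robin behaviour described above.
_failover = cycle(BROKER_URLS)

def next_broker():
    """Return the broker URL to try after the current one has failed."""
    return next(_failover)
```

Each call returns the next URL in order, wrapping back to the first after the last.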
In Celery v4.0+ this functionality is broken: the client does not switch to the next node and fails instead (issue #3921)
A known workaround is starting workers with the --without-mingle option
Celery does not support specifying multiple result backends
If the master stops working, Sentinel promotes a slave to master
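A minimal Sentinel configuration sketch for this setup (the master name and the timing values are illustrative; the IP/port are the r1 node from the HAProxy backend):

```
# Hypothetical sentinel.conf fragment: the three Sentinels monitor the
# master and need a quorum of 2 to agree it is down before promoting
# a slave. The master name "mymaster" is illustrative.
sentinel monitor mymaster 188.184.94.146 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```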
BROKER_URL = 'pyamqp://userid:password@HAProxy-IP:port//'
HAProxy is indeed a single point of failure, but we run a separate HAProxy instance on each web and worker node, so if a node goes down, only that web server stops sending tasks.
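The per-node broker endpoint can look roughly like this (a sketch: the listener name and check interval are assumptions; the server IPs are the two cluster nodes seen in the logs):

```
# Sketch of the per-node HAProxy listener for the RabbitMQ cluster;
# the listener name and check interval are illustrative.
listen rabbitmq
    bind *:5672
    mode tcp
    balance roundrobin
    server mq1 188.184.29.122:5672 check inter 1s
    server mq2 188.184.81.248:5672 check inter 1s
```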
An HAProxy cluster can be set up using keepalived.
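A minimal keepalived sketch for floating a virtual IP between two HAProxy machines (the interface name, VIP, and priorities are assumptions):

```
# vrrp_script: a node may hold the VIP only while haproxy is running.
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
}

vrrp_instance HAPROXY_VIP {
    state MASTER            # BACKUP on the standby node
    interface eth0          # illustrative interface name
    virtual_router_id 51
    priority 101            # lower (e.g. 100) on the standby node
    virtual_ipaddress {
        192.0.2.100         # illustrative VIP that clients connect to
    }
    track_script {
        chk_haproxy
    }
}
```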
Celery did not switch cleanly to the other node after failover: for around one minute most tasks were lost, and the RabbitMQ server was flooded with opened/closed connection requests.
=WARNING REPORT==== 4-Apr-2017::11:11:04 ===
closing AMQP connection <0.15366.16> (188.184.29.122:43780 -> 188.184.81.248:5672):
client unexpectedly closed TCP connection
[2017-04-04 11:09:06,267: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 280, in start
blueprint.start(self)
...
File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 530, in _close
(class_id, method_id), ConnectionError)
ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with
reason 'shutdown'
[2017-04-04 11:09:09,292: ERROR/MainProcess] consumer: Cannot connect to
amqp://guest:**@188.184.29.122:5672//: Socket closed.
Trying again in 2.00 seconds...
[2017-04-04 11:09:14,309: ERROR/MainProcess] consumer: Cannot connect to
amqp://guest:**@188.184.29.122:5672//: Socket closed.
Trying again in 4.00 seconds...
[2017-04-04 11:09:18,327: INFO/MainProcess] Connected to amqp://guest:**@188.184.29.122:5672//
Requires at least 6 servers (3 masters + 3 slaves), which is too much for our needs as a result backend
HAProxy checks whether a node is the master with TCP checks against Redis:
CELERY_RESULT_BACKEND = 'rpc://:@HAProxy-IP:port//'
backend bk_redis
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server r1 188.184.94.146:7000 check inter 1s
    server r2 188.184.88.217:7000 check inter 1s
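The tcp-check above accepts a node only when INFO replication reports role:master; the same decision can be sketched in Python:

```python
def redis_node_is_master(info_replication: str) -> bool:
    """Mirror HAProxy's health check: a Redis node passes only if its
    INFO replication output contains a role:master line."""
    return any(line.strip() == 'role:master'
               for line in info_replication.splitlines())
```

Note that HAProxy's `tcp-check expect string` is a plain substring match on the response; the helper above checks whole lines, which is slightly stricter.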
1) 1 task queue (Celery 4.0.2)
2) 1 flask app, which sends tasks to the worker
3) RabbitMQ cluster (2 nodes, mirrored queues)
4) 2 Redis servers in master/slave configuration
5) 3 Redis Sentinel servers
6) HAProxy with 2 endpoints (RabbitMQ broker and Redis result backend)
1) 1 task queue (Celery 4.0.2)
2) 1 flask app
3) 1 RabbitMQ server
4) 1 Redis server (master)