For months I had sleepless nights over Celery backend crashes. I was using RabbitMQ initially, then switched to Redis, and it didn't change anything.
The Problem
My app would generate hundreds of results every hour after processing, and RabbitMQ would run out of file descriptors handling them and crash. With Redis, the system became really slow and later crashed (no, I didn't use persistent storage, because I didn't require it).
The Cause
Each time you run a task, Celery creates –
- A new queue (if you are using RabbitMQ)
- A new key (if you are using Redis)
These are cleared immediately so long as you are not saving results. If you are saving results, they get a default expiry of 86400 seconds (1 day), which means each newly created queue or key keeps consuming system resources for a full day unless you clear it manually. Generate 1000 of them every hour and either system will eventually crash within a few hours. The sketch below shows what this looks like on the Redis side.
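Here is a minimal sketch of how this plays out with the Redis backend. The broker/backend URLs and the add task are hypothetical placeholders; celery-task-meta-* is the key prefix the Redis result backend uses for stored results.

```python
# A minimal sketch, assuming Celery with a local Redis broker and result
# backend on the default port. Module and task names are placeholders.
import redis
from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

if __name__ == '__main__':
    # Fire a few tasks (a running worker is needed for them to complete).
    for i in range(10):
        add.delay(i, i)

    # Every stored result gets its own 'celery-task-meta-<task-id>' key.
    r = redis.Redis()
    keys = r.keys('celery-task-meta-*')
    print(len(keys))           # grows with every task you run
    if keys:
        print(r.ttl(keys[0]))  # ~86400 seconds (1 day) by default
```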
Some more details on RabbitMQ
The logs would show – Mnesia is overloaded: {dump_log, write_threshold}
I found evidence here. The only solution is to clear /var/lib/rabbitmq/mnesia, which is essentially a hard reset of RabbitMQ: it also wipes your virtual host entries, users, and all existing queues and data. Without doing this, RabbitMQ is irrecoverable.
Some more interesting posts
- http://serverfault.com/questions/337982/how-do-i-restart-rabbitmq-after-switching-machines
- http://stackoverflow.com/questions/6362829/rabbitmq-on-ec2-consuming-tons-of-cpu
From Celery Documentation
Do not use in production.
This is the old AMQP result backend that creates one queue per task, if you want to send results back as message please consider using the RPC backend instead, or if you need the results to be persistent use a result backend designed for that purpose (e.g. Redis, or a database).
Interestingly, I have never managed to configure the RPC backend, nor have I invested time in trying it.
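For reference, switching to either of the backends the documentation suggests is just a configuration change. A rough sketch, with hypothetical broker URLs (and untested on my end, as noted above):

```python
from celery import Celery

# RPC backend: results come back as messages (the docs' suggestion)
app = Celery('tasks', broker='amqp://localhost', backend='rpc://')

# Persistent results: use a backend designed for it, e.g. Redis
# app = Celery('tasks', broker='amqp://localhost',
#              backend='redis://localhost:6379/0')
```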
Conclusion
My advice – think twice before you store results.
If you still go ahead and store results, CELERY_TASK_RESULT_EXPIRES is your friend. Set it to a value that will not cause memory overload. Since I set it to 3600 seconds, I have not had one instance of a system crash.
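In the old-style settings module that pre-4.0 Celery reads, that is a one-line change (celeryconfig.py is a hypothetical module name):

```python
# celeryconfig.py - expire stored results after 1 hour instead of 1 day
CELERY_TASK_RESULT_EXPIRES = 3600
```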
Note: Storing results is not necessarily a bad idea. For example, when you use complex constructs such as groups, chains and chords, Celery requires you to have a result backend. I will cover this in a separate post.
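As a small teaser until then: a chord fans out a group of tasks and feeds their collected results to a callback, which is exactly why a result backend is mandatory. A sketch with hypothetical add and tsum tasks:

```python
from celery import Celery, chord

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

@app.task
def tsum(numbers):
    return sum(numbers)

# The header tasks run in parallel; tsum receives the list of their
# results, so Celery needs a result backend to collect them.
result = chord(add.s(i, i) for i in range(10))(tsum.s())
print(result.get())  # 90
```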