Storing Results in Celery is a Bad Idea

For months, Celery backend crashes gave me sleepless nights. I was using RabbitMQ initially, then switched to Redis, and it changed nothing.

The Problem

My app would generate hundreds of task results every hour, and RabbitMQ would run out of file descriptors trying to handle them and crash. With Redis, the system became very slow and eventually crashed (no, I wasn't using persistent storage, because I didn't need it).

The Cause

Each time you run a task, Celery creates –

  • A new queue (if you are using RabbitMQ)
  • A new key (if you are using Redis)

These are cleared immediately as long as you are not saving results. If you are saving results, their default expiry is 86400 seconds (1 day). This means every newly created structure consumes system resources for a full day unless you clear it manually. If you are generating 1000 items each hour, you accumulate toward 24000 live queues or keys before the first ones expire, and either system will eventually crash within a few hours. One way to opt out per task is shown below.
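
If you never read a task's return value, one fix is to mark the task with ignore_result so Celery never creates the queue or key in the first place. A minimal sketch (the app name, broker/backend URLs, and task are placeholders):

    from celery import Celery

    # Placeholder app; point broker/backend at your own instances.
    app = Celery('tasks',
                 broker='redis://localhost:6379/0',
                 backend='redis://localhost:6379/1')

    # With ignore_result=True, no result key (Redis) or result queue
    # (RabbitMQ) is ever created for this task, so there is nothing
    # left behind to expire or clean up.
    @app.task(ignore_result=True)
    def process_item(item):
        print('processed %r' % (item,))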

Some more details on RabbitMQ

The logs would show – Mnesia is overloaded: {dump_log, write_threshold}

I found evidence here. The only solution is to clear /var/lib/rabbitmq/mnesia, which is essentially a hard reset of RabbitMQ: it also wipes your virtual host entries, users, and all existing queues and data. Without that reset, RabbitMQ never recovers.

From Celery Documentation

Interestingly, I have never configured the RPC backend myself, nor invested time in trying it.

Do not use in production.

This is the old AMQP result backend that creates one queue per task, if you want to send results back as message please consider using the RPC backend instead, or if you need the results to be persistent use a result backend designed for that purpose (e.g. Redis, or a database).
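
For reference, switching to the RPC backend the docs point to is a one-line configuration change. A sketch, untested on my side as noted above (broker URL is a placeholder):

    from celery import Celery

    # rpc:// streams results back as AMQP messages instead of
    # creating one result queue per task.
    app = Celery('tasks',
                 broker='amqp://guest@localhost//',
                 backend='rpc://')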

Conclusion

My advice – think twice before you store results.

If you still go ahead and store results, CELERY_TASK_RESULT_EXPIRES is your friend. Set it to a value low enough that stale results cannot overload memory. Since I set it to 3600 seconds, I have not had a single system crash. A sketch of the setting follows.
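
In configuration terms that looks like this (a minimal sketch; the app name and URLs are placeholders, and newer Celery versions spell the setting result_expires):

    from celery import Celery

    app = Celery('tasks',
                 broker='redis://localhost:6379/0',
                 backend='redis://localhost:6379/1')

    # Expire stored results after 1 hour instead of the default
    # 86400 seconds, so stale results cannot pile up for a day.
    app.conf.CELERY_TASK_RESULT_EXPIRES = 3600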

Note: Storing results is not necessarily a bad idea. For example, when you use complex constructs such as groups, chains, and chords, Celery requires you to have a result backend; see the sketch below. I will cover this in a separate post.
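
As a taste of that, here is a minimal chord sketch (task names are hypothetical): the parallel results have to live somewhere until the callback collects them, which is exactly what the backend is for.

    from celery import Celery, chord

    # A result backend is mandatory for chords.
    app = Celery('tasks',
                 broker='redis://localhost:6379/0',
                 backend='redis://localhost:6379/1')

    @app.task
    def square(x):
        return x * x

    @app.task
    def total(numbers):
        return sum(numbers)

    # Run ten squares in parallel, then sum them; the intermediate
    # results sit in the backend until total() picks them up.
    result = chord(square.s(i) for i in range(10))(total.s())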
