Celery – Groups, Loops & Parallelism

TL;DR

In this post, I explore a few questions I had about Celery and parallelism: how groups compare with a non-blocking for loop when implementing the same feature, and what a blocking for loop costs. I decided to run these experiments because of certain optimization challenges in an application I am working on.

The Question

I have a bunch of tasks that I am extracting from Elasticsearch and running. These tasks are individual Celery tasks and can be executed with the “.delay” function in Celery. A conversation with a friend put the following question in my head –

Can Celery Groups be faster than running a for loop (using delay) on the same Celery tasks?

The Steps

  1. Create a Celery task that performs the actual work.
  2. Create a Celery task that runs the actual task in a for loop (non-blocking mode, using delay).
  3. Create a Celery task that runs the actual task using groups.
  4. Compare the time taken by the for loop task and the group task when run independently of each other.

The Code

Refer to this gist.

The Result

The log results after running run.py, first for the for loop and then for groups, separately –

[2018-05-30 13:23:59,344: WARNING/ForkPoolWorker-4] Bello Merld at 2018-05-30T13:23:59.344221
[2018-05-30 13:24:02,797: WARNING/ForkPoolWorker-6] loop_service ended, Total time: 2.80s
[2018-05-30 13:24:05,750: WARNING/ForkPoolWorker-1] Item Number: 0, Type: Loop, Total time: 3.17s
[2018-05-30 13:24:05,749: WARNING/ForkPoolWorker-8] Item Number: 1, Type: Loop, Total time: 3.17s
[2018-05-30 13:24:06,492: WARNING/ForkPoolWorker-7] Item Number: 2, Type: Loop, Total time: 3.17s
[2018-05-30 13:24:06,933: WARNING/ForkPoolWorker-5] Item Number: 3, Type: Loop, Total time: 3.17s
[2018-05-30 13:24:07,398: WARNING/ForkPoolWorker-3] Item Number: 4, Type: Loop, Total time: 3.17s
---
[2018-05-30 13:24:59,074: WARNING/ForkPoolWorker-7] Bello Merld at 2018-05-30T13:24:59.074209
[2018-05-30 13:25:02,250: WARNING/ForkPoolWorker-5] group_service ended, Total time: 2.74s
[2018-05-30 13:25:04,888: WARNING/ForkPoolWorker-3] Item Number: 0, Type: group, Total time: 3.17s
[2018-05-30 13:25:05,319: WARNING/ForkPoolWorker-8] Item Number: 1, Type: group, Total time: 3.16s
[2018-05-30 13:25:06,226: WARNING/ForkPoolWorker-4] Item Number: 3, Type: group, Total time: 3.17s
[2018-05-30 13:25:06,226: WARNING/ForkPoolWorker-7] Item Number: 2, Type: group, Total time: 3.17s
[2018-05-30 13:25:06,903: WARNING/ForkPoolWorker-1] Item Number: 4, Type: group, Total time: 3.17s

Time taken (in seconds):

            |      n = 5       |      n = 10      |      n = 25      |      n = 50
Test Number | For Loop | Group | For Loop | Group | For Loop | Group | For Loop | Group
1           |     1.49 |  1.19 |     4.81 |  4.88 |     6.70 |  7.65 |    12.06 | 15.69
2           |     1.31 |  1.13 |     4.24 |  4.42 |     7.05 |  8.36 |    12.82 | 14.34
3           |     1.53 |  1.27 |     4.40 |  2.60 |     7.64 |  6.12 |    13.14 | 15.20
4           |     3.00 |  1.43 |     4.51 |  2.53 |     5.52 |  8.12 |    13.25 | 13.13
5           |     1.11 |  2.93 |     2.48 |  2.60 |     7.52 |  8.65 |    11.00 | 11.95
Mean        |     1.69 |  1.59 |     4.09 |  3.41 |     6.89 |  7.78 |    12.45 | 14.06

A few really interesting things stand out in this result –

  1. The logs show that different workers handled the individual tasks, which means the tasks ran in parallel in both cases.
  2. For any given n, the mean time using loops and the mean time using groups do not differ significantly.
  3. As n increases, the time grows roughly in proportion: when n doubles, the time approximately doubles. Ex: 5 tasks ~ 2s, 10 tasks ~ 4s. This trend is seen in both cases, although it cannot be generalized from as little data as we have.
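To put numbers on the second observation, here is a small sketch that recomputes the means from the table above and adds the group-to-loop ratio for each n. The timings are copied from the table; the summarize helper and the ratio column are my own additions for illustration.

```python
from statistics import mean

# Timings in seconds, copied from the table: {n: (for_loop_runs, group_runs)}
timings = {
    5: ([1.49, 1.31, 1.53, 3.00, 1.11], [1.19, 1.13, 1.27, 1.43, 2.93]),
    10: ([4.81, 4.24, 4.40, 4.51, 2.48], [4.88, 4.42, 2.60, 2.53, 2.60]),
    25: ([6.70, 7.05, 7.64, 5.52, 7.52], [7.65, 8.36, 6.12, 8.12, 8.65]),
    50: ([12.06, 12.82, 13.14, 13.25, 11.00], [15.69, 14.34, 15.20, 13.13, 11.95]),
}


def summarize(timings):
    """Return (n, loop mean, group mean, group/loop ratio) for each n."""
    rows = []
    for n, (loop_runs, group_runs) in sorted(timings.items()):
        loop_mean = mean(loop_runs)
        group_mean = mean(group_runs)
        rows.append(
            (n, round(loop_mean, 2), round(group_mean, 2), round(group_mean / loop_mean, 2))
        )
    return rows


if __name__ == "__main__":
    for n, loop_mean, group_mean, ratio in summarize(timings):
        print(f"n={n}: loop={loop_mean}s group={group_mean}s group/loop={ratio}")
```

With only five runs per cell and individual runs varying by a second or more, a ratio close to 1 is better read as "no clear winner" than as evidence for either side.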

The million dollar question now – Why is this happening?

The Reason

It is pretty simple – the non-blocking for loop works just like the group. Both only dispatch the tasks and move on without waiting for them to finish, so the workers pick the tasks up in parallel either way. Waiting for each item in the for loop to complete would increase the time taken, but in this case we are not waiting, so we are doing exactly the same thing. It might’ve been obvious looking at the code, but it doesn’t hurt to experiment.

The Other Question

While doing this experiment I noticed a piece of code where I am running a for loop and waiting to collect and transform the results before I move on further. And I am wondering now –

Does a blocking for loop adversely affect the performance of a Celery task?

The Code

Use the same code from the gist: comment out line 36 and uncomment line 37.

The Result

Running the code three times shows that the time taken simply adds up: each item blocks the worker until it finishes, so the total is roughly the number of items multiplied by the time per item.

[2018-05-30 15:57:27,006: WARNING/ForkPoolWorker-4] Bello Merld at 2018-05-30T15:57:27.005575
[2018-05-30 15:57:30,766: WARNING/ForkPoolWorker-8] Item Number: 0, Type: Loop, Total time: 3.31 sec
[2018-05-30 15:57:33,941: WARNING/ForkPoolWorker-8] Item Number: 1, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:57:37,111: WARNING/ForkPoolWorker-8] Item Number: 2, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:57:40,282: WARNING/ForkPoolWorker-8] Item Number: 3, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:57:43,449: WARNING/ForkPoolWorker-8] Item Number: 4, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:57:43,450: WARNING/ForkPoolWorker-8] loop_service ended, Total time: 16.00 sec
---
[2018-05-30 15:57:57,344: WARNING/ForkPoolWorker-7] Bello Merld at 2018-05-30T15:57:57.343716
[2018-05-30 15:58:01,220: WARNING/ForkPoolWorker-2] Item Number: 0, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:04,388: WARNING/ForkPoolWorker-2] Item Number: 1, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:07,560: WARNING/ForkPoolWorker-2] Item Number: 2, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:10,728: WARNING/ForkPoolWorker-2] Item Number: 3, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:13,895: WARNING/ForkPoolWorker-2] Item Number: 4, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:13,896: WARNING/ForkPoolWorker-2] loop_service ended, Total time: 15.84 sec
---
[2018-05-30 15:58:22,798: WARNING/ForkPoolWorker-4] Bello Merld at 2018-05-30T15:58:22.798778
[2018-05-30 15:58:26,404: WARNING/ForkPoolWorker-8] Item Number: 0, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:29,627: WARNING/ForkPoolWorker-8] Item Number: 1, Type: Loop, Total time: 3.22 sec
[2018-05-30 15:58:32,800: WARNING/ForkPoolWorker-8] Item Number: 2, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:35,967: WARNING/ForkPoolWorker-8] Item Number: 3, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:39,138: WARNING/ForkPoolWorker-8] Item Number: 4, Type: Loop, Total time: 3.17 sec
[2018-05-30 15:58:39,139: WARNING/ForkPoolWorker-8] loop_service ended, Total time: 15.90 sec
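
The arithmetic matches the logs: 5 items at about 3.17 s each gives roughly 15.85 s, in line with the 15.84 to 16.00 s totals above. The sketch below models the same effect without Celery. It is a stand-in, not the gist's code: a ThreadPoolExecutor from the standard library plays the role of the worker pool, and actual_task, blocking_loop and non_blocking_loop are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def actual_task(item_number, seconds=0.05):
    """Stand-in for the real work (hypothetical: it only sleeps)."""
    time.sleep(seconds)
    return item_number


def blocking_loop(pool, n):
    """Wait for each item to finish before starting the next one."""
    start = time.perf_counter()
    results = [pool.submit(actual_task, i).result() for i in range(n)]
    return results, time.perf_counter() - start


def non_blocking_loop(pool, n):
    """Dispatch every item first, then collect the results at the end."""
    start = time.perf_counter()
    futures = [pool.submit(actual_task, i) for i in range(n)]
    results = [future.result() for future in futures]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as pool:
        _, blocking_time = blocking_loop(pool, 5)
        _, non_blocking_time = non_blocking_loop(pool, 5)
    print(f"blocking: {blocking_time:.2f}s, non-blocking: {non_blocking_time:.2f}s")
```

The blocking version takes about n times the per-item time because only one item is ever in flight, while the non-blocking version takes about one per-item time as long as the pool has a free worker for every item.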

Conclusion

In summary, here are my recommendations –

  1. Keep your tasks short and specific to what you want to achieve; this yields faster results.
  2. Avoid using loops in Celery tasks, especially blocking ones. If you cannot avoid them, split the logic in such a way that you can use “Groups” and thus achieve parallelism.
  3. If you think loops are unavoidable because you have work to do based on the results or the completion of the loop, explore “Chords”. In a nutshell, a “Chord” is a “Group” with a callback that runs once every task in the group has finished, which is exactly this use case.
  4. At any point, prefer “Groups” over “Loops”. This may sound like a weak reason, but “Groups” are a feature of Celery, so you might as well use them, and there might be optimizations that I am not aware of.

Happy coding!
