Flowable stuck in long complex processes

Hello!

In our application we have a dynamically generated process with a lot of subprocesses. Previously the flow used synchronous HTTP tasks, but we ran into FlowableOptimisticLockingException, and our business logic does not allow us to retry tasks. So, after refactoring, the process now has parallel subprocesses with async receive tasks whose listeners execute the HTTP requests. In small processes with 7-9 subprocesses everything works fine. But when we have 50+ subprocesses with 6-9 threads in parallel, the delays between tasks start to grow after 15-20 subprocesses have completed: at the beginning they are 1-2 seconds, by the end they reach 7-10 minutes, and even other processes get stuck. When another process B starts, the stuck process runs normally again for some time and then starts to hang again, while process B runs normally to the end, provided it is not that large.

We are using Spring Boot and the app runs in a container. All resource metrics are fine and not even close to their limits.

Can you please suggest anything: possible root causes of this behaviour, or where to look to find them?

Thank you in advance!

Hi Alina,

The best approach would be to reproduce the issue in a JUnit test. Did the FlowableOptimisticLockingException occur on the joining parallel gateway? (That’s the place where I would expect it.) To solve this, maybe you can make the joining parallel gateway async. That lowers the chance of the exception, but not to zero.

Regards
Martin

Thanks for the quick reply, Martin!

We are not facing FlowableOptimisticLockingException now, or at least only very rarely, because it was the reason we replaced the HTTP tasks with receive tasks. So it’s probably not the root cause.

But I found out that we only have 8 records in the act_ru_job table, and it looks like we don’t have enough free (not locked) jobs to execute tasks. I am currently trying to increase the pool for jobs.

Yes, exactly, that could be the cause.
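
To illustrate the suspected starvation, here is a minimal sketch in plain java.util.concurrent. It is not Flowable code and says nothing about how the async executor is implemented internally. It only shows the general effect: when every worker thread is occupied by a blocking task (such as a listener waiting on an HTTP response), the remaining work queues up, and a larger pool drains the same backlog faster. The class name, the method name, and the numbers (8 workers to mirror the 8 rows mentioned above, 48 tasks, 50 ms per task) are assumptions chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStarvationDemo {

    // Submits `jobs` blocking tasks to a fixed pool of `poolSize` threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long runJobs(int poolSize, int jobs, long blockMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        for (int i = 0; i < jobs; i++) {
            pool.submit(() -> {
                try {
                    // Stand-in for a blocking call, e.g. waiting on an HTTP response.
                    Thread.sleep(blockMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for the backlog to drain
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        // Same backlog, two pool sizes: the smaller pool needs more rounds.
        long smallPool = runJobs(8, 48, 50);
        long largePool = runJobs(24, 48, 50);
        System.out.println("8 threads:  " + smallPool + " ms");
        System.out.println("24 threads: " + largePool + " ms");
    }
}
```

With 8 workers, the 48 tasks need about six sequential rounds. With 24 workers they need about two. If the real bottleneck is the same kind of thread exhaustion, increasing the executor's pool (or shortening the blocking work done inside each job) should reduce the growing delays. If it is not, the delays will persist and the cause lies elsewhere.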

We are seeing similar issues where jobs are stuck, but we have enough jobs in the pool. We run the Flowable 5 backward-compatible engine alongside the Flowable 6.5 engine and use the same async executor for both process engine configurations. There are no errors whatsoever when these jobs are stuck. Any ideas would really help. Thanks.

When you say ‘stuck’, what does that mean? That they are in the database table, but not picked up? If so, can you paste the content of such a row? Also, can you share how you’ve configured your process engine?