Zombie Processes With Parallel Async Service Tasks

Hello,

We’re encountering a very reproducible problem with asynchronous service tasks (event registry “send-event”) executed in parallel. I’ve assembled a minimal reproduction of the problem here: GitHub - chaserb/parallel-async-service-tasks: Demonstration of zombie process instances. The problem is that roughly 40-50% of the time, my process instance winds up in a zombie state: it retains a single record in the ACT_RU_EXECUTION table with a null ACT_ID_, PARENT_ID_, and SUPER_EXEC_, even though all the service tasks complete successfully.

I noticed a similar problem here: ParallelGateway - Process Instance remains after all sub-processes complete, but I’m positive our async executor is running normally.

I also tried the various serviceTask options suggested here: ParallelGateway - Process Instance remains after all sub-processes complete - #2 by adymlincoln, but they did not help the situation.

Thanks for your help,
Chase

LET’S HOLD OFF ON THIS FOR A BIT. I don’t think this is demonstrating what I intended. Let me work it a little more.

I think I’m on to the solution here. I noticed I was getting a FlowableOptimisticLockingException when my parent process had an explicitly declared parallelGateway on the join side, so I updated my TestInboundChannel to catch FlowableOptimisticLockingException and retry the event. I believe (and hope) this simulates the NACKs back to the RabbitMQ message broker that the production event registry should perform. I’ll test that.
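The catch-and-retry idea can be sketched as a small self-contained loop. This is not the actual TestInboundChannel code; the class and exception here are stand-ins (the real code would catch Flowable’s FlowableOptimisticLockingException), and the retry limit is illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingInboundChannel {

    // Stand-in for FlowableOptimisticLockingException in this self-contained sketch.
    static class OptimisticLockingException extends RuntimeException {}

    static final int MAX_ATTEMPTS = 3; // illustrative limit

    // Delivers an event, retrying when a concurrent update loses the optimistic
    // lock race -- roughly what a NACK + redelivery from RabbitMQ would achieve.
    static int deliverWithRetry(Runnable delivery) {
        for (int attempt = 1; ; attempt++) {
            try {
                delivery.run();
                return attempt; // delivered; report which attempt succeeded
            } catch (OptimisticLockingException e) {
                if (attempt >= MAX_ATTEMPTS) {
                    throw e; // give up: the broker would dead-letter the message
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulate two optimistic-lock conflicts before the update lands.
        int attempts = deliverWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new OptimisticLockingException();
            }
        });
        System.out.println("succeeded on attempt " + attempts); // prints "succeeded on attempt 3"
    }
}
```

The point of the retry (rather than swallowing the exception) is that an optimistic-lock conflict on a parallel join means another branch updated the same execution row first, so redelivering the event lets the join logic run again against the fresh state.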

I think the real issue is that my original process definition had implicit join gateways, which was what produced the zombie process instances described above. Having explicit parallel gateways with retry on FlowableOptimisticLockingException causes my tests to pass in the example repo.

I haven’t looked at the example, but in that situation I would expect a dead-letter job for the joining gateway. Did you see that?

No, I don’t have any records in the dead letter job table for those process instances.

I had a quick look at the code and had the following questions:

  • You’re executing the jobs yourself through managementService.executeJob. Is there a reason for not wanting to use the async executor (as there is some extra logic that happens when doing so)?
  • What’s the purpose of making the send task a wait state (i.e. triggerable)? I’m not sure I’m getting the use case here yet.

Sorry, I could have been more clear. Thanks for your quick reply.

Regarding the triggerable flag, we use that to implement the Request-Reply pattern with async callback, using the execution ID as the correlation ID. For example, one of our serviceTasks will dispatch a “send email request” event to our service that accomplishes this, and then check the success of that request on the “send email response” event.
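As a generic illustration of that request-reply correlation (this is not Flowable’s API; the class and method names are made up for the sketch), the requester registers a pending future keyed by the execution ID, and the response handler completes it:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class RequestReplyCorrelator {

    // Pending requests keyed by correlation ID (the execution ID in our processes).
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called when the "send email request" event is dispatched.
    public CompletableFuture<String> register(String executionId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(executionId, future);
        return future;
    }

    // Called when the "send email response" event arrives.
    public boolean complete(String executionId, String payload) {
        CompletableFuture<String> future = pending.remove(executionId);
        if (future == null) {
            return false; // unknown or already-handled correlation ID
        }
        return future.complete(payload);
    }

    public static void main(String[] args) {
        RequestReplyCorrelator correlator = new RequestReplyCorrelator();
        CompletableFuture<String> reply = correlator.register("exec-42"); // execution ID as correlation ID
        correlator.complete("exec-42", "email-sent");
        System.out.println(reply.join()); // prints "email-sent"
    }
}
```

In the actual process, the response handler would resume the triggerable service task via Flowable’s runtimeService.trigger(executionId) rather than completing a future, but the correlation bookkeeping is the same shape.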

Regarding the managementService.executeJob(), my only intent was to ensure the test was truly multi-threaded to simulate multiple cluster nodes receiving responses nearly simultaneously. I didn’t realize the async executor was an option in a test setup. I can give that a try.
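For reference, activating the async executor in a plain test engine is a one-flag configuration change; a sketch (adjust the JDBC URL and configuration class to your setup):

```java
// Configuration fragment: build an in-memory engine with the async executor active,
// so jobs are picked up by the executor's thread pool instead of managementService.executeJob.
ProcessEngine engine = ProcessEngineConfiguration
        .createStandaloneInMemProcessEngineConfiguration()
        .setAsyncExecutorActivate(true)
        .setJdbcUrl("jdbc:h2:mem:flowable;DB_CLOSE_DELAY=1000")
        .buildProcessEngine();
```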

Thanks,
Chase

A question I have regarding this relates to the async executor, which we have configured with the default of 3 retries. Will the event registry “response” events benefit from this setting? I ask because the request events are dispatched on threads named “task-123”, but the response events are received on threads named “org.flowable.eventregistry.rabbit.ChannelRabbitListenerEndpointContainer#workflowInbound-1”

If your send task is async (which it is), then the sending will be done by the async executor.

On the receiving side, you would also need to make the first step async, or it will run on the thread of the receiver. It’s the same story as, e.g., a web request: it will be handled by the web container thread unless you make a step async.

Let me rephrase what you said just so I’m sure I understand. I currently have this in the child process:

  • (startEvent) → [serviceTask (async=true)] → (endEvent)

To incorporate the async executor on the receiving side, I would need to make the first step async, which in the case above becomes the following:

  • (startEvent) → [serviceTask (async=true)] → (endEvent (async=true))

Is that correct?

Thanks again,
Chase

Yes - you mention above that you’re using RabbitMQ, right?
This means that the receive will be handled on the RabbitMQ listener thread, and making the step async will then hand off to the async executor, freeing up the RabbitMQ thread.
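In BPMN XML terms, that hand-off might look like the fragment below. This is a sketch, not the repo’s actual definition: the event type, IDs, and delegate expression are placeholders, while flowable:async and flowable:triggerable are the real Flowable attributes being discussed:

```xml
<!-- Receiving side: the first step after the event-registry start is async,
     so the RabbitMQ listener thread only inserts a job and returns. -->
<startEvent id="start">
  <extensionElements>
    <flowable:eventType>sendEmailResponse</flowable:eventType>
  </extensionElements>
</startEvent>
<sequenceFlow id="flow1" sourceRef="start" targetRef="handleResponse"/>
<serviceTask id="handleResponse"
             flowable:async="true"
             flowable:triggerable="true"
             flowable:delegateExpression="${responseHandler}"/>
```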