Service Task in parallel gateway failing randomly and moved to dead-letter job

[Diagram: parallel fork → service task A / service task B → parallel join]

Service tasks inside a parallel gateway are failing randomly.

  • Method call A and method call B use an expression to resolve a method.

  • The expression used in methodCallA is #{beanA.methodA}.

  • The expression used in methodCallB is #{beanA.methodB}.

  • beanA is a singleton bean. (Does this have something to do with it? I hope not.)

  • Both methods just log a line to the output; nothing much is done.

  • Upon triggering the workflow, either methodA or methodB fails randomly.

  • The job is moved to the dead-letter job table, and upon retrying the job, it passes.
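For reference, the setup described above could be sketched in BPMN XML roughly like this (a hypothetical fragment; element IDs are assumed, not taken from the actual process):

```xml
<!-- Hypothetical sketch of the process described in this thread; IDs assumed. -->
<process id="parallelServiceTasks">
  <startEvent id="start" />
  <sequenceFlow id="f1" sourceRef="start" targetRef="fork" />
  <parallelGateway id="fork" flowable:async="true" flowable:exclusive="true" />
  <sequenceFlow id="f2" sourceRef="fork" targetRef="methodCallA" />
  <sequenceFlow id="f3" sourceRef="fork" targetRef="methodCallB" />
  <serviceTask id="methodCallA" flowable:expression="#{beanA.methodA}"
               flowable:async="true" flowable:exclusive="true" />
  <serviceTask id="methodCallB" flowable:expression="#{beanA.methodB}"
               flowable:async="true" flowable:exclusive="true" />
  <sequenceFlow id="f4" sourceRef="methodCallA" targetRef="join" />
  <sequenceFlow id="f5" sourceRef="methodCallB" targetRef="join" />
  <parallelGateway id="join" flowable:async="true" flowable:exclusive="true" />
  <sequenceFlow id="f6" sourceRef="join" targetRef="end" />
  <endEvent id="end" />
</process>
```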

Why is this happening? Why is the job failing in the first place, even after marking all the steps as async and exclusive?

Below are the logs captured after setting the log level to debug:

org.flowable.common.engine.api.FlowableOptimisticLockingException: Could not lock process instance
at org.flowable.engine.impl.persistence.entity.data.impl.MybatisExecutionDataManager.updateProcessInstanceLockTime(MybatisExecutionDataManager.java:328)
at org.flowable.engine.impl.persistence.entity.ExecutionEntityManagerImpl.updateProcessInstanceLockTime(ExecutionEntityManagerImpl.java:980)
at org.flowable.engine.impl.cfg.DefaultInternalJobManager.lockJobScope(DefaultInternalJobManager.java:148)
at org.flowable.job.service.impl.cmd.LockExclusiveJobCmd.execute(LockExclusiveJobCmd.java:56)
at org.flowable.engine.impl.interceptor.CommandInvoker$1.run(CommandInvoker.java:51)
at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperation(CommandInvoker.java:93)
at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperations(CommandInvoker.java:72)
at org.flowable.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:56)
at org.flowable.engine.impl.interceptor.BpmnOverrideContextInterceptor.execute(BpmnOverrideContextInterceptor.java:25)
at org.flowable.common.engine.impl.interceptor.TransactionContextInterceptor.execute(TransactionContextInterceptor.java:53)
at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:72)
at org.flowable.common.spring.SpringTransactionInterceptor.lambda$execute$0(SpringTransactionInterceptor.java:56)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)

Hi @nishantharun,
the execution is happening in parallel, and both service tasks are trying to write their results to the database. When both executions happen at the same time, the result is an optimistic locking exception at the joining parallel gateway. As Tijs described here, you can have concurrency issues, and you need to “set the joining parallel gateway to async as well in this case and exclusive = true.” With this, both async jobs will only proceed up to the joining parallel gateway, so you don’t hit the optimistic locking issue afterwards. A new async job will then be created starting at the joining gateway.
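To illustrate the mechanism behind the exception: Flowable (like many engines) uses version-based optimistic locking, where an update only succeeds if the row's version is still the one the writer originally read. The following self-contained sketch is illustrative only and does not use Flowable itself; all names are assumed:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal, Flowable-independent illustration of optimistic locking.
// A "row" carries a version counter; an update is rejected when the
// caller's version is stale, i.e. another writer committed first.
class OptimisticLockDemo {
    static class Row {
        int value;
        final AtomicInteger version = new AtomicInteger(0);

        // Succeeds only if the caller still holds the current version.
        synchronized boolean update(int expectedVersion, int newValue) {
            if (version.get() != expectedVersion) {
                return false; // stale read: another writer got there first
            }
            value = newValue;
            version.incrementAndGet();
            return true;
        }
    }

    public static void main(String[] args) {
        Row row = new Row();
        int v = row.version.get();         // both writers read version 0
        boolean first = row.update(v, 1);  // first write wins, bumps version
        boolean second = row.update(v, 2); // stale version -> rejected
        System.out.println("first=" + first + " second=" + second);
    }
}
```

In the database-backed case, the second writer's transaction rolls back and surfaces as a FlowableOptimisticLockingException, which is why the job lands in the dead-letter table but succeeds on retry.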

Valentin

@valentin, thanks for the quick reply. As you can see in the diagram I shared, all four nodes (fork, service call A, service call B, merge) are marked as async and exclusive, yet I’m still getting the error.
We are at the very beginning of adopting Flowable in our project, and this issue is driving us crazy.
Let me give you a little more background.
Version used: implementation "org.flowable:flowable-spring-boot-starter:6.5.0"
‘org.springframework.boot’ version ‘2.2.5.RELEASE’
So far we haven’t customised anything in the process engine; all default settings are used as-is.
My wild guess is that the exclusive flag is not working as expected.
It would be of great help if you could point me to a working example with parallel gateways and service tasks, with test assertions.

This is a fairly simple process. Can you share your process XML so we can verify it?

Are you doing anything special in the service tasks that might influence the async behavior?

We’ve got a range of unit tests for this. I guess something else is interacting and trying to lock the process instance. Also, seeing this exception is not necessarily a problem: the job will be retried a bit later, and by then the process instance lock should be freed.