Inconsistent context of the JOB_EXECUTION_FAILURE event with and without the asyncComplete flag

Dear Flowable experts,

The Flowable Engine dispatches a JOB_EXECUTION_FAILURE event when an exception is thrown during the EndExecutionOperation of a process instance. The failing job is converted into a DeadLetterJob.
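
For reference, here is a minimal sketch of the kind of engine event listener we use to observe this, assuming the Flowable 6.x event API; the class name and the logging are only illustrative:

```java
// Minimal sketch of an event listener reacting to JOB_EXECUTION_FAILURE
// (assumed Flowable 6.x API; class name and logging are illustrative only).
import org.flowable.common.engine.api.delegate.event.FlowableEngineEntityEvent;
import org.flowable.common.engine.api.delegate.event.FlowableEngineEventType;
import org.flowable.common.engine.api.delegate.event.FlowableEvent;
import org.flowable.common.engine.api.delegate.event.FlowableEventListener;
import org.flowable.job.api.Job;

public class JobFailureContextListener implements FlowableEventListener {

    @Override
    public void onEvent(FlowableEvent event) {
        if (event.getType() == FlowableEngineEventType.JOB_EXECUTION_FAILURE
                && event instanceof FlowableEngineEntityEvent) {
            FlowableEngineEntityEvent entityEvent = (FlowableEngineEntityEvent) event;
            Job failedJob = (Job) entityEvent.getEntity();
            // The question of this thread: which execution / process instance do
            // these ids point to when the failing job originates from a CallActivity?
            System.out.println("Failed job " + failedJob.getId()
                    + ", executionId=" + entityEvent.getExecutionId()
                    + ", processInstanceId=" + entityEvent.getProcessInstanceId());
        }
    }

    @Override
    public boolean isFailOnException() {
        return false;
    }

    @Override
    public boolean isFireOnTransactionLifecycleEvent() {
        return false;
    }

    @Override
    public String getOnTransaction() {
        return null;
    }
}
```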

In our environment, a CallActivity is used to trigger one or multiple SubProcess instances. In addition, there is an event listener which handles failing subprocess instances.

We’ve discovered that the context of the JOB_EXECUTION_FAILURE event is set differently depending on the asyncComplete flag of the CallActivity:

  • With asyncComplete=false, the JOB_EXECUTION_FAILURE event is dispatched in the context of the subprocess execution, and the DeadLetterJob belongs to the subprocess execution.
  • With asyncComplete=true, the JOB_EXECUTION_FAILURE event is dispatched in the context of the super process execution, and the DeadLetterJob belongs to the super process execution (a sketch of how we check this follows below).
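
A small sketch of how we check which executions the resulting dead-letter jobs belong to, assuming access to the ManagementService (the printout is only illustrative):

```java
// Sketch: inspect the dead-letter jobs after the failing subprocess job has
// exhausted its retries (assumes an obtained/injected ManagementService).
import org.flowable.engine.ManagementService;
import org.flowable.job.api.Job;

public class DeadLetterJobContextCheck {

    public void printDeadLetterContext(ManagementService managementService) {
        for (Job deadLetterJob : managementService.createDeadLetterJobQuery().list()) {
            // With asyncComplete=false we observe the subprocess ids here,
            // with asyncComplete=true the ids of the super (calling) process.
            System.out.println("dead-letter job " + deadLetterJob.getId()
                    + ", executionId=" + deadLetterJob.getExecutionId()
                    + ", processInstanceId=" + deadLetterJob.getProcessInstanceId());
        }
    }
}
```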

Is this intended behavior? If so, why?

From our perspective, the JOB_EXECUTION_FAILURE event should always be dispatched in the context of the execution which caused the exception.
Currently, this is the case for a CallActivity triggered with asyncComplete=false, but not for one triggered with asyncComplete=true.
The following PR adjusts the behaviour accordingly.

Thank you!

Any opinion would be highly appreciated here :slight_smile:

Just to be precise: this only happens when executing the last part of the instance asynchronously, right?

Looking at the javadoc of the AsyncCompleteCallActivityJobHandler, it does look like that. If that job fails, it should be retried in the context of the parent execution, as that is the job that is failing.

However, in this use case there are multiple jobs at play:

  • the job for the async call activity
  • the job for asynchronously completing a multi-instance

A unit test would clarify things here, to make sure we’re talking about the same thing.

Hi @joram,
Thank you for your reply!

In our environment, all flow elements have the flag async=true, and all the observations we’ve made relate to this configuration.

With the proposed change, the retry will still be correct, as the corresponding job is created from the dead-letter job accordingly (see the sketch below).
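
To illustrate what we mean by the retry staying correct, a minimal sketch of moving a dead-letter job back to an executable job via the ManagementService (the retry count of 3 is arbitrary):

```java
// Sketch: re-schedule a dead-letter job; the new executable job takes over the
// execution / process instance references from the dead-letter job, which is
// why the retry stays correct with the proposed change.
import org.flowable.engine.ManagementService;
import org.flowable.job.api.Job;

public class DeadLetterJobRetry {

    public Job retry(ManagementService managementService, String deadLetterJobId) {
        return managementService.moveDeadLetterJobToExecutableJob(deadLetterJobId, 3);
    }
}
```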

We would be happy to introduce a unit test, but some inspiration on its design would be highly appreciated. Could you please briefly describe the scope of the expected test and/or point us to an example in the code which we could use as a starting point?
Thank you very much in advance!

Best,
Vasil

Sorry for not responding sooner - the last few weeks were very busy …

The retry will still be correct, but I don’t think we’re talking about the same thing yet.
When using asyncComplete=true, the engine is instructed to create an additional job once all multi-instance executions of the call activity are finished. When using asyncComplete=false, this job is not created.

If I look at the PR, it changes the process instance / execution id of that particular job and not of the actual jobs that execute the call activity. That’s a fundamental difference: the completion happens in the context of the calling process instance and thus that should be the reference imho.

Unless I’m mistaken here and you’re talking about a difference in behavior for the ‘regular’ async jobs (the ones doing the call activity)? But looking at the PR, I don’t think so.

A unit test with a multi-instance call activity, once with asyncComplete=true and once with asyncComplete=false, showing the differences. That would be required to accept this PR anyway :-). I’m still struggling to see the use case here, so an example of the current (assumed) wrong behavior would clarify a lot, I think.

Thanks @joram! We’ll introduce the test - this should clarify the concerns.
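
A rough sketch of the kind of standalone check we have in mind, so we are sure we are talking about the same thing. The process keys and resource names are hypothetical (a super process with a multi-instance CallActivity, deployed once with flowable:asyncComplete="true" and once with "false", calling a subprocess whose async service task always throws); the real test will of course use the Flowable test utilities instead of polling:

```java
import org.flowable.engine.ManagementService;
import org.flowable.engine.ProcessEngine;
import org.flowable.engine.ProcessEngineConfiguration;
import org.flowable.engine.RuntimeService;
import org.flowable.engine.runtime.ProcessInstance;
import org.flowable.job.api.Job;

public class AsyncCompleteCallActivityFailureCheck {

    public static void main(String[] args) throws InterruptedException {
        ProcessEngine processEngine = ProcessEngineConfiguration
                .createStandaloneInMemProcessEngineConfiguration()
                .setAsyncExecutorActivate(true)
                .buildProcessEngine();

        RuntimeService runtimeService = processEngine.getRuntimeService();
        ManagementService managementService = processEngine.getManagementService();

        // Hypothetical resources: super process with a multi-instance call activity
        // (flowable:asyncComplete set on the call activity) and a subprocess whose
        // async service task always throws an exception.
        processEngine.getRepositoryService().createDeployment()
                .addClasspathResource("superProcessAsyncComplete.bpmn20.xml")
                .addClasspathResource("failingSubProcess.bpmn20.xml")
                .deploy();

        ProcessInstance superInstance =
                runtimeService.startProcessInstanceByKey("superProcessAsyncComplete");

        // Poll until the failing job has exhausted its retries and was dead-lettered.
        long deadLetterCount = 0;
        for (int i = 0; i < 300 && deadLetterCount == 0; i++) {
            Thread.sleep(200L);
            deadLetterCount = managementService.createDeadLetterJobQuery().count();
        }

        // The check under discussion: do the dead-letter jobs reference the subprocess
        // instance (current behaviour with asyncComplete=false) or the calling super
        // instance (current behaviour with asyncComplete=true)?
        System.out.println("super process instance: " + superInstance.getId());
        for (Job deadLetterJob : managementService.createDeadLetterJobQuery().list()) {
            System.out.println("dead-letter job " + deadLetterJob.getId()
                    + ", executionId=" + deadLetterJob.getExecutionId()
                    + ", processInstanceId=" + deadLetterJob.getProcessInstanceId());
        }

        processEngine.close();
    }
}
```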

Hi @joram,

Wanted to give you a short update here.
Sorry for the delay! We are still on the topic - we needed all our capacity for some urgent initiatives. We now have a “go” from the PO side. Please expect the test to be added to the PR within the next two weeks.

Best,
Vasil

Hi @joram,

Sorry for the delay! We have updated the PR with the relevant tests. Could you please have a look whenever you have a minute?
Thank you very much in advance!

Best,
Vasil