Inconsistent context of the JOB_EXECUTION_FAILURE event with and without the asyncComplete flag

Dear Flowable experts,

The Flowable Engine dispatches a JOB_EXECUTION_FAILURE event when an exception is thrown during the EndExecutionOperation of a process instance. The failing job is converted into a DeadLetterJob.
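
For reference, here is a minimal sketch of the kind of engine event listener we use to observe this, assuming the Flowable 6.x event API; the class name and the logging are only illustrative:

```java
// Minimal sketch of an event listener reacting to JOB_EXECUTION_FAILURE
// (assumed Flowable 6.x API; class name and logging are illustrative only).
import org.flowable.common.engine.api.delegate.event.FlowableEngineEntityEvent;
import org.flowable.common.engine.api.delegate.event.FlowableEngineEventType;
import org.flowable.common.engine.api.delegate.event.FlowableEvent;
import org.flowable.common.engine.api.delegate.event.FlowableEventListener;
import org.flowable.job.api.Job;

public class JobFailureContextListener implements FlowableEventListener {

    @Override
    public void onEvent(FlowableEvent event) {
        if (event.getType() == FlowableEngineEventType.JOB_EXECUTION_FAILURE
                && event instanceof FlowableEngineEntityEvent) {
            FlowableEngineEntityEvent entityEvent = (FlowableEngineEntityEvent) event;
            Job failedJob = (Job) entityEvent.getEntity();
            // The question of this thread: which execution / process instance do
            // these ids point to when the failing job originates from a CallActivity?
            System.out.println("Failed job " + failedJob.getId()
                    + ", executionId=" + entityEvent.getExecutionId()
                    + ", processInstanceId=" + entityEvent.getProcessInstanceId());
        }
    }

    @Override
    public boolean isFailOnException() {
        return false;
    }

    @Override
    public boolean isFireOnTransactionLifecycleEvent() {
        return false;
    }

    @Override
    public String getOnTransaction() {
        return null;
    }
}
```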

In our environment, a CallActivity is used to trigger one or multiple SubProcess instances. In addition, there is an event listener which handles failing subprocess instances.

We’ve discovered that the context of the JOB_EXECUTION_FAILURE event is set differently depending on the asyncComplete flag of the CallActivity:

  • With asyncComplete=false, the JOB_EXECUTION_FAILURE event is dispatched in the context of the subprocess execution, and the DeadLetterJob belongs to the subprocess execution.
  • With asyncComplete=true, the JOB_EXECUTION_FAILURE event is dispatched in the context of the super process execution, and the DeadLetterJob belongs to the super process execution (a sketch of how we check this follows below).
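
A small sketch of how we check which executions the resulting dead-letter jobs belong to, assuming access to the ManagementService (the printout is only illustrative):

```java
// Sketch: inspect the dead-letter jobs after the failing subprocess job has
// exhausted its retries (assumes an obtained/injected ManagementService).
import org.flowable.engine.ManagementService;
import org.flowable.job.api.Job;

public class DeadLetterJobContextCheck {

    public void printDeadLetterContext(ManagementService managementService) {
        for (Job deadLetterJob : managementService.createDeadLetterJobQuery().list()) {
            // With asyncComplete=false we observe the subprocess ids here,
            // with asyncComplete=true the ids of the super (calling) process.
            System.out.println("dead-letter job " + deadLetterJob.getId()
                    + ", executionId=" + deadLetterJob.getExecutionId()
                    + ", processInstanceId=" + deadLetterJob.getProcessInstanceId());
        }
    }
}
```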

Is this intended behavior? If so, why?

From our perspective, the JOB_EXECUTION_FAILURE event should always be dispatched in the context of the execution which caused the exception.
Currently, this is the case for a CallActivity triggered with asyncComplete=false, but not for one triggered with asyncComplete=true.
The following PR adjusts the behaviour accordingly.

Thank you!

Any opinion would be highly appreciated here :slight_smile:

Just to be precise: this only happens when executing the last part of the instance asynchronously, right?

Looking at the javadoc of the AsyncCompleteCallActivityJobHandler, it does look like that. If that job fails, it should be retried in the context of the parent execution, as that is the job that is failing.

However, in this use case there are multiple jobs at play:

  • the job for the async call activity
  • the job for asynchronously completing a multi-instance

A unit test would clarify things here, to make sure we’re talking about the same thing.

Hi @joram,
Thank you for your reply!

In our environment, all flow elements have the flag async=true, and all the observations we’ve made relate to this configuration.

With the proposed change, the retry will still be correct, as the corresponding job is created from the dead-letter job accordingly (see the sketch below).
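
To illustrate what we mean by the retry staying correct, a minimal sketch of moving a dead-letter job back to an executable job via the ManagementService (the retry count of 3 is arbitrary):

```java
// Sketch: re-schedule a dead-letter job; the new executable job takes over the
// execution / process instance references from the dead-letter job, which is
// why the retry stays correct with the proposed change.
import org.flowable.engine.ManagementService;
import org.flowable.job.api.Job;

public class DeadLetterJobRetry {

    public Job retry(ManagementService managementService, String deadLetterJobId) {
        return managementService.moveDeadLetterJobToExecutableJob(deadLetterJobId, 3);
    }
}
```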

We would be happy to introduce a unit test, but some inspiration on its design would be highly appreciated. Could you please briefly describe the scope of the expected test and/or point us to an example in the code which we could use as a starting point?
Thank you very much in advance!

Best,
Vasil

Sorry for not responding sooner - the last few weeks were very busy …

The retry will still be correct, but I don’t think we’re talking about the same thing yet.
When using asyncComplete=true, the engine is instructed to create an additional job once all multi-instance executions of the call activity are finished. When using asyncComplete=false, this job is not created.

If I look at the PR, it changes the process instance / execution id of that particular job and not of the actual jobs that execute the call activity. That’s a fundamental difference: the completion happens in the context of the calling process instance and thus that should be the reference imho.

Unless I’m mistaken here and you’re talking about a difference in behavior for the ‘regular’ async jobs (the ones doing the call activity)? But looking at the PR, I don’t think so.

A unit test with a multi-instance call activity, once with asyncComplete=true and once with asyncComplete=false, showing the differences. That would be required to accept this PR anyway :-). I’m still struggling to see the use case here, so an example of the current (assumed) wrong behavior would clarify a lot, I think.

Thanks @joram! We’ll introduce the test - this should clarify the concerns.
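
A rough sketch of the kind of standalone check we have in mind, so we are sure we are talking about the same thing. The process keys and resource names are hypothetical (a super process with a multi-instance CallActivity, deployed once with flowable:asyncComplete="true" and once with "false", calling a subprocess whose async service task always throws); the real test will of course use the Flowable test utilities instead of polling:

```java
import org.flowable.engine.ManagementService;
import org.flowable.engine.ProcessEngine;
import org.flowable.engine.ProcessEngineConfiguration;
import org.flowable.engine.RuntimeService;
import org.flowable.engine.runtime.ProcessInstance;
import org.flowable.job.api.Job;

public class AsyncCompleteCallActivityFailureCheck {

    public static void main(String[] args) throws InterruptedException {
        ProcessEngine processEngine = ProcessEngineConfiguration
                .createStandaloneInMemProcessEngineConfiguration()
                .setAsyncExecutorActivate(true)
                .buildProcessEngine();

        RuntimeService runtimeService = processEngine.getRuntimeService();
        ManagementService managementService = processEngine.getManagementService();

        // Hypothetical resources: super process with a multi-instance call activity
        // (flowable:asyncComplete set on the call activity) and a subprocess whose
        // async service task always throws an exception.
        processEngine.getRepositoryService().createDeployment()
                .addClasspathResource("superProcessAsyncComplete.bpmn20.xml")
                .addClasspathResource("failingSubProcess.bpmn20.xml")
                .deploy();

        ProcessInstance superInstance =
                runtimeService.startProcessInstanceByKey("superProcessAsyncComplete");

        // Poll until the failing job has exhausted its retries and was dead-lettered.
        long deadLetterCount = 0;
        for (int i = 0; i < 300 && deadLetterCount == 0; i++) {
            Thread.sleep(200L);
            deadLetterCount = managementService.createDeadLetterJobQuery().count();
        }

        // The check under discussion: do the dead-letter jobs reference the subprocess
        // instance (current behaviour with asyncComplete=false) or the calling super
        // instance (current behaviour with asyncComplete=true)?
        System.out.println("super process instance: " + superInstance.getId());
        for (Job deadLetterJob : managementService.createDeadLetterJobQuery().list()) {
            System.out.println("dead-letter job " + deadLetterJob.getId()
                    + ", executionId=" + deadLetterJob.getExecutionId()
                    + ", processInstanceId=" + deadLetterJob.getProcessInstanceId());
        }

        processEngine.close();
    }
}
```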

Hi @joram,

Wanted to give you a short update here.
Sorry for the delay! We are still on the topic - we needed all our capacity for some urgent initiatives. We now have a “go” from the PO side. Please expect the test to be added to the PR within the next two weeks.

Best,
Vasil

Hi @joram,

Sorry for the delay! We have updated the PR with the relevant tests. Could you please have a look whenever you have a minute?
Thank you very much in advance!

Best,
Vasil