We have a process definition that goes like this: Script Task (async=true) → Exclusive Gateway → Script Task. At the beginning of each month we have to run about 20K process instances of this process definition.
Because the first Script Task is marked as asynchronous, we end up with 20K entries in `act_ru_job`.
When I repeatedly run `SELECT COUNT(*) FROM act_ru_job` I observe this pattern: the count drops quite rapidly for a while, which means the jobs are being processed quickly, but then it stalls and takes forever for the remaining jobs to finish.
Is this a matter of misconfiguration, i.e. should I adjust some settings to match this workload? At this point I have run out of ideas. If anyone has come across a similar scenario and a similar issue, I would appreciate it if you could share how you solved it.
I would suggest setting the core pool size and the max pool size to the same value.
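The reason this matters is that the underlying `java.util.concurrent.ThreadPoolExecutor` only grows beyond its core size once its internal queue is full, so with core < max the extra threads rarely get created. In an XML engine configuration this could look like the following sketch (the value 8 is purely illustrative, not a recommendation):

```xml
<bean id="processEngineConfiguration"
      class="org.flowable.engine.impl.cfg.StandaloneProcessEngineConfiguration">
  <!-- keep core and max pool size equal so the executor always has
       its full number of worker threads available -->
  <property name="asyncExecutorCorePoolSize" value="8"/>
  <property name="asyncExecutorMaxPoolSize" value="8"/>
</bean>
```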
I would also suggest measuring the runtime: how many jobs are getting rejected, how fast each job executes, etc. If the work done in the first and the second script task runs at comparable speed, you should not see a drop in execution throughput.
You said you are using Flowable 6.4.1; that version is from January 2019, which is quite old. I would suggest migrating to a newer version; we've made numerous improvements around job execution.
And another question, @filiphr: there is this parameter called `asyncExecutorDefaultQueueSizeFullWaitTime` that is described as follows:
The time (in milliseconds) the async job (both timer and async continuations) acquisition thread will wait when the internal job queue is full to execute the next query. By default set to 0 (for backwards compatibility). Setting this property to a higher value allows the async executor to hopefully clear its queue a bit.
Does it help, in general, to speed things up if I set it to a non-zero value, e.g. 10000 (10 seconds)?
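For reference, this is how I would be setting it inside the `processEngineConfiguration` bean definition (the 10000 ms value below just mirrors my question; I don't know yet whether it is sensible):

```xml
<!-- wait 10 s before the acquisition thread runs its next query
     when the internal job queue is full -->
<property name="asyncExecutorDefaultQueueSizeFullWaitTime" value="10000"/>
</bean>
```

(Property shown in isolation; it goes alongside the other `asyncExecutor*` properties.)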
In the open source version we don't really have anything out-of-the-box for this. We do have different listeners that you could, in theory, hook into to measure it.
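For what it's worth, here is a minimal sketch of the bookkeeping such a listener could delegate to. This is plain Java with no Flowable types; the class name and methods are my own invention, and the idea is that a `FlowableEventListener` reacting to job success/failure events would call `recordSuccess()` / `recordFailure()`:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: thread-safe counters that a job event listener
// could update, plus a rough throughput figure derived from them.
public class JobThroughputRecorder {
    private final AtomicLong successes = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final long startNanos = System.nanoTime();

    public void recordSuccess() { successes.incrementAndGet(); }
    public void recordFailure() { failures.incrementAndGet(); }

    public long successCount() { return successes.get(); }
    public long failureCount() { return failures.get(); }

    /** Successfully completed jobs per second since this recorder was created. */
    public double jobsPerSecond() {
        double elapsedSeconds = (System.nanoTime() - startNanos) / 1_000_000_000.0;
        return elapsedSeconds > 0 ? successCount() / elapsedSeconds : 0.0;
    }

    public static void main(String[] args) {
        JobThroughputRecorder recorder = new JobThroughputRecorder();
        recorder.recordSuccess();
        recorder.recordSuccess();
        recorder.recordFailure();
        System.out.println("ok=" + recorder.successCount()
                + " failed=" + recorder.failureCount());
    }
}
```

Logging a snapshot of these counters once a minute would show you exactly when the throughput drop you described starts.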
Now that you mention this, I see that you also have `asyncExecutorMaxAsyncJobsDuePerAcquisition` set to 5. This means there is a lot of contention between the different nodes: they keep trying to acquire and lock the same jobs. We've made a lot of improvements in this area in 6.7.
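If you stay on your current version for a while, raising that setting reduces the number of acquisition round-trips each node makes against the job table. Inside the `processEngineConfiguration` bean definition, that would be something like the following (64 is just an example value; tune it against your own workload):

```xml
<!-- acquire more async jobs per query to reduce
     acquisition contention between nodes -->
<property name="asyncExecutorMaxAsyncJobsDuePerAcquisition" value="64"/>
```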
I would suggest that you read the following blog series: