Hello Joram,
Sorry for the late reply.
I have been experimenting with different Flowable settings, and I get the best results with the following:
• MaxAsyncJobsDuePerAcquisition = 1
• MaxTimerJobsPerAcquisition = 1
• GlobalAcquireLockEnabled = false
• ParallelMultiInstanceAsyncLeave = false
• CorePoolSize / MaxPoolSize – does not really matter; the results are similar with 64 or 512 threads. The queue size does not matter either, since the queue is never full.
I tried the global lock once again, and the results are still worse.
I tried values from 1 to 50 for MaxAsyncJobsDuePerAcquisition and MaxTimerJobsPerAcquisition, with and without the global lock, and no value other than 1 performed better.
I also increased the core and max pool sizes and queue size, but these settings do not lead to different results.
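A side note on why the pool sizes may not matter while the queue is never full: a plain java.util.concurrent.ThreadPoolExecutor only creates threads beyond the core size once its queue is full. The sketch below uses only the JDK. The class name PoolSizeDemo and the numbers are made up for illustration, and I am assuming (I have not verified it) that the Flowable executor is backed by a pool of this kind.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizeDemo {

    // Submits `tasks` blocking tasks and returns how many threads the pool created for them.
    static int threadsCreated(int core, int max, int queueSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max, 5, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until the measurement is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Queue never fills: only the core threads exist, max is irrelevant.
        System.out.println(threadsCreated(2, 8, 100, 50)); // prints 2
        // Queue fills up (2 running + 10 queued), so the pool grows to max.
        System.out.println(threadsCreated(2, 8, 10, 18));  // prints 8
    }
}
```

If the same holds for the executor in our setup, raising MaxPoolSize would only have an effect once the queue actually fills up, which it never does in our tests.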
Finally, I tried to “add” more acquisition threads using an extension of the DefaultAsyncJobExecutor:
public class ThreadedAsyncJobExecutor extends DefaultAsyncJobExecutor {

    private List<AcquireAsyncJobsDueRunnable> asyncJobAcquisitionRunnables;
    private List<AcquireTimerJobsRunnable> timerJobAcquisitionRunnables;

    @Override
    protected void startJobAcquisitionThread() {
        // Start the default acquisition thread, then 30 additional ones on top of it.
        super.startJobAcquisitionThread();
        asyncJobAcquisitionRunnables = new ArrayList<>(30);
        for (int i = 0; i < 30; i++) {
            JobInfoEntityManager<? extends JobInfoEntity> jobEntityManagerToUse = jobEntityManager != null ? jobEntityManager
                    : jobServiceConfiguration.getJobEntityManager();
            AcquireAsyncJobsDueRunnable asyncRunnable = new AcquireAsyncJobsDueRunnable("flowable-acquire-async-jobs-"
                    + (i + 1), this, jobEntityManagerToUse, asyncJobsDueLifecycleListener, new AcquireAsyncJobsDueRunnableConfiguration());
            asyncJobAcquisitionRunnables.add(asyncRunnable);
            Thread thread = new Thread(asyncRunnable);
            thread.setName("flowable-acquire-async-jobs-" + (i + 1));
            thread.start();
            LOGGER.info("Started job acquisition thread: {}", thread.getName());
        }
    }

    @Override
    protected void startTimerAcquisitionThread() {
        // Same approach for timer jobs: the default thread plus 30 additional ones.
        super.startTimerAcquisitionThread();
        timerJobAcquisitionRunnables = new ArrayList<>(30);
        for (int i = 0; i < 30; i++) {
            AcquireTimerJobsRunnable timerJobRunnable = new AcquireTimerJobsRunnable(this, jobServiceConfiguration.getJobManager(),
                    timerLifecycleListener, new AcquireTimerRunnableConfiguration(), configuration.getMoveTimerExecutorPoolSize());
            timerJobAcquisitionRunnables.add(timerJobRunnable);
            Thread thread = new Thread(timerJobRunnable);
            thread.setName("flowable-acquire-timer-jobs-" + (i + 1));
            thread.start();
            LOGGER.info("Started timer job acquisition thread: {}", thread.getName());
        }
    }
    …
}
This, again, did not improve performance; it was even worse, and it seems that the jobs in act_ru_job are not getting executed for a long time.
I also checked the database queries, and none of them are slow: all complete in under 900 ms, and most in under 200-300 ms.
“If you have job rows with no lock owner, it typically means the internal queue of the node it was created on is full. It is inserted without lock owner, so other nodes can pick it up.”
Does this apply to “async = true” service tasks, since all of our jobs are async? I see that all of the jobs are inserted without lock_owner_ and lock_expiration_time_.
Also, one more question regarding the rev_ counter. As far as I understand, it is incremented when the job is executed and is used for optimistic locking in update queries. Is this correct? I ask because for some jobs the counter is 8 or 9. Why is that?
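To make clear what I mean by optimistic locking, here is the generic pattern I have in mind, as a minimal in-memory sketch. It is not Flowable's code: RevisionDemo, Row and the starting revision of 1 are made up for illustration, and the update method only mimics a statement of the form "UPDATE ... SET rev = rev + 1 WHERE id = ? AND rev = ?".

```java
import java.util.HashMap;
import java.util.Map;

class RevisionDemo {

    // Hypothetical stand-in for a job row: only the id and the revision matter here.
    static final class Row {
        final String id;
        int rev = 1;

        Row(String id) {
            this.id = id;
        }
    }

    private final Map<String, Row> table = new HashMap<>();

    synchronized void insert(String id) {
        table.put(id, new Row(id));
    }

    synchronized int currentRev(String id) {
        return table.get(id).rev;
    }

    // Succeeds only if the row still has the revision the caller read earlier;
    // every successful update increments the revision by one.
    synchronized boolean update(String id, int expectedRev) {
        Row row = table.get(id);
        if (row == null || row.rev != expectedRev) {
            return false; // someone else changed the row in the meantime
        }
        row.rev++;
        return true;
    }

    public static void main(String[] args) {
        RevisionDemo db = new RevisionDemo();
        db.insert("job-1");
        int seen = db.currentRev("job-1");             // two writers both read revision 1
        System.out.println(db.update("job-1", seen));  // prints true: first writer wins
        System.out.println(db.update("job-1", seen));  // prints false: second writer is stale
        System.out.println(db.currentRev("job-1"));    // prints 2
    }
}
```

In this sketch the counter grows by exactly one per successful update of the row, whatever the reason for the update.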
In our setup, all the jobs are async, and we have a lot of inner call activities that trigger more parallel processes, which in turn trigger other parallel call activities. They are usually not slow, but there might be some async service tasks that take a few seconds.
We usually start to notice the degradation 15-20 minutes into our performance tests, once ~200 parallel processes like this one [1] have started and are running continuously.
If I run 10 parallel processes (less load), this issue is not observed.
I can give you access to our Cloud Foundry account and to the database if that would help with the investigation, or set up a meeting to show you our setup if you think that is more appropriate.
[1] multiapps-controller/multiapps-controller-process/src/main/resources/org/cloudfoundry/multiapps/controller/process/xs2-deploy.bpmn at master · cloudfoundry/multiapps-controller · GitHub
Best regards,
Ivan