Worse performance with Flowable 6.8.0 compared to 6.6.0 (global acquire lock feature)

Hello Flowable colleagues,

After adopting the newest Flowable release (6.8.0) I noticed a performance degradation of about 50% in our service with the global acquire lock disabled (the default). After investigating which default parameters had changed, I found:

  • maxTimerJobsPerAcquisition was increased from 1 to 512
  • maxAsyncJobsDuePerAcquisition was increased from 1 to 512

After setting them back to 1, as in Flowable 6.6.0, performance is normal again (no longer slower).

With the global acquire lock enabled, the performance is even worse. I tried setting maxTimerJobsPerAcquisition and maxAsyncJobsDuePerAcquisition back to 1, and the results were better, but still slower.
I also tried increasing maxTimerJobsPerAcquisition and maxAsyncJobsDuePerAcquisition to 1024 and the queue size to 8192 (from 4), but the performance is still not good.
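
For reference, this is roughly how I am resetting those acquisition sizes. It is only a minimal sketch assuming the standard setters on SpringProcessEngineConfiguration; the class name here is just for illustration and the full configuration is in the file linked further down:

```java
import org.flowable.spring.SpringProcessEngineConfiguration;

public class AcquisitionTuning {

    // Sketch: set the acquisition sizes back to the 6.6.0 defaults.
    // In 6.8.0 both default to 512; with the values below performance was normal again.
    public static void applyAcquisitionSizes(SpringProcessEngineConfiguration configuration) {
        configuration.setAsyncExecutorMaxTimerJobsPerAcquisition(1);
        configuration.setAsyncExecutorMaxAsyncJobsDuePerAcquisition(1);
    }
}
```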

Are there any known cases where the global lock can actually be slower? Do you have any recommendations on whether these parameters are appropriate for such a use case?

Our system parameters are:

  • 5 Flowable engines (5 Cloud Foundry instances)
  • 4 GB RAM and 4 GB disk
  • PostgreSQL DB with 16 GB of RAM, 1000 GB of storage and 4 CPUs

Flowable Configuration (see the sketch after this list)

  • DefaultAsyncTaskExecutor
    • QueueSize - 4
    • CorePoolSize - 32
    • MaxCorePoolSize - 64
    • SecondsToWaitOnShutdown - 480
  • AsyncExecutor
    • AsyncJobLockTimeInMillis - 30 minutes
    • DefaultAsyncJobAcquireWaitTimeInMillis - 3 seconds
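
To make the list above concrete, here is a sketch of how such an executor can be built, assuming the standard DefaultAsyncTaskExecutor and engine configuration setters (values are the ones listed here; attaching the task executor to the engine configuration is omitted, the real wiring is in the configuration file linked below):

```java
import org.flowable.common.engine.impl.async.DefaultAsyncTaskExecutor;
import org.flowable.spring.SpringProcessEngineConfiguration;

public class AsyncExecutorSetup {

    // Task executor with the pool settings listed above.
    public static DefaultAsyncTaskExecutor buildTaskExecutor() {
        DefaultAsyncTaskExecutor taskExecutor = new DefaultAsyncTaskExecutor();
        taskExecutor.setQueueSize(4);
        taskExecutor.setCorePoolSize(32);
        taskExecutor.setMaxPoolSize(64);
        taskExecutor.setSecondsToWaitOnShutdown(480);
        return taskExecutor;
    }

    // Async executor timings listed above, applied to the engine configuration.
    public static void applyAsyncExecutorTimings(SpringProcessEngineConfiguration configuration) {
        configuration.setAsyncExecutorAsyncJobLockTimeInMillis(30 * 60 * 1000); // 30 minutes
        configuration.setAsyncExecutorDefaultAsyncJobAcquireWaitTime(3 * 1000); // 3 seconds
    }
}
```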

Our Flowable processes contain a large number of service tasks and parallel sub-processes.
Example diagram: https://github.com/cloudfoundry/multiapps-controller/blob/master/multiapps-controller-process/src/main/resources/org/cloudfoundry/multiapps/controller/process/deploy-app.bpmn

Full flowable configuration: https://github.com/cloudfoundry/multiapps-controller/blob/master/multiapps-controller-web/src/main/java/org/cloudfoundry/multiapps/controller/web/configuration/FlowableConfiguration.java#L77

Best regards,
Ivan

Hey @IvanDimitrov,

What exactly are you measuring to notice the 50% degradation?

If you want to see the reasoning behind the global lock, I would suggest reading the following blog posts:

In our benchmarks we have seen that using the global acquire lock should provide better performance, since one node can acquire multiple jobs without running into optimistic locking exceptions.

What we have also seen is that when the queue size is larger (e.g. 8192), the number of threads in the pool that execute the jobs only starts to grow once the queue is full (I guess that’s why you have it at 4). We usually suggest setting the core and max pool size to the same value and allowing the core threads to time out.
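
To illustrate that behaviour with the plain JDK classes the Flowable task executors build on (the pool and queue sizes here are only example numbers, not a recommendation for your exact setup):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingExample {

    public static ThreadPoolExecutor buildPool() {
        // A ThreadPoolExecutor only grows beyond its core size once the queue is
        // full, so a large queue delays the creation of additional threads.
        // Setting core == max and letting core threads time out avoids that.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                64, 64,                          // core pool size == max pool size
                60L, TimeUnit.SECONDS,           // keep-alive for idle threads
                new ArrayBlockingQueue<>(1024)); // example queue size
        pool.allowCoreThreadTimeOut(true);       // allow idle core threads to be reclaimed
        return pool;
    }
}
```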

Having said all of this, and having had a brief look at your BPMN, the reason it is slower might be that we have made certain optimisations for parallel multi-instance, so you now have more jobs (I’m not sure how many async tasks you are dealing with). What you can try is to set parallelMultiInstanceAsyncLeave on the ProcessEngineConfigurationImpl to false and see if it has an impact.
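
For example, assuming the flag has the usual bean-style setter on your engine configuration:

```java
import org.flowable.engine.impl.cfg.ProcessEngineConfigurationImpl;

public class MultiInstanceTuning {

    // Sketch: disable the parallel multi-instance async leave optimisation
    // mentioned above to check whether it is the cause of the extra jobs.
    public static void disableParallelMultiInstanceAsyncLeave(ProcessEngineConfigurationImpl configuration) {
        configuration.setParallelMultiInstanceAsyncLeave(false);
    }
}
```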

Cheers,
Filip
