OptimisticLockingException in a clustered environment


We’re running Flowable 5.23 in a clustered environment - that is, with two Flowable hosts pointing to the same database.

Due to increased load, we tried raising the Job Executor's maximum thread count from the default of 10 to 30. However, we're now seeing the following DEBUG message in the logs at a fairly rapid rate, consistently more than once per second:

Optimistic locking exception during job acquisition. If you have multiple job executors running against the same database, this exception means that this thread tried to acquire a job, which already was acquired by another job executor acquisition thread. This is expected behavior in a clustered environment. You can ignore this message if you indeed have multiple job executor acquisition threads running against the same database. Exception message: JobEntity [id=455903118] was updated by another transaction concurrently

It is happening on one host more than the other. When we take a thread dump and grep for “ExecuteJobsRunnable.run”, the Job Executor on the busy host gets up to 15 or so threads, while the other host only reaches 1 or 2 threads.

So my question is this: Is this still expected behaviour? I know the message says it is, but should it occur at that frequency? Is there any way to reduce the contention between the two hosts?


Yes, that exception is to be expected if two executors compete for the same database rows.
Depending on the number of jobs (timers, async service tasks, etc.), more than once per second is definitely possible.
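To make the mechanism concrete, here is a minimal sketch (plain Java, not Flowable code) of revision-based optimistic locking on a job row: both acquirers read the same revision, the first update wins, and the second fails exactly the way the DEBUG message describes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: each job row carries a revision; an acquirer's UPDATE only
// succeeds if the revision still matches what it read. The loser gets
// an optimistic locking exception and simply moves on to other jobs.
class JobRow {
    final AtomicInteger revision = new AtomicInteger(1);

    // true if this acquirer won the row; false if another acquisition
    // thread updated it first (the case Flowable logs at DEBUG).
    boolean tryAcquire(int revisionRead) {
        return revision.compareAndSet(revisionRead, revisionRead + 1);
    }
}

public class OptimisticLockSketch {
    public static void main(String[] args) {
        JobRow job = new JobRow();
        int rev = job.revision.get();        // both hosts read revision 1
        boolean hostA = job.tryAcquire(rev); // host A commits first
        boolean hostB = job.tryAcquire(rev); // host B hits the optimistic lock
        System.out.println("hostA=" + hostA + " hostB=" + hostB);
        // prints: hostA=true hostB=false
    }
}
```

The point is that the conflict is detected at commit time rather than prevented by a pessimistic lock, so frequent collisions are noisy but harmless; they only waste acquisition cycles.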

A few questions:

  • are you using the Async executor?
  • did you try changing the acquire time? That should have some impact.
  • how many jobs are getting created/executed?
  • what are your current executor settings?

FYI, in V6 it is possible to use a message-queue-based executor instead of thread pools, exactly to solve this issue. But if your number of jobs is low, that might not be needed, and two executors should suffice.
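Regarding the acquire-time question above: staggering the wait time on the two hosts so their acquisition cycles don't fire in lockstep is one common way to reduce collisions. A configuration sketch along those lines (class and property names are from the Activiti 5 lineage that Flowable 5.x derives from; please verify the exact package names against your 5.23 build):

```xml
<!-- Sketch: give each host a different waitTimeInMillis (e.g. 2500 on
     one, 3700 on the other) so they don't query the job table at the
     same moment, and batch more jobs per acquisition query. -->
<bean id="jobExecutor" class="org.activiti.engine.impl.jobexecutor.DefaultJobExecutor">
  <property name="corePoolSize" value="3"/>
  <property name="maxPoolSize" value="30"/>
  <property name="queueSize" value="3"/>
  <property name="waitTimeInMillis" value="3700"/>
  <property name="maxJobsPerAcquisition" value="5"/>
</bean>

<bean id="processEngineConfiguration" class="org.activiti.spring.SpringProcessEngineConfiguration">
  <property name="jobExecutor" ref="jobExecutor"/>
  <property name="jobExecutorActivate" value="true"/>
</bean>
```

The specific values (3700, 5) are illustrative, not recommendations; the idea is simply to de-synchronise the two acquisition loops and make each query fetch more work.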


We aren’t using the async job executor. In some earlier experiments we found odd behaviour where, on a few occasions, the async executor would simply stop processing entirely without any indication in the logs. So we decided to put it on the backburner until we get a chance to investigate further.

The current settings for the job executor on both machines are:
maxPoolSize = 30
queueSize = 3
corePoolSize = 3
keepAliveTime = 0
lockTimeInMillis = 120000
waitTimeInMillis = 2500
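One thing worth noting about those settings: the 5.x job executor runs its jobs on a standard `java.util.concurrent` thread pool, and with those pools extra threads beyond the core are only created once the queue is full. With a queue of 3 and a max of 30, the pool therefore jumps toward the maximum almost immediately under load, which matches the bursty thread counts you're seeing. A runnable sketch of that semantics (plain JDK, not Flowable code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: ThreadPoolExecutor growth with corePoolSize=3, queueSize=3,
// maxPoolSize=30 (the settings posted above). Threads beyond the core
// are only spawned once the queue is FULL.
public class PoolGrowthSketch {
    // Submits 10 blocking tasks and returns {poolSize, queuedTasks}.
    static int[] growth() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                3, 30, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(3));
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException ignored) {}
            });
        }
        // 10 blocked tasks: 3 on core threads, 3 queued, 4 on extra threads.
        int[] result = {pool.getPoolSize(), pool.getQueue().size()};
        release.countDown();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        int[] r = growth();
        System.out.println("threads=" + r[0] + " queued=" + r[1]);
        // prints: threads=7 queued=3
    }
}
```

So only four tasks past the core are enough to start growing the pool; a larger queue would smooth that out at the cost of higher job latency.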

Process Jobs:
Ready to be executed = ~ 4000
Jobs in exception = 13100 (A little high but shouldn’t affect the query?)
Timer Jobs = 26200

Running Process Instances = 144400

As I said before, we were seeing one of the hosts get up to 30 threads, while the other was hitting that debug log fairly constantly and being starved of jobs. Occasionally the starved host would pick up more jobs, but then the other one's active thread count would drop.