I have a setup where I want to run multiple Job Executor instances in parallel. Actually, that’s how it is recommended in this article: http://www.mastertheboss.com/jboss-jbpm/activiti-bpmn/clustering-activiti-bpmn.
I had a look at the source code, and it looks like Flowable does it like this:
1. Query all not-yet-locked timer jobs from ACT_RU_TIMER_JOB (a job counts as unlocked when “LOCK_EXP_TIME_ is null”);
2. Lock these jobs so that other Job Executors cannot start them in parallel.
If this is indeed how it works, then there is a small time gap between step 1 and step 2. This could introduce a race condition when two threads do this at exactly the same time: thread B could find jobs that thread A has also found but has not yet persisted the locks for.
Could this really happen, or is there another mechanism at work that prevents this race condition?
I found a hint that there is indeed another mechanism at work, which is ‘optimistic locking’, in this code of AcquireTimerJobsCmd:
protected void lockJob(CommandContext commandContext, TimerJobEntity job, int lockTimeInMillis) {
// This will trigger an optimistic locking exception when two concurrent executors
// try to lock, as the revision will not match.
This gives me some hope that we won’t run into race conditions. Digging in further, I found code in DbSqlSession.flushUpdates() that throws FlowableOptimisticLockingException when optimistic locking fails.
Hence the race condition is not really avoided, but the effects are mitigated when it occurs.
Hey @erikvoorbraak,
Your analysis is spot on.
In theory, you cannot really avoid such a race condition. That is why there are multiple layers of protection to ensure that once a lock is acquired, the job can be executed. Failing to acquire a lock (via optimistic locking) is actually the mechanism that ensures the timer job is executed only once, which is what you need.
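To make the idea concrete, here is a minimal in-memory sketch of a revision-based optimistic lock. It is not Flowable's code: JobRow, JobTable and tryLock are hypothetical names made up for illustration, and the map only stands in for a database table. The assumption is that the real update behaves roughly like an "UPDATE ... WHERE id = ? AND revision = ?" statement that affects zero rows when someone else has already bumped the revision.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockSketch {

    /** Hypothetical stand-in for one timer job row. */
    static final class JobRow {
        final String id;
        final int revision;
        final String lockOwner; // null means "not locked"

        JobRow(String id, int revision, String lockOwner) {
            this.id = id;
            this.revision = revision;
            this.lockOwner = lockOwner;
        }
    }

    /** Hypothetical in-memory "table"; synchronized methods play the role of the database. */
    static final class JobTable {
        private final Map<String, JobRow> rows = new HashMap<>();

        synchronized void insert(String id) {
            rows.put(id, new JobRow(id, 1, null));
        }

        synchronized JobRow read(String id) {
            return rows.get(id);
        }

        // Assumed to mimic a conditional update: the write only happens when the
        // stored revision still equals the revision the caller read earlier.
        synchronized boolean tryLock(JobRow seen, String owner) {
            JobRow current = rows.get(seen.id);
            if (current == null || current.revision != seen.revision) {
                return false; // someone else updated the row first
            }
            rows.put(seen.id, new JobRow(seen.id, seen.revision + 1, owner));
            return true;
        }
    }

    public static void main(String[] args) {
        JobTable table = new JobTable();
        table.insert("timer-1");

        // Step 1: both executors query the same unlocked job (revision 1).
        JobRow seenByA = table.read("timer-1");
        JobRow seenByB = table.read("timer-1");

        // Step 2: both try to persist their lock; only the first one wins.
        boolean lockedByA = table.tryLock(seenByA, "executor-A");
        boolean lockedByB = table.tryLock(seenByB, "executor-B");

        System.out.println("A locked: " + lockedByA); // A locked: true
        System.out.println("B locked: " + lockedByB); // B locked: false
        System.out.println("owner: " + table.read("timer-1").lockOwner); // owner: executor-A
    }
}
```

Both executors still see the job in step 1, so the race itself is not prevented. What the revision check guarantees is that only one of the two writes in step 2 succeeds; the loser would get the equivalent of the optimistic locking exception and simply skip that job.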