Asynchronous execution of timer jobs with multiple Job Executors

I have a setup where I want to run multiple Job Executor instances in parallel. Actually, that’s how it is recommended in this article:

I had a look at the source code, and it looks like Flowable does it like this:

  1. Query all not-yet-locked timer jobs from ACT_RU_TIMER_JOB (a job counts as unlocked when “LOCK_EXP_TIME_ is null”);
  2. Lock these jobs so that other Job Executors cannot start these jobs in parallel.

If this is indeed how it works, then there is a small time gap between step 1 and step 2. This could introduce a race condition when two threads do this at exactly the same time: thread B could find jobs that thread A has also found but has not yet persisted the locks for.

Could this really happen, or is there another mechanism at work that prevents this race condition?

I found a hint that there is indeed another mechanism at work, which is ‘optimistic locking’, in this code of AcquireTimerJobsCmd:
protected void lockJob(CommandContext commandContext, TimerJobEntity job, int lockTimeInMillis) {
    // This will trigger an optimistic locking exception when two concurrent executors
    // try to lock, as the revision will not match.

This gives me some hope that we won’t run into race conditions. Digging in further, I found code in DbSqlSession.flushUpdates() that throws FlowableOptimisticLockingException when optimistic locking fails.

Hence the race condition is not really avoided, but the effects are mitigated when it occurs.
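
To make the revision check concrete, here is a minimal in-memory sketch of how I understand it. It is not Flowable code: JobRow, JobTable and updateIfRevisionMatches are hypothetical names, and the SQL in the comment only illustrates the general shape of a revision-guarded update. Both executors read the same unlocked row, but only the first update still matches the revision it read.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticLockSketch {

    /** Hypothetical stand-in for one row of the timer job table. */
    static final class JobRow {
        final String id;
        final int revision;
        final String lockOwner; // null means "not locked"

        JobRow(String id, int revision, String lockOwner) {
            this.id = id;
            this.revision = revision;
            this.lockOwner = lockOwner;
        }
    }

    /** Hypothetical in-memory table; each synchronized method stands in for one SQL statement. */
    static final class JobTable {
        private final Map<String, JobRow> rows = new HashMap<>();

        synchronized void insert(String id) {
            rows.put(id, new JobRow(id, 1, null));
        }

        synchronized JobRow read(String id) {
            return rows.get(id);
        }

        // Illustrative shape only (assumed, not copied from Flowable):
        //   UPDATE job SET revision = revision + 1, lock_owner = ?
        //   WHERE id = ? AND revision = ?
        // Returns false when no row matched, i.e. someone else updated it first.
        synchronized boolean updateIfRevisionMatches(JobRow seen, String owner) {
            JobRow current = rows.get(seen.id);
            if (current == null || current.revision != seen.revision) {
                return false;
            }
            rows.put(seen.id, new JobRow(seen.id, seen.revision + 1, owner));
            return true;
        }
    }

    public static void main(String[] args) {
        JobTable table = new JobTable();
        table.insert("timer-1");

        // Step 1: both executors query and see the same unlocked row (revision 1).
        JobRow seenByA = table.read("timer-1");
        JobRow seenByB = table.read("timer-1");

        // Step 2: both try to persist their lock.
        boolean aAcquired = table.updateIfRevisionMatches(seenByA, "executor-A");
        boolean bAcquired = table.updateIfRevisionMatches(seenByB, "executor-B");

        System.out.println("A acquired: " + aAcquired); // true
        System.out.println("B acquired: " + bAcquired); // false, revision is now 2
        System.out.println("owner: " + table.read("timer-1").lockOwner); // executor-A
    }
}
```

In the real engine, the failed update would presumably surface as the FlowableOptimisticLockingException mentioned above rather than as a boolean.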

Hey @erikvoorbraak,

Your analysis is spot on.

In theory you cannot really avoid such a race condition. That is why there are multiple layers of protection to ensure that, once a lock is acquired, the job can be executed. Failing to acquire a lock (via optimistic locking) is actually the mechanism that ensures the timer job is executed only once, which is what you need.
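
As a rough illustration of that last point, here is a small simulation with real threads. It is only a sketch under assumptions: TimerJob, LockLostException and race are hypothetical names, and an AtomicInteger compare-and-set stands in for the database's revision check. Every executor starts from the same stale revision, the losers catch the exception and skip the job, and the job runs once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecuteOnceSketch {

    /** Hypothetical exception standing in for an optimistic locking failure. */
    static final class LockLostException extends RuntimeException {
        LockLostException(String message) {
            super(message);
        }
    }

    /** Hypothetical job; the atomic counter plays the role of the row's revision. */
    static final class TimerJob {
        private final AtomicInteger revision = new AtomicInteger(1);
        final AtomicInteger executions = new AtomicInteger();

        int readRevision() {
            return revision.get();
        }

        // Succeeds only if nobody bumped the revision since it was read.
        void lock(int seenRevision) {
            if (!revision.compareAndSet(seenRevision, seenRevision + 1)) {
                throw new LockLostException("revision changed since it was read");
            }
        }
    }

    /** Lets the given number of executors compete for one job; returns how often it ran. */
    static int race(int executors) {
        TimerJob job = new TimerJob();
        int seen = job.readRevision(); // every executor saw this revision in "step 1"
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(executors);

        for (int i = 0; i < executors; i++) {
            pool.submit(() -> {
                try {
                    start.await();                    // release all threads together
                    job.lock(seen);                   // "step 2": try to persist the lock
                    job.executions.incrementAndGet(); // only the lock holder executes
                } catch (LockLostException e) {
                    // Lost the race: another executor owns the job, so skip it.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return job.executions.get();
    }

    public static void main(String[] args) {
        System.out.println("executions: " + race(8));
    }
}
```

However the threads interleave, only one compare-and-set can succeed against the revision they all read, so the count printed at the end is 1.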