How to reduce occurrences of locking exceptions during heavy load?

Background

In our setup of Activiti + Spring we map all of our steps (all marked as async) in a process definition to the same Java bean. That bean publishes a message to be processed asynchronously by an external service, and its execute(ActivityExecution) method then returns.

Once the external service is complete, a message is sent back and consumed by a JMS listener (our custom listener) in the same Activiti instance, which resumes the process (some code snippets of that are below).

We have seen this exception before and assumed that our external service completed and responded before the aforementioned bean released its lock on the process instance, causing an ActivitiOptimisticLockingException and sending the message to an error queue.

We worked around this by increasing the retries on the processing of that message, which has helped for the majority of these occurrences.
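
To illustrate the idea of retrying on an optimistic locking failure, here is a minimal sketch in plain Java. It is not our actual configuration: OptimisticLockRetry and OptimisticLockException are hypothetical names, and the exception is only a stand-in for ActivitiOptimisticLockingException so that the example compiles on its own.

```java
import java.util.function.Supplier;

public class OptimisticLockRetry {

    /** Hypothetical stand-in for ActivitiOptimisticLockingException. */
    public static class OptimisticLockException extends RuntimeException {
        public OptimisticLockException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action, retrying only when it fails with an optimistic locking
     * error. The wait doubles after each failed attempt.
     */
    public static <T> T withRetry(Supplier<T> action, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (OptimisticLockException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller route to the error queue
                }
                sleep(backoff);
                backoff *= 2;
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

The backoff gives the transaction that still holds the newer revision time to commit before the signal is attempted again.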

Some code used when handling the async result, which then signals the RuntimeService to continue:

private void handleExecutionComplete(Event<BaseExecutionDetail> event) {
    ExecutionComplete detail = (ExecutionComplete) event.getEventDetail();
    final Execution execution = findAndVerifyExecution(detail.getExecutionId());
    try {
        verifyExecutionState(execution, detail.getActivityId());
        Map<String, Object> variables = detail.getVariables();
        updateCoreProcessVariables(detail, variables);
        String executorCompleteVariable = format("%sComplete", detail.getExecutorName());
        variables.put(executorCompleteVariable, true);
        runtimeService.signal(execution.getId(), variables);
    } catch(ProcessStateMismatchException psme) {
        // Placeholders, in order: executionId, expected activity id, actual activity id
        log.warn("Received ExecutionComplete event for executionId={} to resume from activity={} " +
            "but the current step of {} does not match", detail.getExecutionId(),
            psme.getExpectedActivityId(), psme.getActualActivityId());
    } catch (Exception e) {
        throw new RuntimeException(format("Exception thrown signalling activiti for executionId=%s", execution.getId()), e);
    }
}

private Execution findAndVerifyExecution(String executionId) {
    final Execution execution = runtimeService.createExecutionQuery()
        .executionId(executionId)
        .singleResult();
    if (execution == null) {
        throw new RuntimeException(format("Execution with id %s does not exist", executionId));
    }
    return execution;
}

private void verifyExecutionState(Execution execution, String expectedActivityId) throws ProcessStateMismatchException {
    final String currentStep = execution.getActivityId();
    if (!currentStep.equals(expectedActivityId)) {
        throw new ProcessStateMismatchException(expectedActivityId, currentStep);
    }
}

Current Problem

We recently hit 1.4 million process instances in our history table (lowest history level) and started noticing very poor query times on the history table (via the Activiti REST APIs).

We only need about a couple of weeks' worth, which is ~150k process instance records, so I ran a script that deletes in batches of 500 via the REST API, which obviously means a lot of concurrent operations on the DB.

While I was running the script, these exceptions went through the roof (thousands).

My questions

  1. When clearing out the DB, should I just execute SQL rather than use the Activiti REST APIs? There is comfort in using the REST APIs, as I trust the Activiti Java APIs more than I trust myself.

  2. My assumption is that while I was running this script, the concurrency made releasing the lock take a lot longer and therefore exacerbated the ActivitiOptimisticLockingException errors. Does this seem likely?

  3. Do you have any recommendations/advice on how to reduce the likelihood of these locking exceptions?

Thankfully this hasn’t caused any problems as it is all pretty resilient, and we have tools in place to re-process easily.

I'd appreciate any recommendations. Let me know if I can clarify anything.

Hi,

You’re asking Activiti questions on Flowable forums?

Have you tried running Flowable? It's moved on quite a bit from Activiti. Which version of Activiti are you on?

Cheers
Paul.

Activiti 5.22.0, which is the same as Flowable 5.22.0, no?

I'm planning on upgrading to Flowable 6 soon.

Do you think this upgrade is going to alleviate my issues?

Or should I just expect this to be a consequence of how I am using it?

There are some additional fixes in Flowable, as well as Transient Variables, so no, they are not the same. It is trivial to switch and see whether things are better with Flowable 5, but you're right to want to upgrade to 6, as it has many advantages. In the meantime, I'll see if some of the bigger brains on Flowable can make any suggestions on your specific problem.

Cheers
Paul.

Hi,

Could you elaborate a bit more about the environment you are running this in?
Which database? How many Activiti instances are you running? Are you using the async executor? Are you using the default configuration for the async executor or did you make changes?

When the history table gets really big, then yes, it can get slow. It's a good idea to add indexes for the queries you are executing against it. Which REST call(s) are you making to clean up the history tables?

Best regards,

Tijs

Thanks @tijs and @PHH for responding so quickly.

Running on PostgreSQL 9.5.6 on a db.m3.xlarge

By Activiti instances, are you referring to JVMs running the process engine? Right this second I have 6 (c4.large), but typically 1 or 2.

Based on the configuration below, I'm going to say no, I'm not using the async executor.

@Component
public class ProcessEngineConfiguration implements ProcessEngineConfigurationConfigurer {

  @Override
  public void configure(SpringProcessEngineConfiguration pec) {
    pec.setDatabaseSchemaUpdate(DB_SCHEMA_UPDATE_TRUE);
    pec.setJobExecutorActivate(Boolean.TRUE);
    pec.setIdGenerator(new StrongUuidGenerator());
    pec.setHistory(HistoryLevel.NONE.getKey());
  }
}

For cleaning up, I:

  1. POST query/historic-process-instances - size: 100, finished: true
  2. For each (in parallel) - DELETE history/historic-process-instances/{processInstanceId}
  3. Repeat
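
A loop like the one above can be kept from flooding the database by capping the number of parallel deletes and pausing between pages. The following is only a sketch of that idea in plain Java, not the script I actually ran: ThrottledHistoryCleanup, fetchPage and deleteOne are hypothetical names, with fetchPage and deleteOne assumed to wrap the query and DELETE REST calls listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class ThrottledHistoryCleanup {

    /**
     * Fetches pages of finished process instance ids until an empty page comes
     * back, deleting each page with at most maxParallelDeletes concurrent calls
     * and sleeping between pages. Returns the number of ids deleted.
     */
    public static int cleanUp(Supplier<List<String>> fetchPage,
                              Consumer<String> deleteOne,
                              int maxParallelDeletes,
                              long pauseBetweenPagesMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallelDeletes);
        int deleted = 0;
        try {
            List<String> page = fetchPage.get();
            while (!page.isEmpty()) {
                List<Future<?>> pending = new ArrayList<>();
                for (String id : page) {
                    pending.add(pool.submit(() -> deleteOne.accept(id)));
                }
                // Wait for the whole page before asking for the next one.
                for (Future<?> f : pending) {
                    f.get();
                }
                deleted += page.size();
                Thread.sleep(pauseBetweenPagesMillis);
                page = fetchPage.get();
            }
            return deleted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Cleanup interrupted", e);
        } catch (ExecutionException e) {
            // Stop on the first failed delete rather than refetching the same page forever.
            throw new IllegalStateException("Delete failed, stopping cleanup", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A small pool size and a non-zero pause trade cleanup speed for less contention with the live process instances.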

Hey Reece,

Have you tried configuring and using the async executor? That will make a big difference.
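
As a rough sketch, assuming the 5.x configuration setters (setAsyncExecutorEnabled and setAsyncExecutorActivate) and reusing the configure method posted earlier in this thread, switching from the old job executor to the async executor could look something like this. Treat it as a starting point to check against your version's documentation, not a tuned configuration:

```java
@Override
public void configure(SpringProcessEngineConfiguration pec) {
    pec.setDatabaseSchemaUpdate(DB_SCHEMA_UPDATE_TRUE);
    // Turn the old job executor off and the async executor on (assumed 5.x setters).
    pec.setJobExecutorActivate(false);
    pec.setAsyncExecutorEnabled(true);
    pec.setAsyncExecutorActivate(true);
    pec.setIdGenerator(new StrongUuidGenerator());
    pec.setHistory(HistoryLevel.NONE.getKey());
}
```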

Cheers
Paul.