Background
In our Activiti + Spring setup, every step in a process definition (all marked as async) maps to the same Java bean. That bean publishes a message to be processed asynchronously, and then its execute(ActivityExecution) method ends.
Once the external service completes, it sends a message back, which is consumed by a JMS listener (our custom listener) in the same Activiti instance. That listener resumes the process (some code snippets of that are below).
We have previously seen this exception come up and assumed that our external service had completed and responded before the aforementioned bean released its lock on the process instance, causing an ActivitiOptimisticLockingException and resulting in the message going to an error queue.
We worked around this by increasing the retries on the processing of that message, which has helped for the majority of these occurrences.
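As an illustration of that kind of retry, here is a minimal standalone sketch, not our actual listener code. The class name `OptimisticLockRetry`, the `withRetry` helper, and the stand-in `OptimisticLockException` are all hypothetical; in a real setup the caught type would be `ActivitiOptimisticLockingException`, and the retry count and backoff values are assumptions to tune.

```java
import java.util.function.Supplier;

class OptimisticLockRetry {

    /** Stand-in for ActivitiOptimisticLockingException (assumption, so the sketch compiles on its own). */
    static class OptimisticLockException extends RuntimeException {
        OptimisticLockException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action, retrying only on optimistic locking failures.
     * The wait doubles after each failed attempt, starting at initialBackoffMillis.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (OptimisticLockException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the message go to the error queue
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
                backoff *= 2;
            }
        }
    }
}
```

The signal call would be wrapped in the supplier, so each retry re-runs the whole call in a fresh transaction rather than reusing stale state.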
Some code used when handling the async result, which then signals the RuntimeService to continue:
private void handleExecutionComplete(Event<BaseExecutionDetail> event) {
    ExecutionComplete detail = (ExecutionComplete) event.getEventDetail();
    final Execution execution = findAndVerifyExecution(detail.getExecutionId());
    try {
        verifyExecutionState(execution, detail.getActivityId());
        Map<String, Object> variables = detail.getVariables();
        updateCoreProcessVariables(detail, variables);
        String executorCompleteVariable = format("%sComplete", detail.getExecutorName());
        variables.put(executorCompleteVariable, true);
        runtimeService.signal(execution.getId(), variables);
    } catch (ProcessStateMismatchException psme) {
        log.warn("Received ExecutionComplete event for executionId={} to resume from executor={} " +
                "but the current step of {} does not match", detail.getExecutionId(),
                psme.getExpectedActivityId(), psme.getActualActivityId());
    } catch (Exception e) {
        throw new RuntimeException(format("Exception thrown signalling activiti for executionId=%s", execution.getId()), e);
    }
}

private Execution findAndVerifyExecution(String executionId) {
    final Execution execution = runtimeService.createExecutionQuery()
            .executionId(executionId)
            .singleResult();
    if (execution == null) {
        throw new RuntimeException(format("Execution with id %s does not exist", executionId));
    }
    return execution;
}

private void verifyExecutionState(Execution execution, String expectedActivityId) throws ProcessStateMismatchException {
    final String currentStep = execution.getActivityId();
    if (!currentStep.equals(expectedActivityId)) {
        throw new ProcessStateMismatchException(expectedActivityId, currentStep);
    }
}
Current Problem
We recently hit 1.4 million process instances in our history table (lowest history level) and started noticing some very poor query times on the history table (via the Activiti REST APIs).
We only need about a couple of weeks' worth, which is ~150k process instance records, so I ran a script that deleted in batches of 500 via the REST API, which obviously means a lot of concurrent operations on the DB.
While the script was running, these exceptions went through the roof (thousands).
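For illustration, a throttled version of such a batch cleanup might look like the sketch below. This is not the script I ran: `ThrottledHistoryCleanup`, `partition`, and `deleteInBatches` are hypothetical names, and the delete action is passed in as a plain callback so the sketch stands alone. With the Java API, that callback could presumably be `historyService::deleteHistoricProcessInstance`; the batch size and pause are assumptions to tune against DB load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ThrottledHistoryCleanup {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(i, end)));
        }
        return batches;
    }

    /**
     * Deletes ids one at a time, batch by batch, on a single thread,
     * pausing between batches so the engine's own work can get through.
     * Returns the number of ids passed to deleteOne.
     */
    static int deleteInBatches(List<String> ids, int batchSize, long pauseMillis,
                               Consumer<String> deleteOne) {
        List<List<String>> batches = partition(ids, batchSize);
        int deleted = 0;
        for (int b = 0; b < batches.size(); b++) {
            for (String id : batches.get(b)) {
                deleteOne.accept(id); // hypothetical wiring: historyService::deleteHistoricProcessInstance
                deleted++;
            }
            if (b < batches.size() - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return deleted; // stop early, reporting what was done so far
                }
            }
        }
        return deleted;
    }
}
```

Running the deletes sequentially with a pause trades total runtime for less contention with live process instances.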
My questions
- When clearing out the DB, should I just execute SQL rather than use the Activiti REST APIs? There is comfort in using the REST APIs, as I trust the Activiti Java APIs more than I trust myself.
- My assumption is that, while the script was running, the concurrency made releasing the lock take a lot longer, which exacerbated the ActivitiOptimisticLockingException errors. Does this seem likely?
- Do you have any recommendations/advice on how to reduce the likelihood of these locking exceptions?
Thankfully this hasn’t caused any problems as it is all pretty resilient, and we have tools in place to re-process easily.
Appreciate any recommendations, let me know if I can clarify anything.