Cleaning old history data = OutOfMemory

Hi
We are using Flowable 6.7.2 and currently have about 700 million processes. We are trying to delete the old ones (older than 356 days), about 0.5 million, with the following code:

    HistoricProcessInstanceQuery query = Flowable.getHistoryCleaningManager().createHistoricProcessInstanceCleaningQuery();
    query.finishedBefore(date.addDays(-356));
    query.deleteSequentiallyUsingBatch(batch_size_500, "batch_name");
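
For readers adapting the snippet: `date.addDays(-356)` is not a JDK method, and `finishedBefore` expects a `java.util.Date`. A minimal sketch of how the cutoff could be computed with `java.time` follows; the class and method names (`CleanupCutoff`, `daysBefore`) are hypothetical and not part of Flowable.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Hypothetical helper, not part of Flowable: computes the cutoff date for a history cleanup.
final class CleanupCutoff {

    // Returns the point in time `days` days (24-hour periods) before `now` as a java.util.Date.
    static Date daysBefore(Instant now, long days) {
        return Date.from(now.minus(days, ChronoUnit.DAYS));
    }

    public static void main(String[] args) {
        // 356 is the retention period used in this thread.
        Date cutoff = daysBefore(Instant.now(), 356);
        System.out.println("Delete instances finished before: " + cutoff);
    }
}
```

The resulting `Date` would then be passed to `query.finishedBefore(...)` in place of `date.addDays(-356)`.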

But we got an OutOfMemory error on our production environment (7 machines, 4 threads per machine for the Flowable AsyncExecutor).

As you can see in the screenshots, there are two 700 MB objects. Why does this happen? Why does deleting old processes need so much memory?

Maybe the problem is the transaction scope, which is too “wide” for DeleteHistoricProcessInstanceIdsJobHandler.

The objects are held by DbSession.

Hey @MirekSz,

Have you tried 6.8.0? We’ve done some improvements around the performance and memory consumption of the cleanups.

Cheers,
Filip

Not yet. Do you have a link (GitHub/Jira) to this improvement?

No, there is no issue, but you can look at the history of the job handler you mentioned (History for modules/flowable-engine/src/main/java/org/flowable/engine/impl/delete/DeleteHistoricProcessInstanceIdsJobHandler.java - flowable/flowable-engine · GitHub).

I will try, but after analyzing the code I don’t see any change in the transaction scope. There should be one transaction that gets the batch data (the ids to delete, for example 1000) and then 1000 small transactions. Now there is one huge transaction…

If you want 1000 small transactions, then you should use a batch size of 1. That will of course be slower. In 6.8 we improved the delete to make fewer calls to the DB, so it should use less memory.
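
To illustrate the trade-off discussed here, the following toy model shows how the batch size bounds the number of entities held at once. It assumes one transaction per batch and that every entity loaded for deletion stays referenced until that transaction ends (as with the objects held by DbSession above). It is not Flowable code, and the figure of 20 variables per instance is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Flowable code: peak number of entities held when each batch is one transaction.
final class BatchScopeSketch {

    static int peakCachedEntities(int totalInstances, int batchSize, int variablesPerInstance) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int peak = 0;
        for (int start = 0; start < totalInstances; start += batchSize) {
            // A new transaction starts with an empty session cache.
            List<String> sessionCache = new ArrayList<>();
            int end = Math.min(start + batchSize, totalInstances);
            for (int instance = start; instance < end; instance++) {
                for (int v = 0; v < variablesPerInstance; v++) {
                    // Each variable is loaded before it is deleted and stays cached.
                    sessionCache.add("var-" + instance + "-" + v);
                }
            }
            peak = Math.max(peak, sessionCache.size());
            // End of transaction: the cache becomes unreachable and can be collected.
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakCachedEntities(10_000, 1000, 20)); // batch size 1000
        System.out.println(peakCachedEntities(10_000, 100, 20));  // batch size 100
    }
}
```

Under these assumptions the peak grows linearly with the batch size, so a smaller batch trades throughput for a lower memory ceiling.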

Thanks, I will decrease the batch size from 1000 to 100 and check memory consumption, because upgrading Flowable on production is not such an easy process.

I have inspected the 6.7.2 source code, and for us the problem is in the implementation of the method HistoricVariableInstanceEntityManagerImpl.deleteHistoricVariableInstanceByProcessInstanceId (which hasn’t changed in 6.8.0):

    @Override
    public void deleteHistoricVariableInstanceByProcessInstanceId(final String historicProcessInstanceId) {
        if (serviceConfiguration.isHistoryLevelAtLeast(HistoryLevel.ACTIVITY)) {
            List<HistoricVariableInstanceEntity> historicProcessVariables = dataManager.findHistoricVariableInstancesByProcessInstanceId(historicProcessInstanceId);
            for (HistoricVariableInstanceEntity historicProcessVariable : historicProcessVariables) {
                delete(historicProcessVariable);
            }
        }
    }

Why does deleting require reading the objects into memory? These objects are not read anywhere (and they can be heavy).
Instead of the loop, maybe it should be something like this:

dataManager.deleteHistoricVariableInstancesByProcessInstanceId(historicProcessInstanceId);

Of course, there are many places where objects are deleted this way, but deleting variables this way is the most dangerous.

The reason why they are deleted like this is due to the fact that those variables might have references to byte arrays. Therefore, they are deleted in that way.

Yes, the method has not changed, but if you look at the code that is executed in 6.8 you will see that the HistoricVariableInstanceEntityManagerImpl#bulkDeleteHistoricVariableInstancesByProcessInstanceIds method is used for the deletion. This method is optimized to properly delete things without loading them into memory.

Cheers,
Filip
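
The difference between the two deletion styles discussed above can be sketched with an in-memory stand-in for the variable table. This is only an illustration of load-then-delete versus a bulk delete by id set: all names (`DeleteStrategySketch`, `Row`, `entitiesLoaded`) are hypothetical, it is not how Flowable implements either path, and it ignores the byte array references mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: an in-memory "table" of historic variables.
final class DeleteStrategySketch {

    static final class Row {
        final String processInstanceId;
        Row(String processInstanceId) { this.processInstanceId = processInstanceId; }
    }

    final List<Row> table = new ArrayList<>();
    int entitiesLoaded = 0; // how many rows were materialized as objects by the caller

    // Load-then-delete: fetch every matching row, then delete them one by one.
    void deleteOneByOne(String processInstanceId) {
        List<Row> loaded = new ArrayList<>();
        for (Row row : table) {
            if (row.processInstanceId.equals(processInstanceId)) {
                loaded.add(row);
            }
        }
        entitiesLoaded += loaded.size();
        table.removeAll(loaded);
    }

    // Bulk delete: remove by predicate, comparable to a single DELETE ... WHERE id IN (...).
    void bulkDelete(Set<String> processInstanceIds) {
        table.removeIf(row -> processInstanceIds.contains(row.processInstanceId));
    }

    // Builds a table with instances p0, p1, ... each owning the given number of variables.
    static DeleteStrategySketch withRows(int instances, int variablesPerInstance) {
        DeleteStrategySketch store = new DeleteStrategySketch();
        for (int i = 0; i < instances; i++) {
            for (int v = 0; v < variablesPerInstance; v++) {
                store.table.add(new Row("p" + i));
            }
        }
        return store;
    }

    public static void main(String[] args) {
        DeleteStrategySketch a = withRows(3, 4);
        a.deleteOneByOne("p0");
        System.out.println("one by one loaded: " + a.entitiesLoaded);

        DeleteStrategySketch b = withRows(3, 4);
        b.bulkDelete(new HashSet<>(Arrays.asList("p0", "p1")));
        System.out.println("bulk loaded: " + b.entitiesLoaded);
    }
}
```

In the first strategy the number of objects held grows with the number and size of the variables; in the second it stays at zero regardless of how many rows are removed.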

hey,
I would like to let you know that we have updated our environments from 6.7.2 to Flowable 6.8 and the improvement is massive. The time to delete 5 million processes was reduced from 624 h (26 days!) to 3.5 h. Amazing. Kudos to the programmers.
We can sleep well again :-)

Thanks for letting us know @MirekSz. Happy to hear that the speed-up works well for you.

Cheers,
Filip