Where is ACT_RU_ACTINST used and can i delete data from it?

narendra129 · October 28, 2020, 4:54am

Where is the table ACT_RU_ACTINST used? Can i delete data from it without affecting the running processes?

joram · October 28, 2020, 7:28pm

That table is used to keep track of active or complete activity instances, e.g. when using the ActivityInstanceQuery. It is needed for correct working of instances. Why do you want to remove data from it?

narendra129 · October 28, 2020, 7:48pm

I’ve made something of a never ending process which sends a notification sms to users at a particular time everyday. And since the process is not ending, the ended activities remain in the ACT_RU_ACTINST forever. The data keeps piling up as the process doesn’t end so I wanted to find a way to remove the ended activities from the table.

joram · October 28, 2020, 9:01pm

Ok, in that case completed activities (having a non-null endTime) could be deleted.

Note that having this data won’t impact runtime performance of the instance: the data of completed instance is never fetched (unless when querying it explicitely through the API).

narendra129 · October 29, 2020, 4:53am

Thanks for the help! I’m looking to delete data as it is generating a lot of data per day and the space is being used for no reason as i don’t need it.

wberges · October 30, 2022, 3:18pm

Hello @joram,

Adding an operational remark:

It will perhaps not impact runtime performance, but when we have several hundred thousands of workflows (with a lot of activities) in pending state as @narendra129 says (my case also), the ACT_RU_ACTINST grows drastically, because of activities but also clustered indexes which are even bigger than the data itself.

For example, for 60k running workflows (not so much for a system designed to manage long running processes), we have 87M of rows, 53Gb of data and 85Gb of indexes, for a total of +/- 140Gb…
So yes, there’s an impact, at least considering the DB file size.

It would be useful to have a way to say that we don’t want to consider the activity history, even in runtime (I don’t know the constraints: no possible compensation certainly, and others…?).
I hoped that the flowable.history-level=NONE solved this use-case, but it seems it isn’t the case.
We have so to explicitely put in place a regular purge by API to avoid such DB explosion.

HTH
Best Regards
William

wberges · November 14, 2022, 10:54am

Hello,

Thanks @joram to advise to delete the completed activities, it can be a workaround.
But if we don’t need such runtime history, it would perhaps be better to be able to not store it (as for the HI tables with the history-level)? Mainly if it isn’t needed by the core.
Because when we have a loop in the workflow, as explained, this table grows in an exponential way
It’s a pity when we don’t need this activity history.

Could we manage this table as an history table only for the running activities, meaning can we truncate it if we don’t use the history?

@joram @tijs
Thanks

joram · November 23, 2022, 9:27am

Those sizes look huge for only 60k instances. How many activity instances do you have on average per instance? I would also worry about the executions table, as the execution tree must be quite large?

The activity instances were added a long time ago as users wanted to have a fast query to see where there are at. Before that, the execution table needed to be queried and enriched with the model information at runtime, which led to bad performance for those use cases.

In theory, yes. But we don’t unit test nor qa this setup of course. You could quickly try this by replacing the default ActivityInstanceEntityManager (https://github.com/flowable/flowable-engine/blob/main/modules/flowable-engine/src/main/java/org/flowable/engine/impl/persistence/entity/ActivityInstanceEntityManager.java) with a version of yourself that doesn’t persist anything. The only thing that won’t work immediately is the historic activities (as it uses the runtime version to update), but i’m assuming you’re also running without activity history?

wberges · November 23, 2022, 12:22pm

Hello,

The problem comes when you have loops in your workflows (polling), with a timer before retry. Even if you loop on the same step, each time several new lines are added in this table, so if you have several loops with a lot of iterations, we are in my use-case. Perhaps there’s a better way to design loops (e.g. retry on failed jobs in the service task? But it supposes to throw an exception on each loop, not clean) , but not found

FYI we removed all activities except those without end date (so still running) as you recommended and we passed from 140Gb to 16Mb…
As enhancement, it would be interesting, if one is not interested by this activity history, to have at least a purge which could be executed regularly (auto or on demand).
We have the history level to NONE.

Thanks for you answer
Best regards

joram · November 29, 2022, 7:53am

I can see that this would be a setting that would be interesting in this use case. Just to clarify: this isn’t about history data (which is also why the NONE has no influence) as this is runtime data (hence the ACT_RU_x table). This is to do quick queries on runtime activities, e.g. to answer the question quickly where the process instance currently is at. Once the process instance completes, all related rows are deleted from this table. The problem here is of course that the instance never completes.

wberges · November 29, 2022, 12:47pm

Yes sure, I understood the difference between RU and HI.
But if we have set the history to NONE, it is clearly to avoid having big tables, what we have unfortunately even in this case.
BTW is this ACTINST table really needed for the execution (meaning nothing will continue to work if we empty it, mainly non ended activities) or could we envisage to fully purge it (again, we don’t use compensation which could require to rollback functionally a part of the process). The exact role and usage of this table is not really clear to me.
About the activity where we are (I don’t speak about the full path requiring ended activities), can’t we rely on the executions also?
Thanks

joram · November 29, 2022, 12:53pm

If that would have been the purpose we would have added a flag avoidLargeTables ;-). I understand that’s your goal, but this data was never about history.
Your use case of a forever or very-long looping process is not that common one, that’s most likely why it hasn’t been prioritized to implement a flag.

The cheesy answer is that of course we would accept PR’s that would do this. I don’t know from the top of my head what the impact might be to fully disable it or purge it all the time, that would require investigation. For starters, the history would break (not your problem, but it would be for other users).

You can, but it’ll be a slow query.

wberges · November 29, 2022, 12:57pm

Most of the time, we need to know where we are only to display the process diagram, so no problem if it is “slow”. But perhaps there are some other use cases?
Again, if it was possible to know in which context this table is mandatory, it would allow us to know if we can empty (truncate) it or not.

wberges · November 29, 2022, 12:59pm

In all cases, thanks a lot for your answers @joram, highly appreciated

wberges · November 29, 2022, 1:15pm

BTW I forgot 1 important thing: we have such big history because we have processes with loops running since several months. These processes were supposed to have been stopped, what wasn’t the case due to this other issue: How to delete a process? - #4 by wberges
So when we combine both, we have 140Gb in ACTINST

rangedzs · January 18, 2023, 8:58am

FYI we removed all activities except those without end date (so still running) as you recommended and we passed from 140Gb to 16Mb…

rangedzs · January 27, 2023, 4:00pm

Most of the time, we need to know where we are only to display the process diagram, so no problem if it is “slow”. But perhaps there are some other use cases?.

Topic		Replies	Views
Can I remove all data in the table act_hi_actinst? Flowable Engine	6	5917	May 7, 2017
Delete process instance by SQL Flowable Engine	2	328	December 29, 2023
How do I clean up the data in the "act_ru*" table when I cancel the process? Flowable Engine	0	277	April 14, 2023
Process was end but records on database this process are in table act_ru_actinst Flowable Engine	2	486	December 8, 2020
Removing data from ACT_HI_DETAIL securely Flowable Engine	3	2308	April 17, 2020

Where is ACT_RU_ACTINST used and can i delete data from it?

Related topics