Where is the table ACT_RU_ACTINST used? Can i delete data from it without affecting the running processes?
That table is used to keep track of active or complete activity instances, e.g. when using the ActivityInstanceQuery. It is needed for correct working of instances. Why do you want to remove data from it?
I’ve made something of a never ending process which sends a notification sms to users at a particular time everyday. And since the process is not ending, the ended activities remain in the ACT_RU_ACTINST forever. The data keeps piling up as the process doesn’t end so I wanted to find a way to remove the ended activities from the table.
Ok, in that case completed activities (having a non-null endTime) could be deleted.
Note that having this data won’t impact runtime performance of the instance: the data of completed instance is never fetched (unless when querying it explicitely through the API).
Thanks for the help! I’m looking to delete data as it is generating a lot of data per day and the space is being used for no reason as i don’t need it.
Adding an operational remark:
It will perhaps not impact runtime performance, but when we have several hundred thousands of workflows (with a lot of activities) in pending state as @narendra129 says (my case also), the ACT_RU_ACTINST grows drastically, because of activities but also clustered indexes which are even bigger than the data itself.
For example, for 60k running workflows (not so much for a system designed to manage long running processes), we have 87M of rows, 53Gb of data and 85Gb of indexes, for a total of +/- 140Gb…
So yes, there’s an impact, at least considering the DB file size.
It would be useful to have a way to say that we don’t want to consider the activity history, even in runtime (I don’t know the constraints: no possible compensation certainly, and others…?).
I hoped that the flowable.history-level=NONE solved this use-case, but it seems it isn’t the case.
We have so to explicitely put in place a regular purge by API to avoid such DB explosion.
Thanks @joram to advise to delete the completed activities, it can be a workaround.
But if we don’t need such runtime history, it would perhaps be better to be able to not store it (as for the HI tables with the history-level)? Mainly if it isn’t needed by the core.
Because when we have a loop in the workflow, as explained, this table grows in an exponential way
It’s a pity when we don’t need this activity history.
Could we manage this table as an history table only for the running activities, meaning can we truncate it if we don’t use the history?
Those sizes look huge for only 60k instances. How many activity instances do you have on average per instance? I would also worry about the executions table, as the execution tree must be quite large?
The activity instances were added a long time ago as users wanted to have a fast query to see where there are at. Before that, the execution table needed to be queried and enriched with the model information at runtime, which led to bad performance for those use cases.
In theory, yes. But we don’t unit test nor qa this setup of course. You could quickly try this by replacing the default ActivityInstanceEntityManager (flowable-engine/ActivityInstanceEntityManager.java at main · flowable/flowable-engine · GitHub) with a version of yourself that doesn’t persist anything. The only thing that won’t work immediately is the historic activities (as it uses the runtime version to update), but i’m assuming you’re also running without activity history?
The problem comes when you have loops in your workflows (polling), with a timer before retry. Even if you loop on the same step, each time several new lines are added in this table, so if you have several loops with a lot of iterations, we are in my use-case. Perhaps there’s a better way to design loops (e.g. retry on failed jobs in the service task? But it supposes to throw an exception on each loop, not clean) , but not found
FYI we removed all activities except those without end date (so still running) as you recommended and we passed from 140Gb to 16Mb…
As enhancement, it would be interesting, if one is not interested by this activity history, to have at least a purge which could be executed regularly (auto or on demand).
We have the history level to NONE.
Thanks for you answer
I can see that this would be a setting that would be interesting in this use case. Just to clarify: this isn’t about history data (which is also why the NONE has no influence) as this is runtime data (hence the ACT_RU_x table). This is to do quick queries on runtime activities, e.g. to answer the question quickly where the process instance currently is at. Once the process instance completes, all related rows are deleted from this table. The problem here is of course that the instance never completes.
Yes sure, I understood the difference between RU and HI.
But if we have set the history to NONE, it is clearly to avoid having big tables, what we have unfortunately even in this case.
BTW is this ACTINST table really needed for the execution (meaning nothing will continue to work if we empty it, mainly non ended activities) or could we envisage to fully purge it (again, we don’t use compensation which could require to rollback functionally a part of the process). The exact role and usage of this table is not really clear to me.
About the activity where we are (I don’t speak about the full path requiring ended activities), can’t we rely on the executions also?
If that would have been the purpose we would have added a flag avoidLargeTables ;-). I understand that’s your goal, but this data was never about history.
Your use case of a forever or very-long looping process is not that common one, that’s most likely why it hasn’t been prioritized to implement a flag.
The cheesy answer is that of course we would accept PR’s that would do this. I don’t know from the top of my head what the impact might be to fully disable it or purge it all the time, that would require investigation. For starters, the history would break (not your problem, but it would be for other users).
You can, but it’ll be a slow query.
Most of the time, we need to know where we are only to display the process diagram, so no problem if it is “slow”. But perhaps there are some other use cases?
Again, if it was possible to know in which context this table is mandatory, it would allow us to know if we can empty (truncate) it or not.
In all cases, thanks a lot for your answers @joram, highly appreciated
BTW I forgot 1 important thing: we have such big history because we have processes with loops running since several months. These processes were supposed to have been stopped, what wasn’t the case due to this other issue: How to delete a process? - #4 by wberges
So when we combine both, we have 140Gb in ACTINST