Hi,
In my project, history tables contain millions of records in a postgres DB.
We were planning to use the Flowable history cleanup feature added with 6.5.0, but related SQL queries do not look optimized.
The first one is ok, analyzing the corresponding execution plan (thanks to the index on end_time column).
The others look inefficient. So we think about executing custom queries as a replacement.
See bellow the queries and the execution plan for the second one (from a test db)
delete
from ACT_HI_PROCINST
WHERE END_TIME_ <= ?
and END_TIME_ is not NULL
delete
from ACT_HI_ACTINST ACTINST
where ACTINST.PROC_INST_ID_ is not null
and ACTINST.PROC_INST_ID_ != ''
and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where ACTINST.PROC_INST_ID_ = PROCINST.ID_)
delete
from ACT_HI_TASKINST TASK
where TASK.PROC_INST_ID_ is not null
and TASK.PROC_INST_ID_ != ''
and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where TASK.PROC_INST_ID_ = PROCINST.ID_)
delete
from ACT_HI_IDENTITYLINK LINK
where LINK.PROC_INST_ID_ is not null
and LINK.PROC_INST_ID_ != ''
and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where LINK.PROC_INST_ID_ = PROCINST.ID_)
delete
from ACT_HI_IDENTITYLINK LINK
where LINK.TASK_ID_ is not null
and LINK.TASK_ID_ != ''
and NOT EXISTS(select TASK.ID_ from ACT_HI_TASKINST TASK where LINK.TASK_ID_ = TASK.ID_)
delete
from ACT_HI_VARINST VARINST
where VARINST.PROC_INST_ID_ is not null
and VARINST.PROC_INST_ID_ != ''
and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where VARINST.PROC_INST_ID_ = PROCINST.ID_)
delete
from ACT_HI_DETAIL HIDETAIL
where HIDETAIL.PROC_INST_ID_ is not null
and HIDETAIL.PROC_INST_ID_ != ''
and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where HIDETAIL.PROC_INST_ID_ = PROCINST.ID_)
–
Delete on act_hi_actinst actinst (cost=2801.81..60670.66 rows=1 width=12) 2 -> Hash Anti Join (cost-2001.81..60678.66 rows=1 width=12)
Hash Cond: ((actinst.proc_inst_id_)::text = (procinst.id_)::text) -> Seq Scan on act_hi_actinst actinst (cost=0.00..54897.32 rows=1005748 width=43)
Filter: proc_inst_id IS NOT NULL) AND ((proc_inst_id_)::text > "::text)) Hash (cost=1572.47. 1572.47 rows=34347 width:43)
-> Seq Scan on act_hi procinst procinst (cost%=8.00..1572.47 rows=34347 width=43)