Flowable large history clean up

Hi,

In my project, history tables contain millions of records in a postgres DB.
We were planning to use the Flowable history cleanup feature added with 6.5.0, but related SQL queries do not look optimized.
The first one is ok, analyzing the corresponding execution plan (thanks to the index on end_time column).
The others look inefficient. So we think about executing custom queries as a replacement.

See bellow the queries and the execution plan for the second one (from a test db)

 delete
 from ACT_HI_PROCINST
 WHERE END_TIME_ <= ?
  and END_TIME_ is not NULL
 
 delete
 from ACT_HI_ACTINST ACTINST
 where ACTINST.PROC_INST_ID_ is not null
  and ACTINST.PROC_INST_ID_ != ''
  and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where ACTINST.PROC_INST_ID_ = PROCINST.ID_)
 
 delete
 from ACT_HI_TASKINST TASK
 where TASK.PROC_INST_ID_ is not null
  and TASK.PROC_INST_ID_ != ''
  and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where TASK.PROC_INST_ID_ = PROCINST.ID_)
 
 delete
 from ACT_HI_IDENTITYLINK LINK
 where LINK.PROC_INST_ID_ is not null
  and LINK.PROC_INST_ID_ != ''
  and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where LINK.PROC_INST_ID_ = PROCINST.ID_)
 
 delete
 from ACT_HI_IDENTITYLINK LINK
 where LINK.TASK_ID_ is not null
  and LINK.TASK_ID_ != ''
  and NOT EXISTS(select TASK.ID_ from ACT_HI_TASKINST TASK where LINK.TASK_ID_ = TASK.ID_)
 
 delete
 from ACT_HI_VARINST VARINST
 where VARINST.PROC_INST_ID_ is not null
  and VARINST.PROC_INST_ID_ != ''
  and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where VARINST.PROC_INST_ID_ = PROCINST.ID_)
 
 delete
 from ACT_HI_DETAIL HIDETAIL
 where HIDETAIL.PROC_INST_ID_ is not null
  and HIDETAIL.PROC_INST_ID_ != ''
  and NOT EXISTS(select PROCINST.ID_ from ACT_HI_PROCINST PROCINST where HIDETAIL.PROC_INST_ID_ = PROCINST.ID_)

Delete on act_hi_actinst actinst (cost=2801.81..60670.66 rows=1 width=12) 2 -> Hash Anti Join (cost-2001.81..60678.66 rows=1 width=12)

    Hash Cond: ((actinst.proc_inst_id_)::text = (procinst.id_)::text) -> Seq Scan on act_hi_actinst actinst (cost=0.00..54897.32 rows=1005748 width=43)

    Filter: proc_inst_id IS NOT NULL) AND ((proc_inst_id_)::text > "::text)) Hash (cost=1572.47. 1572.47 rows=34347 width:43)

    -> Seq Scan on act_hi procinst procinst (cost%=8.00..1572.47 rows=34347 width=43)

You’re free to execute any queries yourself, of course :wink:

The historic cleanup is meant for periodic cleanup, not for doing large bulk deletes. Every index we add has a cost on update/insert, so we need to carefully evaluete each of them.