I am using Flowable 5.23 with a system with ~230K tasks resident and am perform searches using the task search API and seeing performance that is quite a bit slower than I’d expect or hope.
First, I’m doing a pretty basic search for tasks assigned to a user or candidates for them (directly or through a group) and as this is supporting a UI that is able to make paginated requests, asking for just the first 25. The generated SQL I see from turning on debug logging is:
select distinct RES.*
from ACT_RU_TASK RES
left join ACT_RU_IDENTITYLINK I on I.TASK_ID_ = RES.ID_
WHERE (RES.ASSIGNEE_ = ‘user1’
or (RES.ASSIGNEE_ is null
and I.TYPE_ = ‘candidate’
and (I.USER_ID_ = ‘user1’
or I.GROUP_ID_ IN (‘group1’) )))
order by RES.ID_ asc
LIMIT 25 OFFSET 0
;
This looks perfectly benign, but takes 4-5 seconds to execute.
The issue seems to be the order by and MySQL having to create a temporary table to sort it as removing the order by results in a response in 20-30 milliseconds.
The problem is, I’m not setting an order by, this seems to be the default.
So, is there a way to get the order by to not be included?
Now, in reality I probably do want it ordered, and perhaps on other columns, but those queries are just as slow or slower, for example ordering on CREATE_TIME_ results in the query time going just over 5 seconds.
Are others not using MySQL and/or not using it with this many tasks or with the task search API like this? Or is this normal?
Now worse, at times I need to do searches on some process variables. For instance, I’d like to get tasks that meet the above criteria, but also have a score variable with a value of at least 45. This is easy enough to do with the task search API, I can just use the processVariableValueGreaterThan method. The resulting SQL is:
select distinct RES.*
from ACT_RU_TASK RES
left join ACT_RU_IDENTITYLINK I on I.TASK_ID_ = RES.ID_
inner join ACT_RU_VARIABLE A0 on RES.PROC_INST_ID_ = A0.PROC_INST_ID_
WHERE A0.TASK_ID_ is null
and A0.NAME_= ‘score’
and A0.TYPE_ = ‘double’
and A0.DOUBLE_ > 45.0
and (RES.ASSIGNEE_ = ‘user1’
or (RES.ASSIGNEE_ is null
and I.TYPE_ = ‘candidate’
and (I.USER_ID_ = ‘user1’
or I.GROUP_ID_ IN ('group1) )))
order by RES.ID_ asc
LIMIT 25 OFFSET 0
;
This looks like you’d expect, but takes a whopping 52 seconds.
This again is caused by the order by, removing it results in a response is about 50 milliseconds.
So it seems MySQL just does really poorly with queries that need to be ordered when the query is a little complex and it decides it can’t use an index to do the ordering.
Has anyone else seen this behavior? If so, how have you addressed it?
Or are there any suggestions on how to use the task search API and get good performance in this scenario? Or am I stuck coming up with my own SQL generation/queries that craft SQL that performs better, perhaps by avoiding order by?