Recently we started experiencing blocking errors when completing task activities which are followed by call activities. This happens for process instances started before the upgrade from v5 to v6, and it only started happening after the existing process definitions had been redeployed for v6 compatibility.
Before the redeploy, those processes kept running fine even after the upgrade. Could it be that starting a sub process in v6 from a running v5 process is not supported? If so, what are the options to solve this issue?
Suspend the newly deployed process definition again, so the call activity uses the v5 version again?
This didn’t seem to work immediately, but might need better testing.
Would this be a possible solution, and is it advisable? (A sketch of what we mean is below.)
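What we mean is roughly the following (just a sketch; repositoryService is the standard org.flowable.engine.RepositoryService and the process definition key is a placeholder):

```java
// Sketch only: suspend the redeployed (v6) version of the called process
// definition, in the hope that the call activity falls back to the older v5
// version. "calledProcessKey" is a placeholder for the real definition key.
// ProcessDefinition is org.flowable.engine.repository.ProcessDefinition.
ProcessDefinition latest = repositoryService.createProcessDefinitionQuery()
        .processDefinitionKey("calledProcessKey")
        .latestVersion()
        .singleResult();

repositoryService.suspendProcessDefinitionById(latest.getId());
```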
the redeployed process definitions are exactly the same as before, only redeployed for v6 compatibility
Indeed. Looking through the v5 and v6 CallActivityBehavior, there is no handling of that use case. Furthermore, matching those two execution trees onto each other would be quite difficult.
Migration only works on v6 instances. If I’m understanding you correctly, you’re doing a v5 → v6 call, right?
An alternative would be to deploy the called process as v5 again.
You can do this by setting a deployment property when deploying the called process; see the sketch below.
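A minimal sketch of such a deployment, assuming the flowable5-compatibility module is on the classpath and the v5-compatibility deployment property is what is meant here (the resource name is a placeholder):

```java
// Sketch only: redeploy the called process so the v6 engine stores and runs
// it as a v5 definition. DeploymentProperties is
// org.flowable.engine.repository.DeploymentProperties;
// the resource name is a placeholder.
repositoryService.createDeployment()
        .addClasspathResource("processB.bpmn20.xml")
        .deploymentProperty(DeploymentProperties.DEPLOY_AS_FLOWABLE5_PROCESS_DEFINITION, Boolean.TRUE)
        .deploy();
```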
Thanks for your response. In the meantime we made some progress. It appears those processes were stuck in the engines which had not been restarted since the deploy of the v6 call activity sub process…
To clarify:
everything is fine running the v5 process definitions
redeploying the existing process definitions to make them v6 ready
starting v6 process instances which start call activities that are also v6 ready → all ok
updating still-running v5 processes which try to start call activities of v6-deployed definitions → the error above
and now: apparently when we restart the engine after the new deploys, older running instances do start newer call activities correctly
We can’t find any reason for this, but it seems reproducible.
Migration only works on v6 instances
noted
An alternative would be to deploy the called process as v5 again.
We were already looking into this option as well, using another process definition key to prevent newer processes from starting with the older definition.
We’ll do a final validation tomorrow and will give an update here if anything additional is found.
We have an extra update on this issue which seems rather interesting.
Again, this issue concerns release 6.4.0 with the Flowable5Compatibility flag enabled, and we are now observing the following behavior:
As I wrote in my previous update, restarting the engine seems to solve the issue, but this is not entirely true. It just brings the engine into a clean state, meaning that the next request will determine which call activities will work.
Let’s say we have main process A which calls sub process B. Both had been deployed in the v5 engine before the upgrade and were redeployed in v6, so we have process definitions A1, A2, B1 and B2. We tested the following situations, each from a freshly started flowable-rest in Tomcat.
Situation 1:
Updating some A1 instances which were started in the past will now start call activity B2, which works fine. Afterwards we try to start a new A2 process, and when we update that process to let it flow into the same call activity, it fails with the following error:
No start element found for process definition B:2:d34cd5d3-d479-11e9-9ad5-005056b39f1e
Situation 2:
Starting a new A2 process instance and letting it flow into the B2 call activity works fine. But when trying to update an older running A1 instance to let it flow into the same call activity, it fails with the earlier-mentioned cast exception:
couldn’t execute activity <callActivity id="startB" …>: org.flowable.engine.impl.persistence.entity.ProcessDefinitionEntityImpl cannot be cast to org.activiti.engine.impl.persistence.entity.ProcessDefinitionEntity
Of course we’ll now deploy the newer ones with another process definition key to avoid these conflicts. But we already have some processes which were started in between, and it all concerns long-running flows.
Any idea about these errors, what the root cause could be and how it could be solved?
Situation 1 is really strange. So A1 → B2 works correctly, but A2 → B2 doesn’t work correctly anymore afterwards?
Yes, unless you do the other one first; then the first situation won’t work, with the two error messages mentioned above. So yes, the call activity process has a start element, and it works fine from the other engine at that moment…
It’s also not something in the database, since in a load-balanced setup you could have situation 1 on one node and situation 2 on another node. Restarting the engine clears that ‘state’.
Can you clarify what you mean by that? What exactly do you update?
It could be completing a user task, for instance… which it mainly is in this customer’s situation.
Afterwards the process flows to a next step, which could be a call activity that has this problem.
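In code terms, roughly this (just a sketch; taskService is the standard org.flowable.engine.TaskService and the instance id is a placeholder):

```java
// Sketch: the "update" is simply completing the current user task of an
// older (v5-started) process instance; the engine then moves the instance
// to the next activity, which here is the call activity that fails.
Task task = taskService.createTaskQuery()
        .processInstanceId("<old-v5-process-instance-id>")   // placeholder
        .singleResult();

taskService.complete(task.getId());   // process continues into the call activity
```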
What happens if the key is manually changed for those old process definitions (maybe even in the database)? Can you test this on a test setup?
Can you explain? For new processes this is no concern for us; a new deploy is perfectly acceptable.
But are you also saying that changing the key in the database would be a solution for the running processes? That would be really helpful! In which table should I start looking then?
We have, for instance, about 10 to 20 processes running since the “harmful” deploy to v6. If we can fix those running processes, which already use another process definition id, the old v5 ones would run completely isolated with their own process definitions, and so would the v6 processes.
You’d need to change it in the ACT_RU_EXECUTION table; the column is PROC_DEF_ID_. That is the id of the process definition (it is also a foreign key), which is stored in the ACT_RE_PROCDEF table. Note that you need to change all rows relating to the process instance (they should share a PROC_INST_ID_ value). And of course, make sure to take backups (or try it out on a copy of the db) :-).
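Not an official recommendation, but purely as a sketch of such a repair (plain JDBC; all ids and connection settings are placeholders, and it should be tried against a copy of the database first):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch only: repoint all execution rows of ONE process instance to a
// different process definition id. Test on a copy of the database first;
// every constant below is a placeholder.
public class RepointExecutionRows {

    public static void main(String[] args) throws Exception {
        String jdbcUrl          = "jdbc:postgresql://localhost:5432/flowable"; // placeholder
        String currentProcDefId = "B:1:<old-uuid>";          // placeholder, look up in ACT_RE_PROCDEF
        String targetProcDefId  = "B:2:<new-uuid>";          // placeholder, look up in ACT_RE_PROCDEF
        String procInstId       = "<process-instance-id>";   // placeholder

        String sql = "UPDATE ACT_RU_EXECUTION SET PROC_DEF_ID_ = ? "
                   + "WHERE PROC_INST_ID_ = ? AND PROC_DEF_ID_ = ?";

        try (Connection con = DriverManager.getConnection(jdbcUrl, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, targetProcDefId);
            ps.setString(2, procInstId);        // all rows of the instance share this PROC_INST_ID_
            ps.setString(3, currentProcDefId);  // extra guard so only the expected rows change
            System.out.println("updated execution rows: " + ps.executeUpdate());
        }
    }
}
```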