Stuck Process Instances - How to Find Them?

It happens pretty rarely, but I have had 3 separate occasions where a workflow appears to have gotten stuck before it completed properly.

In the first instance, the cause of the stuck workflow turned out to be that the server got rebooted/went down. Since the server runs the workflow engine, it’s pretty clear that the server went down, the workflow never had a chance to finish, and it just got stuck where it was.

The second instance showed a similar pattern, but there was no reported server outage. I went into the ACT_RU_ACTINST table and could see that the workflow paused right at the point where it was going to invoke an HTTP service call, but no record of the HTTP service call existed in either ACT_RU_ACTINST or ACT_HI_ACTINST. That said, I could see that the service call was invoked by Flowable, because the data was altered in such a way as to indicate the call did indeed happen and the changes made by the web call were committed.
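
A minimal sketch of the kind of check described above: listing activities in ACT_RU_ACTINST that started before some cutoff and still have no end time. It uses Python's sqlite3 with a toy in-memory table as a stand-in for the real Flowable database. The column names (PROC_INST_ID_, ACT_ID_, ACT_TYPE_, START_TIME_, END_TIME_) are my understanding of the 6.x schema and should be verified against the actual tables; the helper name find_open_activities and the sample rows are made up for illustration.

```python
import sqlite3

# Toy stand-in for the Flowable runtime activity table.
# Only the columns used by the query are modeled; this is an assumption,
# not the full schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ACT_RU_ACTINST (
        ID_           TEXT PRIMARY KEY,
        PROC_INST_ID_ TEXT,
        ACT_ID_       TEXT,
        ACT_TYPE_     TEXT,
        START_TIME_   TEXT,
        END_TIME_     TEXT
    )
    """
)

# Hypothetical sample data: p1 is waiting on a user task,
# p2 stopped at a service task that never recorded an end time.
rows = [
    ("1", "p1", "start", "startEvent", "2021-01-01 10:00:00", "2021-01-01 10:00:00"),
    ("2", "p1", "approve", "userTask", "2021-01-01 10:00:00", None),
    ("3", "p2", "start", "startEvent", "2021-01-01 10:00:00", "2021-01-01 10:00:00"),
    ("4", "p2", "prepare", "serviceTask", "2021-01-01 10:00:01", None),
]
conn.executemany("INSERT INTO ACT_RU_ACTINST VALUES (?, ?, ?, ?, ?, ?)", rows)


def find_open_activities(connection, started_before):
    """Return (process instance, activity, type, start time) for every
    activity that started before the cutoff and has no end time yet."""
    cursor = connection.execute(
        """
        SELECT PROC_INST_ID_, ACT_ID_, ACT_TYPE_, START_TIME_
        FROM ACT_RU_ACTINST
        WHERE END_TIME_ IS NULL
          AND START_TIME_ < ?
        ORDER BY START_TIME_
        """,
        (started_before,),
    )
    return cursor.fetchall()


for row in find_open_activities(conn, "2021-01-02 00:00:00"):
    print(row)
```

An open user task is normal on its own, so a list like this only narrows down the candidates; each one still has to be compared against the history tables and the external data by hand. The timestamps are stored as ISO-formatted text here only because SQLite has no native timestamp type.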

In the third situation, the workflow is paused on a user task, but I have a record in ACT_HI_VARINST that shows the user approved the task. I also have data showing that the HTTP services that fire after the task is approved were invoked (because the data was edited). Yet I have no history in either ACT_RU_ACTINST or ACT_HI_ACTINST showing that the task was completed or that a failure occurred. It’s like Flowable just decided to stop the transaction and roll it back without a clear indication of why. I have no reported server outage in this situation, either.

My question, as I dig into how this happened: is there a way to recover the history of these workflows somewhere? Is the transaction that’s about to be committed stored somewhere I can view it, or is it totally rolled back and lost to the ether? Is there a setting I can turn on so I can see failed transactions in a history table somewhere?

Alternatively, is there a way to get Flowable to resume workflows that were left hanging when the engine stalled? If I could get Flowable to pick up where it left off on these workflows, it wouldn’t be a problem if the server went down on me.

For reference, I’m running Flowable version 6.6.2.2 (I compiled from source between the 6.6 and 6.7 builds).