Failover / redundancy / scalability / recovery

Hi,

I am evaluating Flowable and other Java BPM engines for our firm to decide which one best suits our needs. Failover, redundancy, scalability and disaster recovery are important criteria for us. I am new to Flowable. I have reviewed the docs on the website and successfully installed and run the examples, both embedded and in Tomcat. However, I didn’t find information on the above areas, so I have some specific questions:

  1. What sort of failover capabilities are there in Flowable? If the JVM where the engine is running becomes unavailable (e.g., due to an OutOfMemoryError), is it possible to fail over to another engine and continue the process without restarting it or repeating the previously completed tasks?

  2. Similarly, if we detect that tasks in one VM are falling behind due to high volume, is it possible to offload the remaining tasks to an engine in another VM, again without restarting the process or repeating completed tasks?

  3. Does Flowable support farming out tasks to remote VMs? For example, if we want to execute a data-oriented task in a remote VM, possibly because the data is already cached there, is that possible? I understand I can do this with custom code where the service task class makes a remote call via REST or a socket, but short of writing custom code, does Flowable offer support for this, maybe using embedded agents, etc.?

  4. The docs mention that a transaction starts at a wait state and ends at the next wait state. What happens if there is a failure in the transaction? How does Flowable recover?

  5. Finally, is there any whitepaper, presentation, etc. available that provides more information about failover, redundancy, scalability and recovery?

Any help would be appreciated. Thank you
Subhra

Hi Subhra,

  1. Yes. When you use asynchronous tasks, the Flowable job mechanism makes sure that another engine executes the job, so no restart or repetition of previous tasks is necessary.

  2. With load balancing this case should never happen, because the tasks should be evenly distributed across the nodes. If there is an issue, each engine processes only as many jobs as it can handle, so other engines with more CPU available will simply pick up more jobs and execute them.

  3. Flowable has an out-of-the-box HTTP/REST task that can be configured to invoke a REST service without custom coding. There’s also the Shell task, which executes an OS-native command and could be used for this use case.

  4. When there is a failure in the transaction, the transaction is rolled back to the previous wait state. By using asynchronous tasks you can keep transactions short-lived, covering just one task.

  5. As explained in the answers above, Flowable’s scalability and failover support is not too complex, and the user guide is the best resource for it. There is not more to it than what I described above.
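
To make points 1 and 4 a bit more concrete, here is a toy model in plain Java. It is not Flowable code and does not use the Flowable API; every class and method name in it (`JobFailoverSketch`, `SharedDb`, `acquire`, `complete`) is made up for illustration, and the logical timestamps are an assumption to keep the example deterministic. It only sketches the general idea of several engine nodes sharing one job table, where a job locked by a crashed node becomes available again once its lock expires and already committed steps are never picked up a second time.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Flowable code) of async-job failover between nodes sharing one database. */
final class JobFailoverSketch {

    /** One row in the shared job table: a pending asynchronous step. */
    static final class Job {
        final String step;
        String lockOwner;   // node currently holding the job, or null if unlocked
        long lockExpiresAt; // logical time after which another node may take over
        boolean done;       // true once the step has been committed

        Job(String step) { this.step = step; }
    }

    /** Shared state that every node sees, standing in for the database. */
    static final class SharedDb {
        final List<Job> jobs = new ArrayList<>();
        final List<String> executionLog = new ArrayList<>();

        /** Atomically hands an unfinished job that is unlocked or lock-expired to the caller. */
        synchronized Job acquire(String node, long now, long lockDuration) {
            for (Job job : jobs) {
                boolean free = job.lockOwner == null || job.lockExpiresAt <= now;
                if (!job.done && free) {
                    job.lockOwner = node;
                    job.lockExpiresAt = now + lockDuration;
                    return job;
                }
            }
            return null; // nothing available for this node right now
        }

        /** Commit: record the work and mark the job finished in one step. */
        synchronized void complete(Job job, String node) {
            job.done = true;
            executionLog.add(job.step + "@" + node);
        }
    }

    /** Runs the crash scenario and returns the committed steps in order. */
    static List<String> run() {
        SharedDb db = new SharedDb();
        db.jobs.add(new Job("validate"));
        db.jobs.add(new Job("enrich"));
        long lockDuration = 100;

        // t=0: node A acquires and commits the first step.
        Job first = db.acquire("A", 0, lockDuration);
        db.complete(first, "A");

        // t=10: node A locks the second step, then "crashes" before committing it.
        db.acquire("A", 10, lockDuration);

        // t=50: A's lock is still valid, so node B finds nothing to do.
        Job tooEarly = db.acquire("B", 50, lockDuration);

        // t=200: the lock has expired, so node B takes over only the unfinished step.
        Job takenOver = db.acquire("B", 200, lockDuration);
        if (tooEarly == null && takenOver != null) {
            db.complete(takenOver, "B");
        }
        return db.executionLog;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [validate@A, enrich@B]
    }
}
```

In this sketch the "validate" step is committed once by node A and is never repeated, while "enrich" is executed by node B only after A's lock has expired. A real engine does this against database rows and transactions rather than an in-memory list.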

Best regards,

Tijs

Hi Tijs,

Thank you for the clarifications. Those were very helpful.

Your docs mention that you’ve benchmarked the asynchronous engine against the synchronous one and found that the asynchronous engine outperforms it. Could you please share the benchmark data?

Thank you
Subhra

Hi, is there any documentation on active/active or active/passive configuration for Flowable? Please advise. I could not find it in the docs.