Integration with Cassandra

We are looking to integrate Flowable BPMN with NoSQL databases, especially Cassandra. Please point me to any docs or tutorials, or any other way to try this.

Thanks in advance

Depends on what you want to do:

  • do you want to swap out the whole relational persistence layer for Cassandra? If so, this is theoretically possible (all DataManagers are interfaces and can be swapped out), but the hard problems to solve are transactionality and concurrency within a process. There’s nothing available out of the box in this area yet.

  • do you want to store the historical data in Cassandra? Then you could look into using the new async history (see https://github.com/flowable/flowable-examples/tree/master/async-history/async-history-rabbitmq-cfg)
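A minimal sketch of the idea behind async history, assuming a hypothetical in-memory queue and store: the engine side only enqueues history events, and a separate consumer writes them to another store (Cassandra, in the scenario above). The linked example appears to use RabbitMQ as the transport; `HistoryEvent`, `HistoryStore` and `runDemo` are illustrative names, not Flowable classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of asynchronous history writing; none of these types are Flowable classes.
class AsyncHistorySketch {

    // Hypothetical history event: which process instance, and what happened.
    static final class HistoryEvent {
        final String processInstanceId;
        final String type;

        HistoryEvent(String processInstanceId, String type) {
            this.processInstanceId = processInstanceId;
            this.type = type;
        }
    }

    // Marker event that tells the consumer to stop (compared by identity).
    static final HistoryEvent STOP = new HistoryEvent("", "STOP");

    // Stand-in for a Cassandra-backed history table: an in-memory list here.
    static final class HistoryStore {
        private final List<HistoryEvent> rows = new ArrayList<>();

        synchronized void write(HistoryEvent e) { rows.add(e); }

        synchronized int size() { return rows.size(); }
    }

    // Consumer thread: drains the queue and writes each event to the history store.
    static Thread startConsumer(BlockingQueue<HistoryEvent> queue, HistoryStore store) {
        Thread t = new Thread(() -> {
            try {
                for (HistoryEvent e = queue.take(); e != STOP; e = queue.take()) {
                    store.write(e);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Engine side: only enqueues events and never waits for an individual history write.
    static int runDemo() {
        BlockingQueue<HistoryEvent> queue = new LinkedBlockingQueue<>(); // unbounded, so add() never fails
        HistoryStore store = new HistoryStore();
        Thread consumer = startConsumer(queue, store);
        queue.add(new HistoryEvent("pi-1", "PROCESS_STARTED"));
        queue.add(new HistoryEvent("pi-1", "TASK_COMPLETED"));
        queue.add(STOP);
        try {
            consumer.join(); // only for the demo: wait until the consumer has drained the queue
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("history rows written: " + runDemo()); // prints "history rows written: 2"
    }
}
```

In this setup the runtime data stays in the relational database, so the engine keeps its transactional behaviour and only the history writes move elsewhere.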


Thanks for responding.
We are using Cassandra for our data persistence. Ideally we want both the workflow engine and our application to run on the same data store, instead of two separate ones (one relational and the other non-relational). Is there a roadmap for NoSQL storage support, or a sample we could use as a starting point to roll out custom NoSQL persistence for the workflow engine?

Thanks in Advance.

No, sadly nothing concrete yet. The storage itself is not the problem; it’s guaranteeing the transactional semantics that the engine depends on (and which it gets for free from a relational DB). The persistence itself is pluggable: there are various DataManager interfaces that can be swapped out with another implementation (see https://github.com/flowable/flowable-engine/tree/master/modules/flowable-engine/src/main/java/org/flowable/engine/impl/persistence/entity/data).
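To illustrate what "pluggable" means here, a minimal sketch with a hypothetical `TaskDataManager` interface, loosely modelled on the CRUD shape of those DataManager interfaces. The real Flowable interfaces, their method names, and how they are registered in the engine configuration differ, so check the linked source. `KeyValueTaskDataManager` is a hypothetical stand-in for a Cassandra-backed implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class PluggableDataManagerSketch {

    // Hypothetical entity: only an id and a name.
    static final class TaskEntity {
        final String id;
        String name;

        TaskEntity(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical interface, loosely modelled on the CRUD shape of Flowable's DataManagers.
    interface TaskDataManager {
        void insert(TaskEntity task);
        Optional<TaskEntity> findById(String id);
        void update(TaskEntity task);
        void delete(String id);
    }

    // Stand-in for a Cassandra-backed implementation: a key-value map keyed by entity id.
    static final class KeyValueTaskDataManager implements TaskDataManager {
        private final Map<String, TaskEntity> table = new ConcurrentHashMap<>();

        public void insert(TaskEntity task) { table.put(task.id, task); }
        public Optional<TaskEntity> findById(String id) { return Optional.ofNullable(table.get(id)); }
        public void update(TaskEntity task) { table.put(task.id, task); }
        public void delete(String id) { table.remove(id); }
    }

    // Code that depends only on the interface does not care which store sits behind it.
    static final class TaskService {
        private final TaskDataManager dataManager;

        TaskService(TaskDataManager dataManager) { this.dataManager = dataManager; }

        void rename(String id, String newName) {
            TaskEntity t = dataManager.findById(id)
                    .orElseThrow(() -> new IllegalArgumentException("no task " + id));
            t.name = newName;
            dataManager.update(t);
        }
    }

    public static void main(String[] args) {
        TaskDataManager dm = new KeyValueTaskDataManager();
        dm.insert(new TaskEntity("t1", "Review"));
        new TaskService(dm).rename("t1", "Approve");
        System.out.println(dm.findById("t1").get().name); // prints "Approve"
    }
}
```

Swapping the implementation changes where rows go, not how they are committed: nothing in this sketch provides the all-or-nothing behaviour the engine expects from a relational transaction.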

@cvramanarao Have you tried persisting the runtime data in Cassandra?

@joram I am also exploring the same with HBase. However, in the absence of transaction guarantees from the underlying DB, could you help me understand how the process engine would behave?

That’s a very broad question. Which part are you specifically interested in?
Without transaction guarantees, you will need to implement lots of ‘compensating logic’ when things go wrong. This is not an easy task.
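A minimal sketch of what such compensating logic can look like, assuming a hypothetical non-transactional key-value store modelled as a `Map`: each write is applied immediately, the code records how to undo it, and on failure the recorded undo actions run in reverse order. `putWithUndo` and `completeTask` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class CompensationSketch {

    // Writes key=value and returns an action that restores the previous state of that key.
    static Runnable putWithUndo(Map<String, String> store, String key, String value) {
        String previous = store.put(key, value);
        return () -> {
            if (previous == null) {
                store.remove(key);
            } else {
                store.put(key, previous);
            }
        };
    }

    // Hypothetical "complete task" made of three independent writes with no surrounding transaction.
    // If a later write fails, the earlier ones are compensated in reverse order.
    static boolean completeTask(Map<String, String> store, String taskId, String nextTaskId,
                                boolean simulateFailure) {
        Deque<Runnable> undo = new ArrayDeque<>();
        try {
            undo.push(putWithUndo(store, "task:" + taskId, "COMPLETED"));
            undo.push(putWithUndo(store, "task:" + nextTaskId, "ACTIVE"));
            if (simulateFailure) {
                throw new IllegalStateException("write of execution row failed");
            }
            undo.push(putWithUndo(store, "execution:current", nextTaskId));
            return true;
        } catch (RuntimeException e) {
            while (!undo.isEmpty()) {
                undo.pop().run(); // LIFO: undo the most recent write first
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("task:t1", "ACTIVE");
        store.put("execution:current", "t1");
        boolean ok = completeTask(store, "t1", "t2", true);
        // ok is false and the store holds the same entries it had before the call
        System.out.println(ok + " " + store);
    }
}
```

A production version would also have to survive a crash in the middle of compensating, and other writers touching the same keys in between, which is where most of the difficulty lies.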

@joram Thanks for replying back.

I want to increase the overall throughput of the system built on top of the Flowable engine, which I think might be bottlenecked by the MySQL/relational DB that Flowable uses.

So, just a thought: could I store the entire execution data (runtime tasks, variables, executions, etc.) against a single indexed row key (the processInstanceId in this case)? That way I would not need to rely on the transactional guarantees of a relational DB, and overall throughput could be increased.
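The proposed layout can be sketched as one aggregate per process instance, stored under a single row key. This is a minimal in-memory sketch under that assumption; `ProcessInstanceState`, `InstanceStore` and `completeTask` are hypothetical names, and a real Cassandra or HBase table would serialize the aggregate into columns rather than hold Java objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AggregatePerInstanceSketch {

    // Hypothetical aggregate: all runtime data of one process instance, kept under one row key.
    static final class ProcessInstanceState {
        final String processInstanceId;
        final List<String> activeExecutions = new ArrayList<>();
        final List<String> openTasks = new ArrayList<>();
        final Map<String, Object> variables = new HashMap<>();

        ProcessInstanceState(String processInstanceId) { this.processInstanceId = processInstanceId; }

        ProcessInstanceState copy() {
            ProcessInstanceState c = new ProcessInstanceState(processInstanceId);
            c.activeExecutions.addAll(activeExecutions);
            c.openTasks.addAll(openTasks);
            c.variables.putAll(variables);
            return c;
        }
    }

    // Stand-in for the table: row key (processInstanceId) -> whole aggregate.
    static final class InstanceStore {
        private final Map<String, ProcessInstanceState> rows = new ConcurrentHashMap<>();

        void save(ProcessInstanceState s) { rows.put(s.processInstanceId, s.copy()); }

        ProcessInstanceState load(String id) {
            ProcessInstanceState row = rows.get(id);
            if (row == null) throw new IllegalArgumentException("no process instance " + id);
            return row.copy(); // callers work on a copy, as if the row had been read from the store
        }
    }

    // Completing a task: load the whole row, change it in memory, write the whole row back.
    static void completeTask(InstanceStore store, String processInstanceId, String taskId,
                             String nextTaskId, String varName, Object varValue) {
        ProcessInstanceState s = store.load(processInstanceId);
        s.openTasks.remove(taskId);
        s.openTasks.add(nextTaskId);
        s.variables.put(varName, varValue);
        store.save(s);
    }

    public static void main(String[] args) {
        InstanceStore store = new InstanceStore();
        ProcessInstanceState pi = new ProcessInstanceState("pi-42");
        pi.activeExecutions.add("exec-1");
        pi.openTasks.add("review");
        store.save(pi);

        completeTask(store, "pi-42", "review", "approve", "approved", true);
        System.out.println(store.load("pi-42").openTasks); // prints [approve]
    }
}
```

Every update becomes a read-modify-write of the whole row. If two callers load the same row concurrently, the second `save` silently overwrites the first unless the store supports some form of conditional write.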

I also explored flowable-mongodb, but since it is an alpha release that was never merged to master, it does not give me enough confidence to use MongoDB. Secondly, flowable-mongodb just creates a one-to-one mapping of each relational table to a corresponding collection, which does not really follow NoSQL data-modelling practices.

I can provide more details about the application architecture, if you think it’s relevant.

Have you done some benchmarking for this?

You can do that; however, note that you will have to resolve conflicts yourself. For example, what happens when two users complete a task at the very same moment? The execution tree, as stored in multiple rows in the relational database, has been optimised to minimise conflicts in those use cases. When such a conflict does happen (e.g. a variable is committed twice at the very same moment), we rely on the transactional rollback and atomicity of the relational database to make sure the data is not corrupted.
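One way to at least detect the two-users conflict without a relational transaction is optimistic concurrency: a write only succeeds if the row is still the one the writer read. Stores such as Cassandra offer conditional writes for this (lightweight transactions with an `IF` clause); the sketch models the idea in memory with `ConcurrentMap.replace` instead, and `TaskRow`, `TaskTable` and `complete` are hypothetical names. This is a swapped-in technique for illustration, not how Flowable handles the case.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class VersionedWriteSketch {

    // Immutable row: task state, a version bumped on every write, and who completed it.
    static final class TaskRow {
        final String state;
        final long version;
        final String completedBy;

        TaskRow(String state, long version, String completedBy) {
            this.state = state;
            this.version = version;
            this.completedBy = completedBy;
        }
    }

    static final class TaskTable {
        private final ConcurrentMap<String, TaskRow> rows = new ConcurrentHashMap<>();

        void put(String id, TaskRow row) { rows.put(id, row); }

        TaskRow get(String id) { return rows.get(id); }

        // Conditional write: succeeds only if the stored row is still exactly the one the caller read.
        // TaskRow does not override equals, so replace() compares by identity here.
        boolean compareAndSet(String id, TaskRow expected, TaskRow updated) {
            return rows.replace(id, expected, updated);
        }
    }

    // Normal path: read the row, then try to complete the task with a conditional write.
    static boolean complete(TaskTable table, String taskId, String user) {
        TaskRow read = table.get(taskId);
        if (!read.state.equals("OPEN")) return false; // already completed by someone else
        return table.compareAndSet(taskId, read, new TaskRow("COMPLETED", read.version + 1, user));
    }

    public static void main(String[] args) {
        TaskTable table = new TaskTable();
        table.put("task-1", new TaskRow("OPEN", 0, null));

        // Simulate the race: both users read the same row before either one writes.
        TaskRow readByAlice = table.get("task-1");
        TaskRow readByBob = table.get("task-1");

        boolean aliceWon = table.compareAndSet("task-1", readByAlice,
                new TaskRow("COMPLETED", readByAlice.version + 1, "alice"));
        boolean bobWon = table.compareAndSet("task-1", readByBob,
                new TaskRow("COMPLETED", readByBob.version + 1, "bob"));

        // Prints "alice=true bob=false completedBy=alice": Bob's stale write is rejected
        System.out.println("alice=" + aliceWon + " bob=" + bobWon
                + " completedBy=" + table.get("task-1").completedBy);
    }
}
```

The losing writer still has to reload and decide what to do (retry, or report that the task was already completed), and a single engine operation usually touches several rows (executions, tasks, variables), each of which needs the same treatment.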