Integration with Cassandra

cvramanarao · July 27, 2017, 6:15pm

We are looking to integrate Flowable BPMN with a NoSQL store, specifically Cassandra. Please point me to any docs or tutorials, or any other way to try this out.

Thanks in advance

joram · July 28, 2017, 9:30am

Depends on what you want to do:

Do you want to swap out the whole relational persistence for Cassandra? If so, this is theoretically possible (all DataManagers are interfaces and can be swapped out), but the hard problems to solve are transactionality and concurrency within a process. Nothing has been done out of the box in this area yet.
Do you want to store the historical data in Cassandra? If so, you could look into using the new async history (see https://github.com/flowable/flowable-examples/tree/master/async-history/async-history-rabbitmq-cfg).

cvramanarao · July 28, 2017, 2:38pm

Thanks for responding.
We are using Cassandra for our data persistence. Ideally, we want both the workflow engine and our application to run on the same data store instead of two separate ones, one relational and the other non-relational. Is there a roadmap for any NoSQL storage that could serve as a sample for rolling out custom NoSQL persistence in the workflow engine?

Thanks in Advance.

joram · August 4, 2017, 8:21am

No, sadly nothing concrete yet. Storage itself is not the problem; the hard part is guaranteeing the transactional semantics that the engine depends on (and which it gets for free from a relational DB). The persistence itself is pluggable: there are various DataManager interfaces that can be swapped out with another implementation (see https://github.com/flowable/flowable-engine/tree/master/modules/flowable-engine/src/main/java/org/flowable/engine/impl/persistence/entity/data).
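
To illustrate what "pluggable" means here, below is a minimal sketch of the general pattern, not Flowable's actual API. The names `TaskStore`, `InMemoryTaskStore` and `TaskService` are hypothetical; the real DataManager interfaces in the linked package are considerably larger. The in-memory map only stands in for a backend such as Cassandra.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class PluggableStoreSketch {

    /** Hypothetical storage contract; Flowable's real DataManager interfaces are larger. */
    public interface TaskStore {
        void insert(String taskId, String name);
        Optional<String> findName(String taskId);
        void delete(String taskId);
    }

    /** Stand-in backend; a Cassandra-backed class would implement the same interface. */
    public static class InMemoryTaskStore implements TaskStore {
        private final Map<String, String> rows = new ConcurrentHashMap<>();

        @Override
        public void insert(String taskId, String name) {
            rows.put(taskId, name);
        }

        @Override
        public Optional<String> findName(String taskId) {
            return Optional.ofNullable(rows.get(taskId));
        }

        @Override
        public void delete(String taskId) {
            rows.remove(taskId);
        }
    }

    /** Engine-side code depends only on the interface, so the backend can be swapped. */
    public static class TaskService {
        private final TaskStore store;

        public TaskService(TaskStore store) {
            this.store = store;
        }

        public String createTask(String id, String name) {
            store.insert(id, name);
            return id;
        }

        /** Returns false if the task does not exist (for example, it was already completed). */
        public boolean complete(String id) {
            if (!store.findName(id).isPresent()) {
                return false;
            }
            store.delete(id);
            return true;
        }
    }
}
```

Note that `complete` performs a read followed by a delete as two separate calls. With a relational database both would run inside one transaction; with a store that lacks transactions, that gap is exactly where the hard part mentioned above shows up.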

anilpurohit710 · April 1, 2020, 11:28am

@cvramanarao Have you tried persisting the runtime data with Cassandra?

@joram I am also exploring the same with HBase. However, in the absence of transaction guarantees from the underlying DB, could you please help me understand how the process engine would behave?

joram · April 13, 2020, 11:23am

That’s a very broad question. Which part are you specifically interested in?
Without transaction guarantees, you will need to implement lots of ‘compensating logic’ when things go wrong. This is not an easy task.
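
As a rough idea of what such compensating logic can look like, here is a minimal sketch under simplifying assumptions: each write is paired with an action that undoes it, and on failure the completed writes are undone in reverse order. The `Step` type and `runAll` method are hypothetical names for illustration and are not part of Flowable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class CompensationSketch {

    /** One write plus the action that undoes it. Hypothetical type for illustration. */
    public static class Step {
        final Runnable action;
        final Runnable compensation;

        public Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    /**
     * Runs the steps in order. If one throws, the steps that already completed
     * are compensated in reverse order. Returns true only if every step succeeded.
     */
    public static boolean runAll(List<Step> steps) {
        Deque<Step> done = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                step.action.run();
                done.push(step); // most recently completed step ends up on top
            }
            return true;
        } catch (RuntimeException e) {
            while (!done.isEmpty()) {
                done.pop().compensation.run();
            }
            return false;
        }
    }
}
```

The sketch ignores the cases that make this hard in practice: a compensation can itself fail, the JVM can crash halfway through, and other readers can observe the intermediate state before it is undone. A relational transaction covers all three.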

anilpurohit710 · April 14, 2020, 4:43am

@joram Thanks for replying back.

I want to increase the overall throughput of a system built on top of the Flowable engine, which I think might be bottlenecked by the MySQL/relational DB that Flowable uses.

Hence a thought: could I store the entire execution data (runtime tasks, variables, executions, etc.) against an indexed rowKey (processInstanceId in this case)? That way I would not need to rely on the transactional guarantees of a relational DB, and overall throughput could increase.

I also explored flowable-mongodb, but since it is an alpha release and was never merged to master, it does not give me enough confidence to use MongoDB. Also, flowable-mongodb just creates a one-to-one mapping of the relational tables to corresponding collections, which does not follow NoSQL modeling practices.

I can provide more details about the application architecture if you think it is relevant.

filiphr · April 14, 2020, 7:12am

Have you done some benchmarking for this?

joram · April 20, 2020, 10:08am

You can do that, but note that you will have to resolve conflicts yourself. E.g. what happens when two users complete a task at the very same moment? The execution tree, as stored in multiple rows in the relational database, has been optimised to minimise conflicts in those use cases. When such a conflict does happen (e.g. a variable is committed twice at the very same moment), we rely on the transactional rollback and atomicity of the relational database to make sure the data is not corrupted.
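
One common way to detect the "two users complete the same task" conflict without a relational transaction is a versioned compare-and-set on a single row. The sketch below shows the idea with an in-memory map; `VersionedStoreSketch` and its methods are hypothetical names, and the map merely stands in for a store that offers a conditional write (Cassandra's lightweight transactions with an `IF` clause are one possible fit, as an assumption about how one might build this).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VersionedStoreSketch {

    /** Immutable row: a value plus the revision at which it was written. */
    public static final class Row {
        public final String value;
        public final int revision;

        public Row(String value, int revision) {
            this.value = value;
            this.revision = revision;
        }
    }

    private final Map<String, Row> rows = new ConcurrentHashMap<>();

    public void insert(String key, String value) {
        rows.put(key, new Row(value, 1));
    }

    public Row read(String key) {
        return rows.get(key);
    }

    /**
     * Writes only if the row is still at expectedRevision.
     * Returns false when another writer got there first.
     */
    public boolean update(String key, int expectedRevision, String newValue) {
        Row current = rows.get(key);
        if (current == null || current.revision != expectedRevision) {
            return false;
        }
        // Atomic swap: succeeds only if the map still holds the exact row read above.
        return rows.replace(key, current, new Row(newValue, expectedRevision + 1));
    }
}
```

If two callers both read revision 1 and then call `update`, only the first succeeds; the second gets `false` and has to re-read and decide what to do. This only protects a single row, though. An engine operation that touches several rows (task, executions, variables) still needs something on top to keep them consistent, which is the part the relational transaction normally provides.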
