Bad performance with async service tasks

Hello!

In our project we are trying to get the maximum performance out of the Flowable engine and have run into problems with asynchronous service tasks.

We have created a very simple BPMN definition with one step:

<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             xmlns:flowable="http://flowable.org/bpmn"
             typeLanguage="http://www.w3.org/2001/XMLSchema"
             expressionLanguage="http://www.w3.org/1999/XPath"
             targetNamespace="http://www.flowable.org/processdef">

  <process id="sequence1Async" name="Sequence workflow 1 async" isExecutable="true">

    <startEvent id="startEvent" />
    <serviceTask id="externalSystemCall" name="Test" flowable:delegateExpression="${simpleStop}" flowable:async="true" />
    <endEvent id="approveEnd" />

    <sequenceFlow sourceRef="startEvent" targetRef="externalSystemCall" />
    <sequenceFlow sourceRef="externalSystemCall" targetRef="approveEnd" />

  </process>

</definitions>

(simpleStop delegate only writes executionId and processInstanceId to the log and does nothing else)

We then run a batch of requests (50 threads with loop counts of 1000 to 10000) with Apache JMeter. Flowable and PostgreSQL run on the same machine inside Docker with 8 CPUs, 17 GB of RAM and 128 GB of disk space. Flowable history is disabled. Each process instance is started with 3 variables: a string, a hashmap with one key-value pair and an empty array.

We calculate the average number of completed processes per second during the test.

We found that when flowable:async="true" is omitted, we reach 1089 processes per second, but in async mode we only get 72. The question is: is this normal behavior? Can we tune Flowable to get better performance in async mode? (We have tried increasing spring.task.execution.pool.coreSize, the Async Task Executor pool size, but that does not help.) We understand that async mode adds extra operations to the process execution (an extra write of the process state to the database and an extra read from it), but that does not seem to be so resource-expensive.

Thank you!

Hi @VladimirKhil,
how long is the test execution in total? If the test does not run long enough, I could imagine that a lot of the time is overhead from waiting until the async executor picks up the jobs.

Valentin

Hey @VladimirKhil,

Thanks for writing to us here. Is it possible for you to share the project and setup you are using for testing this?

How are you counting the completed processes per second during the test?

How long is the external system call usually taking?

The async flag shouldn’t slow down the engine that much.

@valentin regarding:

By default, when there is room in the queue, the async job is scheduled for execution in a post commit. This means that the async executor wait time shouldn’t matter.

However, if the queue capacity is full, then the jobs would be rejected and picked up on the next async acquisition. The default queue capacity in Spring Boot is unlimited I think.

Cheers,
Filip

Hi!

We run the test for 50 000 processes (it takes a few minutes). The async executor prewarm time lasts for several seconds. After that it starts picking up jobs. It uses only 8 threads by default (maybe because there are only 8 CPUs).

Hello!
I will try to make a simple example. Are there any built-in JavaDelegate implementations inside Flowable (it would be much easier to make an example without injecting extra jars)? I have tried org.flowable.engine.impl.test.NoOpServiceTask but it fails.

We count the number of completed process instances in the log file. I left out the executionListener that writes to the log and forgot that it is important here.

“ExternalSystemCall” is a bad name for this example, yeah. It really does nothing except write a message to the log file.

There are no built-in JavaDelegate implementations. However, you can use an expression ${true}, or take the spring-boot-example, update it to 6.5.0 and add your own JavaDelegate there.

I have uploaded test project:

It consists of:

  • docker-compose.yml and .env files for running Flowable rest and PostgreSQL inside Docker Compose;
  • sequence1Script.bpmn20.xml: test BPMN (it can be used with flowable:async="true" and without it). It should be deployed to Flowable after it starts;
  • FlowableEmptyWithWaitTest.jmx: JMeter test file

Testing performance = (total number of ‘Completed’ messages in log / test duration in seconds).
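The measurement above can be sketched in plain Java as follows. This is only an illustration of the formula (number of 'Completed' messages divided by the test duration in seconds). The class name `ThroughputFromLog`, the method names and the sample log lines are hypothetical and not part of the uploaded test project.

```java
import java.util.List;

public class ThroughputFromLog {

    // Counts the log lines that contain the marker text (here: "Completed").
    public static long countCompleted(List<String> logLines, String marker) {
        return logLines.stream().filter(line -> line.contains(marker)).count();
    }

    // Throughput = completed process instances / test duration in seconds.
    public static double perSecond(long completed, double durationSeconds) {
        if (durationSeconds <= 0) {
            throw new IllegalArgumentException("durationSeconds must be positive");
        }
        return completed / durationSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical log excerpt; a real run would read the engine's log file.
        List<String> lines = List.of(
                "Started 1", "Completed 1", "Started 2", "Completed 2", "Completed 3");
        long completed = countCompleted(lines, "Completed");
        System.out.println(completed + " completed, "
                + perSecond(completed, 2.0) + " per second");
    }
}
```

Note that this counts what the log reports, so anything that delays or drops log lines also changes the measured number.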

I have received 127 completed processes per second with “flowable:async” and 804 without it.

Btw, is there a metric in Flowable that counts the number of records in act_ru_job table?

I have forgotten to tag you.

I have implemented a message-based Async Executor using RabbitMQ and now get 191 completed processes per second instead of 72. That’s much better but still not as good as the sync version.

Excuse me, do you have any plans to investigate this case using the provided example? It is important to our project. If there is no way to achieve high performance in this scenario, we will have to switch from Flowable to some other workflow engine.

Thank you once again!

Don’t forget this is an open source forum and these are somewhat crazy times right now. If you need immediate help, we do offer commercial services all over the globe :wink:

We will look into it. The numbers you are getting are nowhere near the benchmark numbers we achieved: https://blog.flowable.org/2018/03/13/async-history-performance-benchmark/. We’ll look into your zip and see what differences there are with our regular benchmark setups.

Thank you for the answer! Of course I cannot insist on help on an open source forum; I just asked to find out whether there is any hope of getting feedback :slight_smile:

Thank you again!

Ok, I looked into your example test and process and came to the following conclusions.

  • First of all, Flowable is configured out of the box to have minimal resources. We don’t like being this tool that gobbles RAM and CPU. This means that a typical Flowable installation needs to be tweaked according to needs.
  • Second, running with an async step is always going to be slower than running it synchronously on a single node (i.e. my laptop in this case). Async starts to pay off when spreading the load across multiple nodes (or when it makes sense to add transaction boundaries).

I created a simple Spring Boot app to verify this, as we’ve seen too many strange things happening when Docker is involved. So basically, I created a new Spring Boot app, added your process xml to the ‘processes’ folder under resources and used these dependencies:

 <dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>


    <dependency>
        <groupId>org.flowable</groupId>
        <artifactId>flowable-spring-boot-starter-rest</artifactId>
        <version>6.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
    </dependency>
    <dependency>
        <groupId>org.codehaus.groovy</groupId>
        <artifactId>groovy-jsr223</artifactId>
        <version>2.5.10</version>
    </dependency>
</dependencies>

Basically, the starter for rest does the same as the flowable-rest app: it adds and exposes the Flowable REST api.

The baseline number I got when executing the sync process with your JMeter script on my laptop was ± 750 process instances/sec.

Now, in this synchronous use case, the threads actually executing the process instances are the Tomcat threads. By default, Spring Boot has a max of 250 threads. This means that the 50 threads from JMeter blasting away can be handled easily, whilst doing the Flowable logic at the same time.

The default async executor of Flowable has 8 threads, which is by no means an equal comparison with the tomcat threads. The JMeter script is blasting away with 50 threads concurrently, so there’s more being fed into the system than is being processed.
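A back-of-the-envelope sketch of why the pool size matters: if every async job holds one executor thread for a fixed time, a pool of n threads can finish at most n * 1000 / t jobs per second, where t is the per-job time in milliseconds. The 100 ms per job used below is a made-up number for illustration only, not a measurement from this thread, and the class and method names are hypothetical.

```java
public class PoolThroughputBound {

    // Upper bound on jobs/second for a pool in which every job holds one
    // thread for jobMillis milliseconds: threads * 1000 / jobMillis.
    public static double maxJobsPerSecond(int threads, double jobMillis) {
        if (threads <= 0 || jobMillis <= 0) {
            throw new IllegalArgumentException("threads and jobMillis must be positive");
        }
        return threads * 1000.0 / jobMillis;
    }

    public static void main(String[] args) {
        double assumedJobMillis = 100.0; // assumption, not measured
        System.out.println("8 threads:  " + maxJobsPerSecond(8, assumedJobMillis) + " jobs/s at most");
        System.out.println("64 threads: " + maxJobsPerSecond(64, assumedJobMillis) + " jobs/s at most");
    }
}
```

The bound ignores the database, the connection pool and CPU contention, so real numbers will be lower. It only shows that 50 producer threads can easily outrun 8 consumer threads.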

Furthermore, when I attached a profiler to the app, it was clear that the default connection pool wasn’t cutting it. This is logical, as instead of 1 connection / tomcat thread, the async executor is also taking connections for each async job concurrently.

That all being said, the settings were changed to the following in the application.properties:

spring.datasource.url=jdbc:postgresql:testing
spring.datasource.username=flowable
spring.datasource.password=flowable
spring.datasource.hikari.maximum-pool-size=250

spring.task.execution.pool.core-size=64
spring.task.execution.pool.max-size=64

flowable.history-level=none

logging.file=output.txt

I tried with more threads, but the combined 50 JMeter threads + tomcat threads handling the POST + async executor threads + local postgres db was too much for my machine, hence the setting of 64. On a production installation this would of course be possible.

With these settings, I got ± 400 process instances/second. Which makes sense given the explanation above. Probably could be made faster by swapping the groovy script with a java class. In the benchmarks we’ve published (https://blog.flowable.org/2018/03/05/flowable-6-3-0-performance-benchmark/) we used multiple machines on AWS (with loadbalancers and a proper db).

Most likely, if we zoom in on the particular process here, some settings or tweaks can be found that would speed it up even more.

Hope this clarifies some things.

Thank you very much for such a detailed response! We will think about tuning the size of Flowable thread pools and using multiple nodes.

Just to add something extra to Joram’s input.

Using async service tasks won’t make your processes run faster. The idea of async service tasks is not to make the execution of a process faster, but to make it more resilient by providing transaction boundaries around service tasks. They isolate long-running service tasks that communicate with other systems, which can take a minute or two, and thus reduce the time spent on the Tomcat threads.

Let’s simplify this. Suppose that, instead of printing to the log, the service task communicates with an external system and this takes 30 seconds. With a synchronous service task, a single HTTP request would then take at least 30 seconds, since it waits for the external system before returning. As a result, the throughput of the application would decrease, since Tomcat would not have free threads to handle the parallel requests, and the response time of an HTTP request would be high.
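The effect on the request thread can be illustrated with a small, self-contained Java sketch. It is not Flowable code: a plain `ExecutorService` stands in for the async executor, nothing is persisted to a database, and a short sleep stands in for the 30-second external call. All names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncHandOffSketch {

    // Stand-in for the slow external system call.
    static void slowCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Synchronous: the calling (request) thread is blocked for the whole call.
    public static long handleSync(long callMillis) {
        long start = System.nanoTime();
        slowCall(callMillis);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Asynchronous: the work is handed to another pool and the caller returns at once.
    public static long handleAsync(ExecutorService pool, long callMillis) {
        long start = System.nanoTime();
        pool.submit(() -> slowCall(callMillis));
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println("sync request thread busy for ms:  " + handleSync(300));
        System.out.println("async request thread busy for ms: " + handleAsync(pool, 300));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS); // let the background work finish
    }
}
```

The total work is the same in both cases; the async variant only frees the request thread sooner, which matches the point that async is about isolation and not about raw speed.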

In the end it is entirely up to you how you would design your Process and what you want to provide out of it from the execution point of view. As Joram showed Flowable is quite flexible and offers you different possibilities.

I would suggest reading the following 2 blog posts:

Thank you! The idea is clear now.

Have you tried wrapping the async method?

No reply…??