Flowable behaviour when database is down?

Hi,

What happens to Flowable's state when the database goes down?

E.g., we have a single ServiceTask in our BPMN with async=true and triggerable=true:

<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             xmlns:flowable="http://flowable.org/bpmn"
             typeLanguage="http://www.w3.org/2001/XMLSchema"
             expressionLanguage="http://www.w3.org/1999/XPath"
             targetNamespace="http://www.flowable.org/processdef">

    <process id="iftee" name="iftee2" isExecutable="true">
        <startEvent id="start"/>

        <sequenceFlow sourceRef="start" targetRef="torsk0"/>

        <serviceTask id="torsk0" name="torsker 0" flowable:async="true" flowable:triggerable="true" flowable:class="ws.flowable.Class1">
        </serviceTask>

        <sequenceFlow sourceRef="torsk0" targetRef="end"/>
              
        <endEvent id="end"/>
    </process>
</definitions>

Steps:
I triggered a processInstance. After that:

  • Right before the execution of my service task (JavaDelegate.execute()), our database was up and working fine.
  • During the execution of JavaDelegate.execute(), our database went down, and Flowable internally threw an exception because it couldn’t persist the state.

While the db was down, our service logs were bombarded with the following stack trace every few seconds:

Error getting a new connection.  Cause: java.sql.SQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
! Cause: java.sql.SQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
! at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
! at org.apache.ibatis.session.defaults.DefaultSqlSession.getConnection(DefaultSqlSession.java:300)
! at org.flowable.common.engine.impl.db.DbSqlSessionFactory.openSession(DbSqlSessionFactory.java:96)
! at org.flowable.common.engine.impl.interceptor.CommandContext.getSession(CommandContext.java:241)
! at org.flowable.common.engine.impl.cfg.standalone.StandaloneMybatisTransactionContext.<init>(StandaloneMybatisTransactionContext.java:48)
! at org.flowable.common.engine.impl.cfg.standalone.StandaloneMybatisTransactionContextFactory.openTransactionContext(StandaloneMybatisTransactionContextFactory.java:26)
! at org.flowable.common.engine.impl.interceptor.TransactionContextInterceptor.execute(TransactionContextInterceptor.java:47)
! at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:71)
! at org.flowable.common.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
! at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:56)
! at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:51)
! at org.flowable.job.service.impl.asyncexecutor.AcquireTimerJobsRunnable.run(AcquireTimerJobsRunnable.java:58)
! at java.lang.Thread.run(Thread.java:748)

After some 3-4 hours we brought the database back up, and it is now running fine.

In Flowable's internal table ACT_RU_ACTINST, I can see the following 2 entries for the processInstance. (There is no entry for the serviceTask torsk0, because it couldn’t be written while our db was down.)

  • Now that the database is back up, what state will the processInstance be in, and how will the ServiceTask be triggered again?
  • Will Flowable trigger it automatically after some time? If so, how many times and at what interval? Or, if manual intervention is required, how is it done?

Hi @mnithinkk,

In general, the expected behavior of Flowable would be as follows:
When an async activity such as yours is started, a corresponding entry is created in the ACT_RU_JOB table. When the job is picked up for execution, this entry is locked: it gets a lock owner and a lock expiration time. By default, the lock expiration time is set to 5 min after the job has been created. If the execution is successful, the entry is removed.

In your case, however, the database was down, so the entry could not be updated and stays in the database. Once the database is back, the expectation is that Flowable's ResetExpiredJobsRunnable will scan the existing jobs and remove the locks whose expiration time is in the past, which makes these jobs available for execution again. After that, AcquireAsyncJobsDueRunnable will pick them up and they will be executed. So the activity is supposed to be executed again after the db is back.
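
To make the lock-and-reset cycle described above a bit more concrete, here is a minimal in-memory sketch in plain Java. It is only a toy model, not Flowable's code: the class JobLockSketch, the Job type, and the methods lock, resetExpired and acquirable are all hypothetical names made up for illustration, and the 5 minute lock time is simply the default mentioned above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of job locking; hypothetical, NOT Flowable's actual classes. */
class JobLockSketch {

    /** Hypothetical stand-in for one row in ACT_RU_JOB. */
    static final class Job {
        final String id;
        String lockOwner;        // null means "not locked"
        Instant lockExpiration;  // null means "not locked"

        Job(String id) { this.id = id; }

        boolean isLocked() { return lockOwner != null; }
    }

    /** Lock a job for one executor until now + lockTime. */
    static void lock(Job job, String owner, Instant now, Duration lockTime) {
        job.lockOwner = owner;
        job.lockExpiration = now.plus(lockTime);
    }

    /** Clear every lock whose expiration time is in the past; returns the ids that were reset. */
    static List<String> resetExpired(List<Job> jobs, Instant now) {
        List<String> reset = new ArrayList<>();
        for (Job job : jobs) {
            if (job.isLocked() && job.lockExpiration.isBefore(now)) {
                job.lockOwner = null;
                job.lockExpiration = null;
                reset.add(job.id);
            }
        }
        return reset;
    }

    /** Unlocked jobs are the ones an acquire step could pick up. */
    static List<String> acquirable(List<Job> jobs) {
        List<String> ids = new ArrayList<>();
        for (Job job : jobs) {
            if (!job.isLocked()) {
                ids.add(job.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-01-01T10:00:00Z");
        Job job = new Job("torsk0-job");
        List<Job> jobs = new ArrayList<>();
        jobs.add(job);

        // The job is locked, then the database goes down mid-execution.
        lock(job, "executor-1", start, Duration.ofMinutes(5));

        // One minute later the lock is still valid, so nothing is reset.
        System.out.println(resetExpired(jobs, start.plus(Duration.ofMinutes(1))));
        // Four hours later (db back up) the lock has expired and is cleared.
        System.out.println(resetExpired(jobs, start.plus(Duration.ofHours(4))));
        // The job can now be acquired and executed again.
        System.out.println(acquirable(jobs));
    }
}
```

Running main prints an empty list first, then the job id twice: the expired lock is cleared on the second scan, and the job shows up as acquirable afterwards. In the real engine this bookkeeping lives in the database, which is why nothing can move forward until the db is reachable again.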

The stack traces you see while the database is down come from AcquireTimerJobsRunnable, which scans the database for existing timer jobs on a regular basis. AcquireAsyncJobsDueRunnable and ResetExpiredJobsRunnable also run regularly and might produce similar messages.
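
As a rough illustration of why the same error keeps repeating every few seconds and then stops once the db is back, here is a small plain-Java sketch of a polling loop that logs a failure and simply tries again on the next cycle. This is an assumption about the general pattern, not Flowable's implementation: AcquireLoopSketch, poll, and the simulated acquire step are hypothetical names.

```java
import java.sql.SQLNonTransientConnectionException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Toy polling loop; hypothetical, NOT Flowable's actual acquire runnable. */
class AcquireLoopSketch {

    /**
     * Runs the acquire step a fixed number of times. A failing step is
     * recorded and the loop carries on with the next cycle after waiting.
     */
    static List<String> poll(Callable<Integer> acquireStep, int cycles, long waitMillis) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < cycles; i++) {
            try {
                log.add("acquired " + acquireStep.call() + " job(s)");
            } catch (Exception e) {
                // Same error on every cycle for as long as the database is down.
                log.add("error: " + e.getMessage());
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated database: unreachable for the first two cycles, then back.
        Callable<Integer> acquireStep = () -> {
            if (calls[0]++ < 2) {
                throw new SQLNonTransientConnectionException(
                        "Could not create connection to database server.");
            }
            return 1;
        };
        for (String line : poll(acquireStep, 3, 10L)) {
            System.out.println(line);
        }
    }
}
```

With the simulated outage above, main prints the connection error twice and then "acquired 1 job(s)". The point is only that a failed cycle does not stop the loop, so no restart is needed once the database is reachable again.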

Best,
Vasil

Cool, good resiliency. Thanks!