Deadlock update ACT_RU_EXECUTION

Hello
How to debug problem below. It occurs once/two a day. Flowable 6.8.0.It’s strange because Flowable should process one job per transaction so deadlock should’t happen at all. It’s hard to provide demo because it happens to rarely, but my process with script tak is exclusive

<serviceTask flowable:async="true" flowable:exclusive="true" flowable:expression="${bpmExprExecutor.executeScript(execution)}" id="ProcDefElement_150442" name="Name"/>

Error:

org.apache.ibatis.exceptions.PersistenceException: 
### Error updating database.  Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Szczegóły: Process 265247 waits for ShareLock on transaction 2415990631; blocked by process 237318.
Process 237318 waits for ShareLock on transaction 2415985322; blocked by process 265247.
  Wskazówka: See server log for query details.
  Gdzie: while updating tuple (11357,37) in relation "act_ru_execution"
### The error may exist in org/flowable/db/mapping/entity/Execution.xml
### The error may involve org.flowable.engine.impl.persistence.entity.ExecutionEntityImpl.updateExecution-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_EXECUTION          SET REV_ = ?,                                                        ACT_ID_ = ?      where ID_ = ?       and REV_ = ?
### Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Szczegóły: Process 265247 waits for ShareLock on transaction 2415990631; blocked by process 237318.
Process 237318 waits for ShareLock on transaction 2415985322; blocked by process 265247.
  Wskazówka: See server log for query details.
  Gdzie: while updating tuple (11357,37) in relation "act_ru_execution"
	at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:196)
	at org.flowable.common.engine.impl.db.DbSqlSession.flushUpdates(DbSqlSession.java:580)
	at org.flowable.common.engine.impl.db.DbSqlSession.flush(DbSqlSession.java:369)
	at org.flowable.common.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:212)
	at org.flowable.common.engine.impl.interceptor.CommandContext.close(CommandContext.java:70)
	at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:114)
	at org.flowable.common.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
	at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:56)
	at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:51)
	at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.executeJob(ExecuteAsyncRunnable.java:127)
	at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:115)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Szczegóły: Process 265247 waits for ShareLock on transaction 2415990631; blocked by process 237318.
Process 237318 waits for ShareLock on transaction 2415985322; blocked by process 265247.
  Wskazówka: See server log for query details.
  Gdzie: while updating tuple (11357,37) in relation "act_ru_execution"
	at org.postgresql@42.2.17//org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
	at org.postgresql@42.2.17//org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
	at org.postgresql@42.2.17//org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
	at org.postgresql@42.2.17//org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
	at org.postgresql@42.2.17//org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
	at org.postgresql@42.2.17//org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
	at org.postgresql@42.2.17//org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)
	at org.jboss.ironjacamar.jdbcadapters@1.4.16.Final//org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:442)
	at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:47)
	at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:74)
	at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:50)
	at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
	at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:194)
	... 13 more

Hey @MirekSz,

That is indeed really strange.

How does your application setup look like? I see

What kind of a transaction manager are you using?

Cheers,
Filip

Something else that I forgot to ask. What are you doing in the bpmExprExecutor.executeScript(execution)

We are using Wildfly. Flowable has separate datasource for its own, no sharing etc. Also transaction scope is local for Flowable.(jta=“false”)

Env is a cluster with 10 servers. Each server has 8 threads for Flowable executor

                <datasource jta="false" jndi-name="java:jboss/datasources/flowable" pool-name="flowable" enabled="true" use-java-context="true" spy="false" use-ccm="false" tracking="true" statistics-enabled="true">
                    <connection-url>jdbc:postgresql://host:5432/bpm</connection-url>
                    <connection-property name="char.encoding">
                        UTF-8
                    </connection-property>
                    <connection-property name="readOnly">
                        false
                    </connection-property>
                    <connection-property name="logUnclosedConnections">
                        true
                    </connection-property>
                    <driver>postgresql</driver>
                    <new-connection-sql>set session statement_timeout to 600000</new-connection-sql>
                    <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
                    <pool>
                        <min-pool-size>0</min-pool-size>
                        <max-pool-size>20</max-pool-size>
                    </pool>
                    <security>
                        <user-name>u</user-name>
                        <password>p</password>
                    </security>
                    <validation>
                        <check-valid-connection-sql>SELECT current_timestamp</check-valid-connection-sql>
                        <validate-on-match>true</validate-on-match>
                        <background-validation>true</background-validation>
                        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLExceptionSorter"/>
                    </validation>
                    <timeout>
                        <idle-timeout-minutes>2</idle-timeout-minutes>
                    </timeout>
                    <statement>
                        <track-statements>nowarn</track-statements>
                        <prepared-statement-cache-size>0</prepared-statement-cache-size>
                        <share-prepared-statements>false</share-prepared-statements>
                    </statement>
                </datasource>

bpmExprExecutor.executeScript(execution) is responsible for execution of our business code like

businessCodeExecutor.executeInTransaction({ //on separate datasource
   saleDocumentService.confirm(id) 
})

Could other thread take the job after the lock expiration?

Hmmm no, because we have made hack on Flowable. On each node we have thread which increase LockTime to tell Flowable to not reset lock Owner/pass to another executor. This allow us to executo long time business logic. In this case operation time is about 2 seconds.

Real question is: If there is a chance that Flowable could process two jobs/execution in a single transaction which could leads to deadlocks

private static void startUpdateThread(final DataSource dataSource) {
	Thread thread = new Thread(() -> {
		while (true) {
			try {
				Thread.sleep(cfg.getAsyncExecutor().getAsyncJobLockTimeInMillis() / 2);
				try (Connection connection = dataSource.getConnection()) {
					try (PreparedStatement prepareStatement = connection.prepareStatement(
							"update act_ru_job set lock_exp_time_=? where lock_owner_=? and lock_exp_time_ is not null")) {
						LocalDateTime moveTime =
								LocalDateTime.now().plus(cfg.getAsyncExecutor().getAsyncJobLockTimeInMillis(), ChronoUnit.MILLIS);
						prepareStatement.setTimestamp(1, Timestamp.valueOf(moveTime));
						prepareStatement.setString(2, cfg.getAsyncExecutor().getLockOwner());
						prepareStatement.execute();
					}
				}
			} catch (Exception e) {
				LOGGER.error(e.getMessage(), e);
			}
		}
	});
	thread.setName("FlowableUpdateExpTimeMover");
	thread.start();
}

We have not heard of such problems by any customer or by any other user. Looks like you are heavily customizing the job executor and you are using different datasources / transactions.

Are you sure that your Wildfly configuration and / or your custom business code is not causing this? Looking at the statement where it fails

update ACT_RU_EXECUTION          
SET REV_ = ?,                                                        ACT_ID_ = ?
where ID_ = ?       and REV_ = ?

It looks to me like a variable was added / updated to the execution.

Based on StackTrace this exceptions occurs in here/Flowable code

The error may involve org.flowable.engine.impl.persistence.entity.ExecutionEntityImpl.updateExecution-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_EXECUTION          SET REV_ = ?,                                                        ACT_ID_ = ?      where ID_ = ?       and REV_ = ?

Our code doesn’t contains any update on act_ru_execution. But let’s imagine that it has..in my opinion it doesn’t matter. To create deadlock we need two transaction updating/deleting the same rows in oposite order. Even if we would have following code, then we should not create deadlock which Flowable code which is “one element per trsnsaction”

If we had bad code which createas deadlock in stacktrace we should see our packages

Any details about processes 237318 and 265247 and transactions 2415990631, 2415985322 ?

Does your code touch Flowable APIs? The update statement you are seeing happens because of some changes in the process execution. e.g. setting a variable, cancel boundary event etc. Even if your code triggered this change, you won’t see it in the stacktrace, because we do not communicate with the database on every call. We have an internal caching mechanism which will do the DB updates at the end of the transaction.

This should actually lead to optimistic locking exception. Here, it looks like you have some shared transactions. Best would be good to provide more details about the processes that @martin.grofcik asked about.

If you are doing Flowable changes in your code. Then I can see how there can be some deadlocks since you shared your code that does something like

businessCodeExecutor.executeInTransaction({ //on separate datasource
   saleDocumentService.confirm(id) 
})

Unfortunately this is random and short. I will try to find a way to add some code which catch this exception, parse it and send some SQLs to db to get details about transactions/processed involved in problem

Thanks