Hi,
I’m using Flowable with an Oracle database, and I’m running into an issue with the following query:
update ACT_RU_EXECUTION set LOCK_TIME_ = :1 where ID_ = :2 and (LOCK_TIME_ is null OR LOCK_TIME_ < :3 )
It says:
The SQL statement with SQL_ID “ab46ty50gvtw4” was blocked on row locks.
That SQL_ID corresponds to the update query above.
In my research, I found that this error occurs when two or more sessions try to access the same resource at the same time.
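For example, the next time the block occurs, a query along these lines should show which session holds the row lock and what it is running (a sketch; it assumes you can read the v$session dynamic performance view, and on RAC you would query gv$session instead):

-- Sketch: list sessions blocked on row locks together with their blocker
SELECT sid, serial#, blocking_session, sql_id, event, seconds_in_wait
  FROM v$session
 WHERE blocking_session IS NOT NULL;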
Are there any schema changes recommended for Oracle databases?
What situation could be triggering this error?
We have started seeing this in an AWS Aurora cluster. It surfaces as a connection timeout in the MariaDB driver:
### Error updating database. Cause: java.sql.SQLException: (conn=1882029) Communications link failure with primary host redacted.us-east-1.rds.amazonaws.com:3306. Connection timed out
### The error may exist in org/flowable/db/mapping/entity/Execution.xml
### The error may involve org.flowable.engine.impl.persistence.entity.ExecutionEntityImpl.updateProcessInstanceLockTime-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_EXECUTION set LOCK_TIME_ = ? where ID_ = ? and (LOCK_TIME_ is null OR LOCK_TIME_ < ?)
### Cause: java.sql.SQLException: (conn=1882029) Communications link failure with primary host redacted.us-east-1.rds.amazonaws.com:3306. Connection timed out
I suspect a SIGTERM at the wrong moment triggered the problem. It is ongoing, and it is not obvious whether it comes from one bad job record or affects everything. The system as a whole still appears functional.
I will try an index on LOCK_TIME_ as suggested in a previous post and report back.
Indexes added on (LOCK_TIME_) and (ID_, LOCK_TIME_) did not resolve the issue. The only thing that seems to resolve it for a short period is rebooting the Aurora master/writer.
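For reference, the statements were along these lines (index names are illustrative):

CREATE INDEX idx_act_ru_exe_lock_time ON ACT_RU_EXECUTION (LOCK_TIME_);
CREATE INDEX idx_act_ru_exe_id_lock ON ACT_RU_EXECUTION (ID_, LOCK_TIME_);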
EDIT: it is worth noting that this exception occurs hundreds of times per second.
We have not isolated a root cause yet. It does not happen all the time; once the bad ACT_RU_JOB row is deleted, things clear up. It has happened about 3 or 4 times in total over the last 2 weeks. At current load we run about 10 async jobs per second, which is not much. In certain respects it looks like an application-level deadlock in Flowable’s custom locking code. The REV_ field was slowly increasing, but the number of retries per REV_ increment was in the tens of thousands.
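For anyone else debugging this, a query along these lines can help spot the bad job row (a sketch using standard ACT_RU_JOB columns; adjust to your Flowable version):

-- Sketch: surface the job whose lock keeps being fought over
SELECT ID_, REV_, RETRIES_, LOCK_OWNER_, LOCK_EXP_TIME_, PROCESS_INSTANCE_ID_
  FROM ACT_RU_JOB
 ORDER BY REV_ DESC;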
I don’t think so. What Flowable does is try to acquire a lock on the process instance by issuing the update from your first post. If that update returns 0 rows, it means the lock was taken by someone else. The only way that can deadlock is when multiple nodes are trying to acquire the lock (which by itself is not a problem) while other transactions are concurrently working on that process instance.
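In other words, the update doubles as a test-and-set on the process instance, and the caller inspects the affected-row count rather than expecting an error. A sketch of the pattern (parameter names are illustrative; the third bind is presumably a lock-expiry threshold, so a stale lock can be re-acquired):

UPDATE ACT_RU_EXECUTION
   SET LOCK_TIME_ = :newLockExpiry
 WHERE ID_ = :processInstanceId
   AND (LOCK_TIME_ IS NULL OR LOCK_TIME_ < :expiryThreshold);
-- update count = 1 -> lock acquired, proceed with the job
-- update count = 0 -> someone else holds a valid lock; back off and retry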
Do you happen to know whether access to the failing process instance is limited to the async jobs only, or are there other interactions (e.g. task completions) happening concurrently on those instances?