Connection Pool Memory Leak

It’s happened a few times now, so I know it’s not just a coincidence. What I’m not sure about is what I’ve missed in the configuration/process engine setup that’s allowing this to happen. The problem is that I have hooked the Flowable process engine up to a HikariCP connection pool with a max pool size of 20. I’m caching the process engine in an application-level variable so that I don’t have to keep redefining it each time I want to use, for example, the TaskService API commands.
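In Java terms (I’m calling the same classes from ColdFusion), the setup looks roughly like the sketch below; the class name, JDBC URL, and credentials are placeholders, not my real values:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.flowable.engine.ProcessEngine;
import org.flowable.engine.ProcessEngineConfiguration;
import org.flowable.engine.impl.cfg.StandaloneProcessEngineConfiguration;

public class EngineBootstrap {

    public static ProcessEngine buildEngine() {
        // One pool, capped at 20 connections, shared by the whole engine.
        HikariConfig hikariConfig = new HikariConfig();
        hikariConfig.setJdbcUrl("jdbc:postgresql://db-host:5432/flowable"); // placeholder URL
        hikariConfig.setUsername("flowable"); // placeholder credentials
        hikariConfig.setPassword("secret");
        hikariConfig.setMaximumPoolSize(20);
        HikariDataSource dataSource = new HikariDataSource(hikariConfig);

        // Hand the pooled DataSource to the engine (instead of plain JDBC settings),
        // so the engine borrows and returns connections from this single pool.
        ProcessEngineConfiguration config = new StandaloneProcessEngineConfiguration()
            .setDataSource(dataSource)
            .setDatabaseSchemaUpdate(ProcessEngineConfiguration.DB_SCHEMA_UPDATE_TRUE);

        return config.buildProcessEngine();
    }
}
```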

After a while, I’ll get an email from the DBA asking why Flowable has chewed up all the available database connections (today it was 131 connections). From reading, it looks like this is because nothing is releasing the connections (i.e., nothing is calling connection.close(), which is how HikariCP returns a connection to the pool).

I can fix this by NOT storing my process engine in an application-level variable. The problem with that solution, though, is that it’s computationally expensive to stand up a new process engine and connection to Flowable each time, whereas using a process engine that’s already connected and running is fast and easy. Is there a way to get the best of both worlds and tell the engine to close a connection once it’s done being used?

I am not using Spring.

Can you elaborate on what you mean by caching / an application-level variable? If Flowable gets a Hikari datasource with 20 connections, it cannot go beyond that (that’s the responsibility of the datasource).

It feels like there’s a configuration problem or you’ve got multiple engines running. Can you share how you’re setting up your engine?

It’s a technique in ColdFusion script, and it loosely equates to a globally defined variable in Java (think old-school Java where you put variable declarations at the top of your Main.java file instead of inside a method). ColdFusion script is a lot like writing Java code that runs from the command line and has no DI, so if you ripped out all the Spring features and did all your class instantiation and linkage the old way instead of via dependency injection, it ends up looking the same algorithmically (just different syntax).

I say cached because application variable usage acts like a user-defined cache, letting you store commonly used objects and data at this high level where they are accessible for the life-cycle of the web application. It also solves the problem that while using an established Flowable connection is fast (usually under 2 seconds for me), creating a new one is not (usually around 20 seconds to establish the connection, depending on network latency). So I create the process engine configuration with the connection pool once at startup, then store it in the variable so that I can reference the connection as needed.
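Translated to plain Java, the pattern I’m describing is roughly a build-once holder like the sketch below; the class and method names are just illustrative, and EngineBootstrap stands in for the configuration shown earlier:

```java
import org.flowable.engine.ProcessEngine;
import org.flowable.engine.TaskService;

public final class EngineHolder {

    // Built once and reused, the same way the ColdFusion application
    // variable keeps the engine around for the life of the web application.
    private static volatile ProcessEngine engine;

    public static ProcessEngine getEngine() {
        if (engine == null) {
            synchronized (EngineHolder.class) {
                if (engine == null) {
                    engine = EngineBootstrap.buildEngine(); // the setup shown earlier
                }
            }
        }
        return engine;
    }

    public static TaskService getTaskService() {
        // Cheap call: no new connection is created here; the engine only
        // borrows a connection from its pool when a command actually runs.
        return getEngine().getTaskService();
    }
}
```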

There are multiple engines running in our development environment, because each developer has their own application server instance and therefore establishes a distinct connection pool with a different process engine instance.

Where the real fun comes into play, though, is that this happens in my TEST environment, where I’m not expecting multiple application instances and therefore wouldn’t anticipate seeing multiple sets of connections spawned like this. Though your comment about multiple engines has me thinking that maybe I should be looking for a fingerprint that this is happening. Is there an easier way to spot multiple engines running? Right now the only ways I can think of to detect such a fingerprint are the two below (plus a pool-metrics idea sketched after the list):

1). I see an unexplained jump in the process/task IDs, caused by the different engine instances reserving non-overlapping blocks of primary keys (to avoid collisions). By this I mean one process starts up with, say, ID 10001 and another starts up with 5002. That indicates I have two engines running, because the second engine reserved the block from 5001 to 9999.

2). I go hunting in netstat and server logs and look for threads/PIDs/ports that all seem to be coming from the same application server (there’s only one application server running in my TEST environment). I shudder at the thought of doing this because it seems like a huge time sink that is unlikely to yield answers.
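The pool-metrics idea, as a rough sketch: ask HikariCP itself how many connections each pool in my application thinks it is using, and compare that against what the DBA sees. If my one pool reports at most 20 but the database shows 131, the extra connections belong to some pool/engine instance I don’t know about. This assumes the HikariDataSource was created eagerly (as in the setup above) so its MXBean is already available; the methods below are from HikariCP’s public API:

```java
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

public class PoolFingerprint {

    // Log what this one pool believes it is using; a second engine would have
    // its own pool (and its own numbers) that never shows up in this output.
    public static void logPoolStats(HikariDataSource dataSource) {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        System.out.printf(
            "pool=%s active=%d idle=%d total=%d waiting=%d%n",
            dataSource.getPoolName(),
            pool.getActiveConnections(),
            pool.getIdleConnections(),
            pool.getTotalConnections(),
            pool.getThreadsAwaitingConnection());
    }
}
```

Giving each pool a distinct name via hikariConfig.setPoolName(...) would presumably also make it easier to tell pools apart in the application logs and JMX, since Hikari tags both with the pool name.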