Concurrency defect in ScriptingEngines

org.flowable.common.engine.impl.scripting.ScriptingEngines is using a regular HashMap<> for its cachedEngines cache.

In a concurrent context I am seeing the same Flowable engine creating one Groovy engine per thread as a result (and luckily the Map didn’t get corrupted).

I’d recommend a ConcurrentHashMap, and a synchronized block to check & create missing entries.

BTW a related defect in Groovy this time, org.codehaus.groovy.jsr223.GroovyScriptEngineImpl#getScriptClass uses a ManagedConcurrentValueMap custom Map that is supposed to be thread safe, however I am seeing 0 cache hits and loader.parseClass being called once per process instance in a concurrent setting.

A concurrent hashmaps alone should not be relied on for caches: they won’t crash or corrupt the data structure sure, but w/o a synchronized block they cannot ensure cache hits.

See java - Cache using ConcurrentHashMap - Stack Overflow for example

I agree it would be better to use a thread-safe map there.

However, I’m not sure if it can get corrupted - in the end if multiple threads would put an instance in it, only one eventually would remain. This would only impact the first threads that concurrently execute a script.

But I will look into it (also want to check the ‘THREADING’ comment in that class) - just wanted to understand the crash/corruption part better.

According to the ScriptEngineFactory#getParameter there is a reserved key called “THREADING” whose value describes the behavior of the engine with respect to concurrent execution of scripts and maintenance of state is also defined. If the value is null then the engine implementation is not thread safe.

If we look in the Groovy implementation in

We can see that it returns “MULTITHREADED”, which means that there will be a thread that will set the value of an engine in the ScriptingEngines#cachedEngines. It is possible that some threads will get a non cached value and create a ScriptEngine. However, as @joram said eventually the cache will be filled and everything would go through the cache.

We could optimise this by using a synchronised block if there is no cached engine.

In any case, I do not see how this can lead to a defect. Can you please explain this a bit better? Are you sure that you have not disabled the caching of the engines in the ScriptEngines?

The simple unit test below shows that new engines are inserted concurrently in that hashmap, while at the same time another thread could be reading from the hashmap: this could lead to a ConcurrentModificationException.

public class ScriptingEnginesTest
{
    @Test
    public void testConcurrency()
    {
        StandaloneInMemProcessEngineConfiguration config = new
            StandaloneInMemProcessEngineConfiguration();
        config.buildProcessEngine();
        ScriptingEngines engines = config.getScriptingEngines();

        VariableScope scope = Mockito.mock(
            ExecutionEntityImpl.class,
            withSettings().defaultAnswer(invocation -> {
                throw new UnsupportedOperationException(invocation.toString());
            }));
        Random random = new Random(System.currentTimeMillis());
        ExecutorService executor = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 50; i++) {
            executor.submit(() -> {
                try {
                    Thread.sleep(random.nextInt(50));
                    engines.evaluate("", "juel", scope);
                }
                catch (UnsupportedOperationException e) {
                    // Ignore, expected from the scope mock
                }
                catch (Throwable t) {
                    t.printStackTrace();
                }
            });
        }
    }
}

Sample output from debug logging showing concurrent gets/sets:

Getting juel
Getting juel
Storing juel -> org.flowable.engine.impl.scripting.JuelScriptEngine@18e629fb
Getting juel
Getting juel
Getting juel
Getting juel
Getting juel
Storing juel -> org.flowable.engine.impl.scripting.JuelScriptEngine@5bb433e9
Getting juel
Storing juel -> org.flowable.engine.impl.scripting.JuelScriptEngine@48306db0
Getting juel
Getting juel
Storing juel -> org.flowable.engine.impl.scripting.JuelScriptEngine@69b6ee8d
Storing juel -> org.flowable.engine.impl.scripting.JuelScriptEngine@29051428
Storing juel -> org.flowable.engine.impl.scripting.JuelScriptEngine@6068c236

Groovy indeed supports multithreading so ScriptingEngines will try to cache and reuse the engine.

Not quite: all the threads that reach
scriptEngine = cachedEngines.get(language);
before the first thread completes
cachedEngines.put(language, scriptEngine);
will put a new engine in the cache.

Not only that, they will also proceed to update the cache; and since nother thread could be checking the cache at the same time you can get a concurrentModificationException.
[/quote[
However, as @joram said eventually the cache will be filled and everything would go through the cache.

This is not an optimization, this is about correctness…

Please see my answer to Joram. The get & put calls are fast compared to the code around them so I didn’t get the actual exception, but I guarantee you could.

Yes I am, I debugged through the code.

Franck

Just checked the Hashmap javadoc and the implementation is in fact pretty lenient in that it allows changing an existing value concurrently:

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)

So you would only get an exception if the two first threads hit the cache.put call at the same time: hard to reproduce, but clearly possible looking at the code.

Also, the problem is not just CPU overhead of creating multiple engines, it is also memory overhead: we detected this in a highly concurrent environment where each thread ended up having its own engine (since they all made their first getEngine call before the engine had time to be cached).

Franck

Hey @franck102,

Thanks for the detailed explanations. Your comments do make sense. The memory overhead might indeed be larger, but I believe that the GC will take care of the ScriptEngine(s) that are not used.

Anways, we did a fix by wrapping the creation of the ScriptEngine in a synchronized block in Make sure that only one script engine is created in a multi threaded … · flowable/flowable-engine@2816fca · GitHub. Let us know what you think about the solution.

Cheers,
Filip

Much better, although you will want to read this: The "Double-Checked Locking is Broken" Declaration

… these days I simply take the lock upfront, the performance hit is minimal in modern JVMs.

Franck

Thanks for the link, I did have a read, and it makes a compelling story. However, I am not sure whether the upfront lock is minimal as you are saying. To verify that we’ll need to run a benchmark to test validate it. This piece of the code will be hit by every single script evaluation and if you have many threads doing it and that could slow it down in theory.

Makes sense.
Another typical approach is to leverage the new computeIfAbsent method of ConcurrentHashMap - this is supposed to be somewhat optimized (at least for marge maps with many buckets).

Franck

I thought about that as well. However, we don’t always put the ScriptEngine in the cache. It is dependant on whether the engine supports multi threading or not.