Issue with additional JARs for SparkML + Elasticsearch App

I am trying to run the Spark app from flowable-examples/demo-machine-learning. It currently fails with a NoClassDefFoundError for org/elasticsearch/spark/rdd/api/java/JavaEsSpark at line 119 of the AnalyseDecisions Spark app:

JavaRDD<LabeledPoint> data = JavaEsSpark.esRDD(javaSparkContext, "flowable/variables")
                .values()

What should I pass for the "appResource" property?

SparkAppHandle sparkAppHandle = new SparkLauncher()
        .setSparkHome(System.getProperty("SPARK_HOME"))
        .setAppResource(System.getProperty("appResource")) // <-- this one
        .setMainClass("org.flowable.AnalyseDecisions")
        .setMaster("local[4]")

Any help from @joram would be great. Thanks, team.

Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/spark/rdd/api/java/JavaEsSpark
at org.flowable.AnalyseDecisions.inferDecisionTree(AnalyseDecisions.java:119)
at org.flowable.AnalyseDecisions.main(AnalyseDecisions.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.spark.rdd.api.java.JavaEsSpark
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)

Phew, it’s been a while since we did that demo so it’s all a bit fuzzy.

Looking at the repo, I think appResource is a reference (most likely the path) to the jar that contains the Spark job. If I’m not mistaken, that jar is the result of building this module: https://github.com/flowable/flowable-examples/tree/master/demo/demo-introducing-machine-learning/demo-spark. It’s a fat jar, which means it also bundles the Elasticsearch classes, so JavaEsSpark should be on the classpath when Spark runs the job.
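If you want to confirm that the jar you point appResource at really is the fat jar, a quick check along these lines might help. It is only a sketch: the class name FatJarCheck and its helper method are made up for illustration and are not part of the demo. It simply opens the jar and looks for the JavaEsSpark class file named in the stack trace.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of flowable-examples.
public class FatJarCheck {

    // The class from the NoClassDefFoundError, written as a jar entry path.
    static final String ES_SPARK_ENTRY =
            "org/elasticsearch/spark/rdd/api/java/JavaEsSpark.class";

    /** Returns true if the jar at jarPath contains an entry with the given name. */
    static boolean containsEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Use the first argument if given, otherwise fall back to the same
        // system property the launcher reads.
        String jarPath = args.length > 0 ? args[0] : System.getProperty("appResource");
        if (jarPath == null) {
            System.err.println("Usage: java FatJarCheck <path-to-jar>");
            return;
        }
        if (containsEntry(jarPath, ES_SPARK_ENTRY)) {
            System.out.println("JavaEsSpark found: this looks like the fat jar");
        } else {
            System.out.println("JavaEsSpark missing: this is probably not the fat jar");
        }
    }
}
```

Since the launcher reads the value with System.getProperty("appResource"), the path would be passed as a JVM system property, i.e. a -DappResource=... option with the path of the jar built from the demo-spark module (the exact file name depends on your build, so check the module's target directory).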


Thanks a lot, @joram. It did fix the issue :blush: