spark submit multiple jars


It is straightforward to include a single dependency jar file when submitting Spark jobs. See the following example:
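A minimal sketch (the class name, master, and jar names are placeholders, not values from the original post):

    # Submit a job with a single dependency jar via --jars
    spark-submit \
      --class com.example.MyApp \
      --master local[2] \
      --jars ./lib/mysql-connector-java-5.1.38.jar \
      ./my-app.jar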

What about including multiple jars? Say we want to include all the jars under a directory, like this: ./lib/*.jar.

According to spark-submit's --help, the --jars option expects a comma-separated list of local jars to include on the driver and executor classpaths.

However, the shell expands ./lib/*.jar into a space-separated list of jars, which --jars does not accept.
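For example, with a few illustrative jars in ./lib, the glob expands like this:

    $ echo ./lib/*.jar
    ./lib/foo.jar ./lib/bar.jar ./lib/baz.jar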

According to this answer on StackOverflow, there are several ways to generate a comma-separated list of jars.

A simple solution 

Simple but not perfect solution:
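Something along these lines (a reconstruction; the original snippet was lost in this copy):

    # Let the shell expand the glob, then turn the separating spaces into commas
    JARS=$(echo ./lib/*.jar | tr ' ' ',')
    spark-submit --jars "$JARS" ./my-app.jar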

There's a slight flaw in that this will not handle file names with spaces correctly. If that matters, try this slightly more complicated version:
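A space-safe sketch (again a reconstruction, not necessarily the exact snippet from the original answer):

    # Collect the jars into an array, then join the entries with commas;
    # each entry is passed as a separate quoted argument, so spaces inside
    # file names survive
    files=(./lib/*.jar)
    printf -v JARS '%s,' "${files[@]}"   # append a comma after every entry
    JARS=${JARS%,}                       # drop the trailing comma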

A better solution

A better way is to change the field separator variable $IFS. This is very strange-looking but will behave well with all file names and uses only shell built-ins.
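A sketch of that approach, producing the commas that --jars expects:

    # files is an array; with IFS set to a comma, "${files[*]}" joins its
    # entries with commas. Everything runs inside the $( ... ) sub-shell,
    # so the IFS change does not leak out.
    JARS=$(files=(./lib/*.jar); IFS=,; echo "${files[*]}")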

Explanation:

  1. files is set to an array of file names.
  2. IFS is changed to , (a comma, since --jars expects a comma-separated list).
  3. The array is echoed, and $IFS is used as the separator between array entries, meaning the file names are printed with commas between them.

All of this is done in a sub-shell so the change to $IFS isn’t permanent (which would be baaaad).

To recap, here are two methods to include multiple jars when submitting Spark jobs:
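Both commands below are illustrative sketches (the class and jar names are placeholders):

    # Method 1: glob + tr (simple, but breaks on spaces in file names)
    spark-submit \
      --class com.example.MyApp \
      --jars "$(echo ./lib/*.jar | tr ' ' ',')" \
      ./my-app.jar

    # Method 2: array + $IFS (handles spaces in file names)
    spark-submit \
      --class com.example.MyApp \
      --jars "$(files=(./lib/*.jar); IFS=,; echo "${files[*]}")" \
      ./my-app.jar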

Reference: http://stackoverflow.com/questions/24855368/spark-throws-classnotfoundexception-when-using-jars-option