Run Spark on Oozie with command line arguments

We have previously described how to use Oozie to run a PySpark program. This post uses a simple example to show how to use Oozie to run a Spark program written in Scala.

You might also be interested in: 1. Develop a Spark program using SBT. 2. Parse arguments for a Spark program using Scopt.

Here are the key points of this post:

  1. A working example that shows how to use the Oozie spark action to run a Spark program
  2. How to specify third-party libraries in Oozie
  3. How to pass command line arguments to the Spark program in Oozie

The following code shows the content of the workflow.xml file, in which we use the spark action to submit a Spark program written in Scala.
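For orientation, here is a minimal sketch of a workflow that uses the Oozie spark action (schema uri:oozie:spark-action:0.1). The workflow and application names, the class com.example.Main, the jar paths, and the inputPath property (assumed to be defined in job.properties) are hypothetical placeholders, not values from this post.

```xml
<workflow-app name="spark-scala-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>SparkScalaExample</name>
            <!-- fully qualified main class (hypothetical name) -->
            <class>com.example.Main</class>
            <!-- jar that contains the main class (hypothetical path) -->
            <jar>${nameNode}/apps/lib/spark-example.jar</jar>
            <!-- third-party jars passed through the spark-submit jars option (hypothetical path) -->
            <spark-opts>--jars ${nameNode}/apps/lib/scopt.jar</spark-opts>
            <!-- the option name and its value go in separate arg elements -->
            <arg>--input</arg>
            <arg>${inputPath}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

With this layout, the main method receives args(0) as "--input" and args(1) as the resolved input path, which is the form an option parser such as Scopt expects.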

Here are some tips for the configuration:

  1. We specify the main class using the <class> tag.
  2. We specify the jar file that contains the main class using the <jar> tag.
  3. The dependent jar files should be put into the apps/lib folder.
  4. We can specify third-party libraries using the --jars option in the <spark-opts> tag.
  5. To pass command line arguments to the program, we use the <arg> tag.
  6. To specify an option with a value, such as --input the_input_path, we have to split it into two <arg></arg> elements, one for --input and one for the_input_path. Otherwise, you will get the following error: