Run Spark on Oozie with command line arguments
We have previously described how to use Oozie to run a PySpark program. This post uses a simple example to show how to use Oozie to run a Spark program written in Scala.
You might also be interested in: 1. Developing a Spark program using SBT. 2. Parsing arguments for a Spark program using Scopt.
Here are the key points of this post:
- A working example that shows how to use the Oozie Spark action to run a Spark program
- How to specify third-party libraries in Oozie
- How to pass command line arguments to the Spark program in Oozie
The following code shows the content of the workflow.xml file, in which we use the Spark action to submit a Spark program written in Scala.
<action name="start_scala_spark">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="/tmp/xx/spark_oozie_test_out"/>
        </prepare>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>${spark_name}</name>
        <class>my.spark.Main</class>
        <jar>test_spark_2.10-1.0.jar</jar>
        <spark-opts>
            --queue default
            --conf spark.ui.view.acls=*
            --executor-memory 2G
            --num-executors 2
            --executor-cores 2
            --driver-memory 3g
            --jars scopt_2.10-3.5.0.jar
        </spark-opts>
        <arg>--input</arg>
        <arg>/tmp/xx/spark_oozie_test</arg>
        <arg>--output</arg>
        <arg>/tmp/xx/spark_oozie_test_out</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
Here are some tips for the configuration:
- We specify the main class using the <class> tag.
- We specify the jar file that contains the main class using the <jar> tag.
- The dependent jar files should be put into the apps/lib folder.
- We can specify third-party libraries using the --jars option in the <spark-opts> tag.
- To specify command line arguments, we need to use the <arg> tag.
- To specify the argument --input the_input_path, we have to split it into two <arg></arg> elements, one for the flag and one for the value. Otherwise, you will get the following error: Error: Unknown option, and then Error: Missing option
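To see why the flag and its value need separate <arg> elements, it may help to look at the receiving side. The sketch below is not the actual my.spark.Main from this post. It is a minimal, hypothetical parser (the names ArgDemo and Config and the paths /tmp/in and /tmp/out are made up for illustration) that assumes scopt 3.5.0 is on the classpath and that each <arg> element reaches the program as exactly one entry of args. It parses the arguments once as four separate tokens and once with each flag and value fused into a single token.

```scala
// Assumption: scopt 3.5.0 is on the classpath (the workflow ships scopt_2.10-3.5.0.jar via --jars).
import scopt.OptionParser

// Hypothetical holder for the two options used in the workflow.
case class Config(input: String = "", output: String = "")

object ArgDemo {
  // Hypothetical parser; "arg-demo" is only the program name shown in the usage text.
  private val parser = new OptionParser[Config]("arg-demo") {
    opt[String]("input").required()
      .action((x, c) => c.copy(input = x))
      .text("input path")
    opt[String]("output").required()
      .action((x, c) => c.copy(output = x))
      .text("output path")
  }

  // Returns Some(config) on success, None if scopt rejected the arguments.
  def parse(args: Array[String]): Option[Config] = parser.parse(args, Config())

  def main(args: Array[String]): Unit = {
    // Four separate <arg> elements: flag and value arrive as distinct tokens.
    println(parse(Array("--input", "/tmp/in", "--output", "/tmp/out")))

    // One <arg> per flag/value pair: each token contains a space,
    // so it matches no option name and the required options stay unset.
    println(parse(Array("--input /tmp/in", "--output /tmp/out")))
  }
}
```

Under these assumptions the first call should yield a populated Config, while the second should be rejected, with scopt reporting unknown and missing options much like the errors quoted above.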