Oozie | Learn for Master
  • Run Spark on Oozie with command line arguments

    We have already described how to use Oozie to run a PySpark program. This post uses a simple example to show how to use Oozie to run a Spark program written in Scala.

    You might also be interested in: 1. Developing a Spark program using SBT. 2. Parsing arguments for a Spark program using Scopt.

    Here are the key points of this post:

    1. A working example that uses the Oozie Spark action to run a Spark program
    2. How to specify third-party libraries in Oozie
    3. How to pass command line arguments to the Spark program in Oozie
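
    As a rough illustration of points 2 and 3, a Spark action might look like the sketch below. It is not the workflow.xml from this post: the application name, the class `com.example.SparkApp`, the jar paths, the `--input`/`--output` options, and the `${...}` properties are all hypothetical placeholders, and the schema versions may need adjusting for your Oozie release. Each command line argument goes in its own `<arg>` element. Extra jars are passed here through spark-submit's `--jars` option inside `<spark-opts>`, which is one common approach; placing the jars in the workflow application's `lib/` directory is another.

    ```xml
    <workflow-app name="spark-scala-example" xmlns="uri:oozie:workflow:0.5">
        <start to="spark-node"/>

        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <!-- jobTracker and nameNode are assumed to be set in job.properties -->
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>SparkScalaExample</name>
                <!-- Hypothetical main class and application jar -->
                <class>com.example.SparkApp</class>
                <jar>${nameNode}/user/${wf:user()}/example/lib/spark-app.jar</jar>
                <!-- Third-party library (hypothetical path) handed to spark-submit -->
                <spark-opts>--jars ${nameNode}/user/${wf:user()}/example/lib/scopt.jar</spark-opts>
                <!-- Command line arguments, one token per arg element -->
                <arg>--input</arg>
                <arg>${nameNode}/user/${wf:user()}/example/input</arg>
                <arg>--output</arg>
                <arg>${nameNode}/user/${wf:user()}/example/output</arg>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <kill name="fail">
            <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```

    The `<arg>` values reach the program's `main` method in the order they are listed, so an argument parser such as Scopt would see them as if they had been typed on the command line.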

    The following code shows the content of the workflow.xml file,

    [Read More...]
  • Run PySpark on Oozie

    In this post, I first give a working example of running PySpark on Oozie. Then I show how to run PySpark on Oozie with your own Python installation (e.g., Anaconda), so that you can use NumPy, pandas, and other Python libraries in your PySpark program.

    The syntax for creating a Spark action in an Oozie workflow

    As described in the Oozie documentation, these elements have the following meanings.

    The prepare element, if present, lists the paths to delete or create before the job starts. Each specified path must start with hdfs://HOST:PORT.
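
    For instance, a prepare element inside a Spark action could be sketched as follows. The directory paths are hypothetical, and `${nameNode}` is assumed to be a job property that resolves to an hdfs://HOST:PORT prefix.

    ```xml
    <!-- Fragment of a spark action; the paths below are placeholders -->
    <prepare>
        <!-- Remove output left over from a previous run so the job can write it again -->
        <delete path="${nameNode}/user/${wf:user()}/example/output"/>
        <!-- Create a directory the job expects to find -->
        <mkdir path="${nameNode}/user/${wf:user()}/example/tmp"/>
    </prepare>
    ```

    Deleting the output directory up front is a common way to keep a rerun from failing because the output path already exists.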

    [Read More...]