.
  • A Spark program using Scopt to Parse Arguments

    To develop a Spark program, we often need to read arguments from the command line. Scopt is a popular and easy-to-use argument parser. In this post, I provide a workable example to show how to use the scopt parser to read arguments for a spark program in scala. Then I describe how to run the spark job in yarn-cluster mode.

    The main contents of this post include:

    1. Use scopt option parser to parse arguments for a scala program.
    2. Use sbt to package the scala program
    3. Run spark on yarn-cluster mode with third party libraries

    Use Scopt to parse arguments in a scala program

    In the following program,

    [Read More...]
  • How to package a Scala project to a Jar file with SBT

    When you develop a Spark project using Scala language, you have to package your project into a jar file. This tutorial describes how to use SBT
    to compile and run a Scala project, and package the project as a Jar file. This will be helpful for you to create a spark project and package it to a jar file.

    The directory structure of a typical SBT project

    Here is an example to show a typical SBT project, which has the following directory structures. 

    .
    |-- build.sbt
    |-- lib
    |-- project
    |-- src
    |   |-- main
    |   |   |-- java (store main java files)
    |   |   |-- resources (store include in main jar)
    |   |   |-- scala (store main Scala source files)
    |   |-- test
    |       |-- java (store test java files)
    |       |-- resources (store files include in test jar)
    |       |-- scala (store test scala source files)
    |-- target

    You can use the following command to create this directory structures:

    #!/bin/sh
    cd ~/hello_world
    mkdir -p src/{main,test}/{java,resources,scala}
    mkdir lib project target
    
    # create an initial build.sbt file
    echo 'name := "MyProject"
    version := "1.0"
    scalaVersion := "2.10.0"' >
    [Read More...]