How to package a Scala project to a Jar file with SBT

When you develop a Spark project in Scala, you have to package the project into a JAR file. This tutorial describes how to use SBT to compile and run a Scala project and package it as a JAR file, which is the workflow you need when you create a Spark project.

The directory structure of a typical SBT project

A typical SBT project has the following directory structure:

.
|-- build.sbt
|-- lib
|-- project
|-- src
|   |-- main
|   |   |-- java (main Java source files)
|   |   |-- resources (files to include in the main jar)
|   |   |-- scala (main Scala source files)
|   |-- test
|       |-- java (test Java source files)
|       |-- resources (files to include in the test jar)
|       |-- scala (test Scala source files)
|-- target

You can use the following commands to create this directory structure:

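#!/bin/bash
# bash is required: the brace expansion below is not supported by every sh
mkdir -p ~/hello_world
cd ~/hello_world
mkdir -p src/{main,test}/{java,resources,scala}
mkdir -p lib project target

# create an initial build.sbt file
# (settings are separated by blank lines, which older SBT versions require)
echo 'name := "MyProject"

version := "1.0"

scalaVersion := "2.10.0"' > build.sbt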

The build.sbt file is SBT’s basic configuration file. You define most settings that SBT needs in this file, including specifying library dependencies, repositories, and any other basic settings your project requires. 
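Besides build.sbt, SBT also reads project/build.properties, which can pin the SBT version used for the build. The following is a minimal sketch, assuming you run it from the project root. The version 0.13.18 is an assumption chosen to fit the Scala 2.10 examples in this tutorial; use whichever release you have installed.

```shell
# Run from the project root. The pinned version is an assumption,
# not something this tutorial's project requires.
mkdir -p project
echo 'sbt.version=0.13.18' > project/build.properties
cat project/build.properties
```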

An example of the build.sbt for a Spark project

To start a Spark project, you need to add dependencies such as spark-core or spark-mllib to the project.

The build.sbt file makes it easy to specify these dependencies. Here is my build.sbt. It includes the libraries a typical Spark SBT project needs: the Spark core library, the Spark MLlib library, and the ScalaTest library.

name := "spark_proj"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.5.0" % "provided"

mainClass in (Compile, run) := Some("com.learn4master.Main")  

mainClass in (Compile, packageBin) := Some("com.learn4master.SparkMain") 

resolvers += "Akka Repository" at "http://repo.akka.io/releases/" 

libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.0-M15" 

libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.0-M15" % "test" 

libraryDependencies += "com.lihaoyi" % "ammonite-repl" % "0.5.6" % "test" cross CrossVersion.full

initialCommands in (Test, console) := """ammonite.repl.Main.run("")"""
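The build file above names com.learn4master.SparkMain as the main class of the packaged jar, but that class is not shown in this tutorial. The sketch below writes a hypothetical stand-in with that name. It assumes the Spark 1.5 API (SparkConf and SparkContext) and that the job is launched with spark-submit, which supplies the master URL. Only the package and object name come from the build.sbt; the body is an illustration.

```shell
# Run from the project root. The class body is a hypothetical example;
# only the package and object name come from the build.sbt above.
mkdir -p src/main/scala/com/learn4master
cat > src/main/scala/com/learn4master/SparkMain.scala <<'EOF'
package com.learn4master

import org.apache.spark.{SparkConf, SparkContext}

object SparkMain {
  def main(args: Array[String]): Unit = {
    // No setMaster call: the master is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("spark_proj")
    val sc = new SparkContext(conf)
    try {
      // Distribute the numbers 1 to 100 and add them up.
      val total = sc.parallelize(1 to 100).sum()
      println(s"Sum of 1 to 100: $total")
    } finally {
      sc.stop()
    }
  }
}
EOF
```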

Create a simple Scala file

Create a file named helloWorld.scala in the src/main/scala directory with these contents. (The package name must not contain the word package, which is reserved in Scala.)

package com.learn4master

object Main extends App {
    println("Hello, world")
}

Three commands are used to build the project:

sbt compile

sbt run

sbt package

Run them from the root directory of the project:

$ cd ~/hello_world

Compile the project:

$ sbt compile

Run the project:

$ sbt run

Package the project:

$ sbt package

The sbt package command produces the main artifact as a jar in target/scala-2.x/project_name_2.x-zz.jar, where 2.x is the Scala binary version (for example, 2.10) and zz is the project version.
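The jar path follows from three build.sbt settings: name, version, and scalaVersion. As a rough illustration, the shell sketch below rebuilds the expected path for the spark_proj build shown earlier. It assumes a stable Scala 2.x version, so that the binary version is the first two components, and it only approximates how SBT normalizes the project name (lowercase, with runs of non-word characters replaced by a hyphen).

```shell
# Values copied from the spark_proj build.sbt shown earlier.
name="spark_proj"
version="1.0"
scala_version="2.10.6"

# Scala binary version: drop the last component (2.10.6 -> 2.10).
scala_binary="${scala_version%.*}"

# Approximate SBT's name normalization: lowercase, then replace each run of
# characters other than letters, digits, and underscores with a hyphen.
normalized=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9_]+/-/g')

jar_path="target/scala-${scala_binary}/${normalized}_${scala_binary}-${version}.jar"
echo "$jar_path"
# prints: target/scala-2.10/spark_proj_2.10-1.0.jar
```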

If you want the jar to be executable, add the following to your build.sbt so that the main class is recorded in the jar's manifest:

mainClass in Compile := Some("your.main.Class")

Continuously building the package (the quotes keep the shell from treating ~ as a home-directory shortcut):

sbt "~package"

Standalone jar with all dependencies

If you want to build a standalone executable jar that includes its dependencies, you can use the sbt-assembly plugin. Once the plugin is added to the project, build the jar with:

sbt assembly

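The standalone jar will be in target/scala-2.x/project_name-assembly-x.y.jar.

You can run it with:

java -jar target/scala-2.x/project_name-assembly-x.y.jar

The java -jar command always runs the main class recorded in the jar's manifest. To run a different class from the same jar, put the jar on the classpath and name the class:

java -cp target/scala-2.x/project_name-assembly-x.y.jar class.with.main.function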
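The sbt-assembly plugin has to be added to the build before the assembly task exists. The following is a minimal sketch, run from the project root. The plugin version 0.14.10 is an assumption (a release from the 0.14 line, which targets older SBT versions), so check the plugin's documentation for the version that matches your SBT.

```shell
# Run from the project root. The version number is an assumption; pick the
# sbt-assembly release that matches your SBT version.
mkdir -p project
cat > project/plugins.sbt <<'EOF'
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
EOF
cat project/plugins.sbt
```

Dependencies marked "provided", such as spark-mllib in the build.sbt above, are left out of the assembled jar. That is usually what you want when the cluster already supplies Spark.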

Discussion

The first time you run SBT, it may take a while to download all the dependencies it needs, but after that first run, it will download new dependencies only as needed.

Because compile is a dependency of run, you do not have to run compile before each run; just type sbt run.

The JAR file created with sbt package is a normal Java JAR file. You can list its contents with the usual jar tvf command:

$ jar tvf target/scala-2.10/basic_2.10-1.0.jar

You can also execute the main method in the JAR file with the Scala interpreter:

$ scala target/scala-2.10/basic_2.10-1.0.jar
Hello, world

SBT commands

As with any Java-based command, there can be a little startup lag time involved with running SBT commands, so when you’re using SBT quite a bit, it’s common to run these commands in interactive mode from the SBT shell prompt to improve the speed of the process:

$ sbt
> compile
> run
> package

Common SBT commands

At the time of this writing, there are 247 SBT commands available (which I just found out by hitting the Tab key at the SBT shell prompt, which triggered SBT’s tab completion). Table 18-1 shows a list of the most common commands.

Table 18-1. Descriptions of the most common SBT commands

Command | Description
clean | Removes all generated files from the target directory.
compile | Compiles source code files that are in src/main/scala, src/main/java, and the root directory of the project.
~ compile | Automatically recompiles source code files while you are running SBT in interactive mode (i.e., while you are at the SBT command prompt).
console | Compiles the source code files in the project, puts them on the classpath, and starts the Scala interpreter (REPL).
doc | Generates API documentation from your Scala source code using scaladoc.
help | Issued by itself, the help command lists the common commands that are currently available. When given a command, help provides a description of that command.
inspect <setting> | Displays information about the given setting. For instance, inspect library-dependencies displays information about the libraryDependencies setting. (Variables in build.sbt are written in camelCase. Older SBT versions expect this hyphenated format at the prompt; SBT 0.13 and later use the camelCase name there as well, as in inspect libraryDependencies.)
package | Creates a JAR file (or WAR file for web projects) containing the files in src/main/scala, src/main/java, and resources in src/main/resources.
package-doc | Creates a JAR file containing API documentation generated from your Scala source code.
publish | Publishes your project to a remote repository. See Recipe 18.15, “Publishing Your Library”.
publish-local | Publishes your project to a local Ivy repository. See Recipe 18.15, “Publishing Your Library”.
reload | Reloads the build definition files (build.sbt, project/*.scala, and project/*.sbt), which is necessary if you change them while you are in an interactive SBT session.
run | Compiles your code, and runs the main class from your project, in the same JVM as SBT. If your project has multiple main methods (or objects that extend App), you will be prompted to select one to run.
test | Compiles and runs all tests.
update | Updates external dependencies.

There are many other SBT commands available, and when you use plug-ins, they can also make their own commands available. For instance, Recipe 18.7, “Using SBT with Eclipse” shows that the “sbteclipse” plug-in adds an eclipse command. See the SBT documentation for more information.

Continuous compiling

As mentioned, you can eliminate the SBT startup lag time by starting the SBT interpreter in “interactive mode.” To do this, type sbt at your operating system command line:

$ sbt
>

When you issue your commands from the SBT shell, they’ll run noticeably faster.

As shown earlier, you can issue the compile command from within the SBT shell, but you can also take this a step further and continuously compile your source code by using the ~ compile command instead. When you issue this command, SBT watches your source code files and automatically recompiles them whenever it sees the code change.

To demonstrate this, start the SBT shell from the root directory of your project:

$ sbt

Then issue the ~ compile command:

> ~ compile

Now, any time you change and save a source code file, SBT automatically recompiles it and prints the new compiler output in the shell. Press Enter to stop watching and return to the prompt.

From time to time when working in the SBT shell you may have a problem, such as with incremental compiling. When issues like this come up, you may be able to use the shell’s last command to see what happened.

For instance, you may issue a compile command and then see something wrong in the output. To see what happened, issue the last compile command:

> last compile
[debug]
[debug] Initial source changes:
[debug]  removed:Set()
[debug]  added: Set(.../Test.scala)
[debug]  modified: Set()
[debug] Removed products: Set()
[debug] Modified external sources: Set()
many more lines of debug output here ...

The last command prints logging information for the last command that was executed. This can help you understand what’s happening, including understanding why something is being recompiled over and over when using incremental compilation.

Typing help last in the SBT interpreter shows a few additional details, including a note about the last-grep command, which can be useful when you need to filter a large amount of output.

Reference:

  1. Programming in Scala
  2. Scala Cookbook