  • Scala read file examples

    In this post, I show some good practices for reading a file in Scala. We can use the mkString method to read the entire contents of a file into a variable. We can also use the getLines method to iterate through the contents of a file line by line.

    Read all the data into a String
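
    A minimal sketch of reading a whole file with mkString, using only scala.io.Source from the standard library. The object name ReadWholeFileSketch, the helper readAll, and the temporary file are assumptions made for illustration, not code from the original post.

    ```scala
    import java.nio.charset.StandardCharsets
    import java.nio.file.Files
    import scala.io.Source

    object ReadWholeFileSketch {
      // Read the whole file into one String; the Source is closed even if reading fails.
      def readAll(path: String): String = {
        val source = Source.fromFile(path, "UTF-8")
        try source.mkString finally source.close()
      }

      def main(args: Array[String]): Unit = {
        // Write a small temporary file so that the sketch runs on its own.
        val tmp = Files.createTempFile("read-all", ".txt")
        Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8))
        print(readAll(tmp.toString))
        Files.delete(tmp)
      }
    }
    ```

    Closing the Source in a finally block releases the file handle, which a bare Source.fromFile(path).mkString does not do.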

    Scala loop a file

    Sometimes we don’t want to load the entire contents of a file into memory, especially if the file is very large. Instead, we want to loop over the file and process one line at a time. See the following example,
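
    One way to process a file line by line is sketched below, assuming UTF-8 input. The names LoopFileSketch, foreachLine, and countNonEmpty are hypothetical helpers introduced here for illustration.

    ```scala
    import scala.io.Source

    object LoopFileSketch {
      // Apply `handle` to each line in turn. getLines returns a lazy iterator,
      // so the whole file is never held in memory at once.
      def foreachLine(path: String)(handle: String => Unit): Unit = {
        val source = Source.fromFile(path, "UTF-8")
        try source.getLines().foreach(handle) finally source.close()
      }

      // Example use: count the lines that are not blank.
      def countNonEmpty(path: String): Int = {
        var n = 0
        foreachLine(path) { line => if (line.trim.nonEmpty) n += 1 }
        n
      }
    }
    ```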

    [Read More...]
  • Scala for loop

    In this post, we list the common ways to write a for loop in Scala.

    Scala for loop with ranges

    The simplest syntax of a Scala for loop over a range is as follows:

    i to j can also be replaced with other generators such as i until j. See the following example:
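
    A minimal sketch of a range loop. The bounds are an assumption, chosen here so that the loop prints the values 1 through 5; the object name RangeLoopSketch and its two helper methods are illustrative only.

    ```scala
    object RangeLoopSketch {
      // `1 to 5` includes the upper bound; `1 until 6` excludes it,
      // so both generators produce the values 1, 2, 3, 4, 5.
      def inclusive: Seq[Int] = for (a <- 1 to 5) yield a
      def exclusive: Seq[Int] = for (a <- 1 until 6) yield a

      def main(args: Array[String]): Unit = {
        for (a <- 1 until 6) {
          println("value of a: " + a)
        }
      }
    }
    ```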

    The output is:

    value of a: 1
    value of a: 2
    value of a: 3
    value of a: 4
    value of a: 5

    Scala for loop with multiple ranges

    We can use multiple ranges, separated by semicolons (;), within a single Scala for loop. The loop then iterates over every combination of values from the ranges.
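
    A short sketch with two ranges, assuming the bounds 1 to 2 and 1 to 3 purely for illustration. The object name MultiRangeLoopSketch and the helper pairs are hypothetical.

    ```scala
    object MultiRangeLoopSketch {
      // Two generators separated by a semicolon: the second range is
      // traversed in full for every value of the first.
      def pairs: Seq[(Int, Int)] =
        for (a <- 1 to 2; b <- 1 to 3) yield (a, b)

      def main(args: Array[String]): Unit = {
        for (a <- 1 to 2; b <- 1 to 3) {
          println("value of a: " + a + ", value of b: " + b)
        }
      }
    }
    ```

    This behaves like two nested loops, with the first generator as the outer loop.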

    [Read More...]
  • A Spark program using Scopt to Parse Arguments

    When developing a Spark program, we often need to read arguments from the command line. Scopt is a popular and easy-to-use argument parser. In this post, I provide a working example that shows how to use the scopt parser to read arguments for a Spark program written in Scala. Then I describe how to run the Spark job in yarn-cluster mode.

    The main contents of this post include:

    1. Use the scopt option parser to parse arguments for a Scala program.
    2. Use sbt to package the Scala program.
    3. Run Spark in yarn-cluster mode with third-party libraries.

    Use Scopt to parse arguments in a Scala program

    In the following program,

    [Read More...]
  • Parse LibSVM data for Spark MLlib

    The LibSVM data format is widely used in machine learning, and Spark MLlib is a powerful tool for training large-scale machine learning models. If your data is well formatted in LibSVM, it is straightforward to use the loadLibSVMFile method to load it into an RDD.

    import org.apache.spark.mllib.util.MLUtils
    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

    However, in certain cases, your data is not cleanly formatted as LibSVM. For example, you may have several models, each with its own labeled data. Suppose your data is stored in HDFS, and each line looks like this: (model_key, training_instance_in_libsvm_format).
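
    To make the line layout concrete, here is a plain-Scala sketch that parses one such line without any Spark dependency. It rests on assumptions the text does not state: that the model key and the LibSVM part are separated by a tab, and that the LibSVM part has the usual form "label index:value index:value ...". The names ModelLineSketch, Instance, and parse are hypothetical.

    ```scala
    object ModelLineSketch {
      // One parsed record: model key, label, and sparse features as (index, value) pairs.
      case class Instance(modelKey: String, label: Double, features: Seq[(Int, Double)])

      // ASSUMPTION: the key and the LibSVM part are separated by a single tab.
      def parse(line: String): Instance = {
        val Array(key, libsvm) = line.split("\t", 2)
        val tokens = libsvm.trim.split("\\s+")
        val features = tokens.tail.toSeq.map { token =>
          val Array(index, value) = token.split(":")
          // LibSVM indices are 1-based; shifting to 0-based is a choice of this sketch.
          (index.toInt - 1, value.toDouble)
        }
        Instance(key, tokens.head.toDouble, features)
      }
    }
    ```

    For example, ModelLineSketch.parse("modelA\t1 1:0.5 3:2.0") yields the key "modelA", the label 1.0, and the features (0, 0.5) and (2, 2.0).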

    In this case, 

    [Read More...]
  • Scala vs Java examples

    This tutorial gives a quick introduction to the Scala language by comparing Scala with Java using examples. It is for people who have learned Java and want to learn Scala. 

    Scala vs Java: The Hello World Example
    A hello world example for Scala:
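
    A minimal sketch of such a program. The object name HelloWorld follows the name used in this excerpt; the greeting text and the helper message are assumptions.

    ```scala
    object HelloWorld {
      // The greeting is kept in a method so that it can be checked separately.
      def message: String = "Hello, world!"

      // Entry point, the counterpart of Java's public static void main.
      def main(args: Array[String]): Unit = {
        println(message)
      }
    }
    ```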

    A hello world example for Java:

    object Keyword in Scala

    In Scala, the object keyword is used to create a singleton object. The declaration above declares both a class called HelloWorld and a single instance of that class, also called HelloWorld.

    One difference between Scala and Java is that,

    [Read More...]
  • How to package a Scala project into a Jar file with SBT

    When you develop a Spark project in Scala, you have to package your project into a jar file. This tutorial describes how to use SBT to compile and run a Scala project, and how to package the project as a jar file. It should help you create a Spark project and package it into a jar file.

    The directory structure of a typical SBT project

    A typical SBT project has the following directory structure:

    .
    |-- build.sbt
    |-- lib
    |-- project
    |-- src
    |   |-- main
    |   |   |-- java (store main Java source files)
    |   |   |-- resources (store files to include in main jar)
    |   |   |-- scala (store main Scala source files)
    |   |-- test
    |       |-- java (store test Java source files)
    |       |-- resources (store files to include in test jar)
    |       |-- scala (store test Scala source files)
    |-- target

    You can use the following commands to create this directory structure:

    #!/bin/bash
    cd ~/hello_world
    mkdir -p src/{main,test}/{java,resources,scala}
    mkdir lib project target
    
    # create an initial build.sbt file
    echo 'name := "MyProject"
    version := "1.0"
    scalaVersion := "2.10.0"' >
    [Read More...]