In this post, I briefly introduce Spark and use examples to show how to use the popular RDD API to analyze your data. You can refer to this post to set up the PySpark environment using IPython Notebook.
SparkContext, or the Spark context, is the entry point for developing a Spark application on the Spark infrastructure.
Once a SparkContext object is created, it sets up the internal services and establishes a connection to the cluster manager, which manages the actual executors that carry out the computations.
The following diagram from the Spark documentation visualizes the Spark architecture:
The SparkContext object is usually referenced as the variable sc.