How to set up an IPython notebook server to run Spark in local or YARN mode
IPython notebook is a powerful tool for learning Python programming. In this post, I demonstrate how to set up an IPython notebook server to run Spark programs in Python.
- Install Spark
Suppose Spark is installed at the directory `~/spark`; then execute:

```bash
export SPARK_HOME=~/spark
```
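To make this setting persist across shell sessions, you can append it to your shell startup file. A minimal sketch, assuming a bash shell and the `~/spark` install location above:

```bash
# Persist SPARK_HOME and put the Spark launchers (pyspark, spark-submit) on PATH
echo 'export SPARK_HOME=~/spark' >> ~/.bashrc
echo 'export PATH="$SPARK_HOME/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc  # reload so the current shell picks up the change
```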
- Install Anaconda at `~/anaconda`, then package the whole distribution:

```bash
cd ~/anaconda
zip -r anaconda.zip .
```
This compresses all the Anaconda files into a single zip file.
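Before moving on, it can help to sanity-check that the archive has the layout the YARN step below expects, with `bin/python` at its root (since the zip was created from inside `~/anaconda`). A quick check, assuming `unzip` is installed:

```bash
# List the archive contents and confirm the Python interpreter sits at bin/python
unzip -l ~/anaconda/anaconda.zip | grep 'bin/python'
```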
Run IPython notebook for pyspark in local mode
- Now you can start an IPython notebook server in local mode:
```bash
IPYTHON_OPTS="notebook --notebook-dir=${WORKSPACE_DIR} --ip=* \
--config=${CONFIG_FILE} \
--port=${PORT}" pyspark --master local[2] \
--driver-memory 3G \
--jars /..../hcatalog-support.jar \
--conf spark.authenticate.secret=password
```
`WORKSPACE_DIR` is the directory where you want to save your code. `CONFIG_FILE` is the location of the `jupyter_notebook_config` file.
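If you do not have a notebook config file yet, you can generate a default one and point `CONFIG_FILE` at it. A minimal sketch, assuming a Jupyter install where `--generate-config` writes the file under `~/.jupyter`:

```bash
# Create a default config file (port, IP binding, password, etc. can be edited there)
jupyter notebook --generate-config
CONFIG_FILE=~/.jupyter/jupyter_notebook_config.py
```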
Run IPython notebook for pyspark in YARN mode

You can use the following command to start an IPython notebook server in YARN mode.
```bash
IPYTHON_OPTS="notebook --notebook-dir=${WORKSPACE_DIR} --ip=* \
--config=${CONFIG_FILE} \
--port=${PORT}" pyspark \
--verbose \
--master yarn \
--deploy-mode client \
--queue default \
--num-executors 3 \
--driver-memory 3g \
--executor-memory 3g \
--executor-cores 2 \
--jars /xxxxx/hcatalog-support.jar \
--archives hdfs://xxxxxx:8020/user/xxxx/anaconda.zip#anaconda_remote
```

For the `--archives` option, you need to upload the anaconda.zip file to an HDFS folder first, then specify its location here.
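A sketch of that upload step, assuming `/user/myuser` as a placeholder HDFS home directory (substitute your own path, matching the `--archives` URL above):

```bash
# Upload the packaged Anaconda distribution to HDFS so YARN can distribute it
hdfs dfs -mkdir -p /user/myuser
hdfs dfs -put ~/anaconda/anaconda.zip /user/myuser/
```

YARN unpacks the archive into each container's working directory under the alias after `#`, so you can additionally pass `--conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python` to make the executors use the bundled interpreter.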