Adding Multiple Columns to Spark DataFrames
from: https://p058.github.io/spark/2017/01/08/spark-dataframes.html
I have been using Spark's DataFrame API for quite some time, and I often want to add many columns to a DataFrame (for example, creating more features from existing features for a machine learning model). Writing many withColumn statements gets tedious, so I monkey-patched the Spark DataFrame to make it easy to add multiple columns at once.
First, let's create a udf_wrapper decorator to keep the code concise:
from pyspark.sql.functions import udf


def udf_wrapper(returntype):
    # Decorator factory: turns a plain Python function into a Spark UDF
    # with the given return type.
    def udf_func(func):
        return udf(func, returnType=returntype)
    return udf_func
Let's create a Spark DataFrame with the columns user_id, app_usage (the apps and the number of sessions of each app),