In the previous post, we introduced Spark and RDDs, and showed how to use RDDs for basic data analysis. In this post, I will show more examples of how to use the RDD methods.
Spark RDD reduceByKey Method
We have already used reduceByKey to solve the word frequency problem. Here I will use a more complicated example to show what else it can do.
Suppose we have a set of tweets, each shared by several users. We also give each user a weight denoting that user's importance.
Here is an example of the data set.
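As a rough illustration of this setup (hypothetical data, not the post's actual data set), here is a minimal sketch. It assumes each record is a (tweet_id, user_id) pair, that user weights live in a small lookup table, and that the goal is to sum the weights of the users who shared each tweet; all of these names, values, and the goal itself are my assumptions. To keep it runnable without a Spark installation, the reduce step uses a small local helper that imitates reduceByKey; with PySpark, the same step would be a call to `reduceByKey(add)` on an RDD of pairs.

```python
from operator import add

# Hypothetical sample data (an assumption, not the post's data set).
# Each record is (tweet_id, user_id): the user shared that tweet.
shares = [
    ("t1", "alice"),
    ("t1", "bob"),
    ("t2", "bob"),
    ("t2", "carol"),
    ("t2", "alice"),
    ("t3", "carol"),
]

# Hypothetical user weights (also an assumption).
user_weight = {"alice": 3, "bob": 1, "carol": 2}


def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: merge the values of each key with func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Map step: replace each user with that user's weight, keyed by tweet.
weighted = [(tweet, user_weight[user]) for tweet, user in shares]

# Reduce step: sum the weights per tweet.
# PySpark equivalent: sc.parallelize(weighted).reduceByKey(add).collect()
tweet_score = reduce_by_key(weighted, add)
print(sorted(tweet_score.items()))
```

The map step turns each share into a (key, value) pair, which is the shape reduceByKey expects; the reduce step then only needs an associative function such as addition.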