Count word frequency


Counting word frequency is a common task in text analysis. In this post, I describe how to count word frequency using a Java HashMap, a Python dictionary, and Spark.

Use Java HashMap to count word frequency

{a=5, b=2, c=6, d=3}

Use Python dict to count word frequency
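
A minimal sketch of the dictionary approach, assuming a hypothetical input string `text` in which single letters stand in for words. The function name `count_words` is also an assumption made for illustration:

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    for word in text.split():
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical sample input; single letters stand in for words.
text = "a c b d a c c a d c b a c d a c"
print(count_words(text))
```

The standard library's `collections.Counter` does the same job in one call, `Counter(text.split())`, and is usually the more idiomatic choice.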

The output:

{'a': 5, 'c': 6, 'b': 2, 'd': 3}

Use Spark to count word frequency

The methods above work well for small datasets. However, if the dataset is too large to fit in the memory of a single machine, the hash-table-based approach will not work. You will need a distributed program instead, which is straightforward to write with Spark or MapReduce.
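
To illustrate the idea behind the distributed approach without a cluster, here is a plain-Python sketch (not Spark itself) of the map and reduce steps: each partition is counted independently, then the partial counts are merged. The `partitions` data and the function names `count_partition` and `merge_counts` are hypothetical.

```python
from collections import Counter
from functools import reduce


def count_partition(lines):
    """Map step: count words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts by summing per word."""
    return left + right


# Hypothetical partitions; on a real cluster each would live on a different worker.
partitions = [
    ["a c b d", "a c c a"],
    ["d c b a", "c d a c"],
]

partial_counts = [count_partition(p) for p in partitions]
total = reduce(merge_counts, partial_counts, Counter())
print(dict(total))
```

Because merging is just per-word addition, the partial counts can be combined in any order, which is what lets a framework such as Spark spread the work across machines.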

Please refer to this post for using Spark to count word frequency.