Basics of Map Reduce Algorithm Explained with a Simple Example
While processing large set of data, we should definitely address scalability and efficiency in the application code that is processing the large amount of data.
Map reduce algorithm (or flow) is highly effective in handling big data.
Let us take a simple example and use map reduce to solve a problem.
Say you are processing a large amount of data and trying to find out what percentage of your user base where talking about games.
We usually develop programs based on open sourced MapReduce frameworks such as Hadoop, Apache Pig, Apache Hive, and Spark to solve Big Data problems. In this post, I will use an example to describe what MapReduce is and how it works. I hope this will help you learn those Big Data technologies such as Hadoop, Pig, Hive and Spark easier.
What is MapReduce?
MapReduce is the key of Big Data. It was invented by Google, and it is the heart of Hadoop.
It is a programming paradigm that allows engineers or scientist build scalable systems that can run on hundreds or thousands of servers.