MapReduce
To take advantage of Hadoop's parallel processing, a query must be expressed in MapReduce form. MapReduce is a paradigm with two phases: the mapper phase and the reducer phase. The mapper takes its input as key-value pairs, and its output is fed to the reducer as input. The reducer runs only after the mapper has finished. The reducer also takes its input as key-value pairs, and its output is the final output.

Steps in Map Reduce
How Many Maps
The size of the data to be processed determines the number of maps required. For example, with 1000 MB of data and a block size of 64 MB, we need 16 mappers, because 1000 / 64 = 15.6 rounds up to 16.

Sort and Shuffle
The sort and shuffle take place on the output of the mapper, before the reducer. When a mapper task completes, its results are sorted by key, partitioned if there are multiple reducers, and then written to disk. From the <k2,v2> pairs produced by each mapper, all the values for each unique key k2 are collected. The output of the shuffle phase, in the form <k2,list(v2)>, is sent as input to the reducer phase.

MapReduce Example
Use Case: Find the number of occurrences of each word in a text file using MapReduce.

Solution:
Step 1: Upload the file data.txt from /usr/Desktop (local path) to /Hadoop/data (Hadoop folder) on HDFS.
$ hadoop fs -put /usr/Desktop/data.txt /Hadoop/data

Step 2: Write the MapReduce program using Eclipse, build a jar from it, and name it count.
File: wc_mapper.java
File: wc_reducer.java
File: wc_runner.java

Step 3: Run the jar file.
$ hadoop jar count.jar WordCount /Hadoop/data/data.txt /user/root/example_count
The output is stored in the example_count folder.
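The mapper count, the sort and shuffle, and the word count itself can be tried out without a cluster. The sketch below is a plain-Java simulation and does not use the Hadoop API. The class name WordCountSketch and the methods numMaps, map, shuffle, reduce and wordCount are hypothetical names chosen for illustration; they are not the contents of wc_mapper.java, wc_reducer.java or wc_runner.java. It assumes one map task per block and words separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the map -> sort/shuffle -> reduce flow (not Hadoop code).
public class WordCountSketch {

    // Number of map tasks, assuming one map per block: ceiling(input size / block size).
    public static long numMaps(long inputSizeMb, long blockSizeMb) {
        return (inputSizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    // Map phase: emit a <word, 1> pair (<k2,v2>) for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Sort and shuffle: collect all values per unique key into <k2,list(v2)>.
    // A TreeMap keeps the keys in sorted order.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of values for one key.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over the given input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numMaps(1000, 64)); // prints 16
        // prints {hadoop=1, hello=2, map=1, reduce=1}
        System.out.println(wordCount(List.of("hello hadoop", "hello map reduce")));
    }
}
```

On a real cluster, the map and reduce methods would run as separate tasks on different nodes, and the framework would perform the shuffle between them. Here all three phases run in a single process so that the intermediate <k2,list(v2)> structure can be inspected.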