MongoDB Map-Reduce

Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.

In a map-reduce operation, MongoDB applies the map phase to each input document (i.e., the documents in the collection that match the query condition).

The map function emits key-value pairs. For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. MongoDB then stores the results in a collection.

The output of the reduce function may pass through a finalize function to further condense or process the results of the aggregation.
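
The flow described above can be sketched in plain JavaScript, outside MongoDB. This is only an illustration under simplifying assumptions: runMapReduce and the sample items array are hypothetical names, everything happens in memory, emit is passed to the map function as an argument instead of being a global, and the real server may call reduce several times per key on partial results.

```javascript
// In-memory sketch of the map -> reduce -> finalize flow (not MongoDB's implementation).
// runMapReduce and the sample data are hypothetical names used for illustration.
function runMapReduce(docs, map, reduce, finalize) {
  var groups = {}; // key -> list of emitted values

  // Map phase: call map with `this` bound to each document and collect the emits.
  docs.forEach(function (doc) {
    var emit = function (key, value) {
      if (!Object.prototype.hasOwnProperty.call(groups, key)) groups[key] = [];
      groups[key].push(value);
    };
    map.call(doc, emit);
  });

  return Object.keys(groups).sort().map(function (key) {
    var values = groups[key];
    // Reduce phase: only keys with more than one value are reduced.
    var value = values.length > 1 ? reduce(key, values) : values[0];
    // Optional finalize phase: runs once per key on the reduced value.
    if (finalize) value = finalize(key, value);
    return { _id: key, value: value };
  });
}

var items = [
  { category: "fruit", qty: 3 },
  { category: "dairy", qty: 1 },
  { category: "fruit", qty: 4 }
];

var totals = runMapReduce(
  items,
  function (emit) { emit(this.category, this.qty); },
  function (key, values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0);
  },
  function (key, total) { return { total: total }; }
);

console.log(JSON.stringify(totals));
```

Note that the "dairy" key has a single emitted value, so it skips the reduce step and goes straight to finalize, which matches the description above.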

Why we need Map-Reduce

Map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.

  • Keeping the results of map-reduce in a separate collection and updating it from time to time.
  • Running complex queries on large sharded data sets.
  • Running queries that are too complex to express with the aggregation framework.
  • Map-reduce runs JavaScript functions, so we have the full power of the language.

Syntax

db.collection.mapReduce(
  function() { emit(key, value); },  // map function
  function(key, values) { return reductionFunction; },  // reduce function
  {
    out: { [action]: collection },
    query: document,
    sort: document,
    limit: number,
    finalize: finalizeFunction
  }
)

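In the above syntax:

map is a JavaScript function that maps a value to a key and emits the key-value pair.

reduce is a JavaScript function that reduces all the values emitted for the same key to a single result.

out specifies the location of the map-reduce result, either a collection name or a document of the form {action: collection}.

action is one of replace, merge, or reduce, and controls what happens when the output collection already exists: replace overwrites its contents, merge overwrites only documents with the same key, and reduce applies the reduce function to the new and the existing value.

query specifies the optional selection criteria for the input documents.

sort specifies the optional sort order of the input documents.

limit specifies the optional maximum number of documents passed to the map function.

finalize is an optional JavaScript function that modifies the reduced result.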
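
To make the roles of query, sort, and limit more concrete, here is a rough plain-JavaScript sketch of how the input documents are narrowed down before the map function sees them. The helper selectInput and the sample data are hypothetical, and the query and sort options are simplified to plain functions rather than MongoDB documents.

```javascript
// Hypothetical helper: mimics query -> sort -> limit on an in-memory array.
// In MongoDB, `query` and `sort` are documents; here they are plain functions.
function selectInput(docs, options) {
  var result = docs.slice(); // copy so the original array is untouched
  if (options.query) result = result.filter(options.query);
  if (options.sort) result.sort(options.sort);
  if (typeof options.limit === "number") result = result.slice(0, options.limit);
  return result;
}

var docs = [
  { name: "a", price: 30, status: "A" },
  { name: "b", price: 10, status: "B" },
  { name: "c", price: 20, status: "A" },
  { name: "d", price: 40, status: "A" }
];

var input = selectInput(docs, {
  query: function (d) { return d.status === "A"; },
  sort: function (x, y) { return x.price - y.price; },
  limit: 2
});

console.log(input.map(function (d) { return d.name; })); // [ 'c', 'a' ]
```

The point of the sketch is that limit caps the documents fed into map, not the number of results written to the output collection.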

Example

In this example, we will find the average sale amount per customer. The customer_orders collection contains the following documents:

[{
 cust_name: "Mr. John",
 ord_date: ISODate("2017-09-03T14:17:00.000Z"),
 status: 'A',
 price: 35
},
{
 cust_name: "Mr. Andrew",
 ord_date: ISODate("2017-09-10T14:17:00.000Z"),
 status: 'A',
 price: 45
},
{
 cust_name: "Mr. John",
 ord_date: ISODate("2017-09-12T14:17:00.000Z"),
 status: 'A',
 price: 27
}]

First, define the map function to process each input document:

var mapFunction = function() {
  emit(this.cust_name, {count: 1, price: this.price})
}

In the function, this refers to the document that the map-reduce operation is processing.

Second, define the corresponding reduce function with two arguments, keyCustName and valuesPrices:

var reduceFunction = function(keyCustName, valuesPrices) {
  // Declare with var so the accumulator is not an implicit global
  var reducedVal = {count: 0, price: 0};
  for (var idx = 0; idx < valuesPrices.length; idx++) {
    reducedVal.count += valuesPrices[idx].count;
    reducedVal.price += valuesPrices[idx].price;
  }
  return reducedVal;
}
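
MongoDB may call the reduce function more than once for the same key, feeding the output of an earlier call back in as one of the values. That is why the emitted value and the reduced value share the same {count, price} shape. The check below is a plain-JavaScript sketch, runnable outside MongoDB, that uses the same logic as the reduce function above; the key "k" and the sample values are made up for illustration.

```javascript
// Same logic as the article's reduce function, repeated so the sketch is self-contained.
var reduceFunction = function (keyCustName, valuesPrices) {
  var reducedVal = { count: 0, price: 0 };
  for (var idx = 0; idx < valuesPrices.length; idx++) {
    reducedVal.count += valuesPrices[idx].count;
    reducedVal.price += valuesPrices[idx].price;
  }
  return reducedVal;
};

// Made-up emitted values for a single key.
var a = { count: 1, price: 35 };
var b = { count: 1, price: 27 };
var c = { count: 1, price: 18 };

// Reducing everything at once...
var allAtOnce = reduceFunction("k", [a, b, c]);

// ...should give the same result as reducing a partial result again.
var partial = reduceFunction("k", [a, b]);
var inTwoSteps = reduceFunction("k", [partial, c]);

console.log(allAtOnce, inTwoSteps);
```

Both calls produce a count of 3 and a price of 80, so the function is safe to apply to its own output.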

Third, define a finalize function with two arguments, key and reducedVal:

var finalizeFunction = function(key, reducedVal) {
  reducedVal.avg = reducedVal.price / reducedVal.count;
  return reducedVal;
}
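
There is a reason the average is computed in finalize rather than in reduce: finalize runs once per key after all reducing is done, whereas a reduce result may be reduced again. The plain-JavaScript sketch below uses made-up prices and a hypothetical naiveAvgReduce, which averages directly, to show how that approach goes wrong once its output is re-reduced.

```javascript
// Hypothetical reduce that averages directly; shown only as a counter-example.
var naiveAvgReduce = function (key, prices) {
  var sum = 0;
  for (var i = 0; i < prices.length; i++) sum += prices[i];
  return sum / prices.length;
};

// Averaging all prices at once gives the true average.
var trueAvg = naiveAvgReduce("k", [10, 20, 60]);

// Re-reducing a partial average gives an average of averages, which is wrong.
var reReduced = naiveAvgReduce("k", [naiveAvgReduce("k", [10, 20]), 60]);

// The article's approach: carry count and price through reduce, divide in finalize.
var finalizeFunction = function (key, reducedVal) {
  reducedVal.avg = reducedVal.price / reducedVal.count;
  return reducedVal;
};
var finalized = finalizeFunction("k", { count: 3, price: 90 });

console.log(trueAvg, reReduced, finalized.avg); // 30 37.5 30
```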

Finally, perform the map-reduce operation on the customer_orders collection using the mapFunction, reduceFunction, and finalizeFunction functions.

db.customer_orders.mapReduce(
  mapFunction, 
  reduceFunction, 
  { 
    out: "customer_sales", 
    finalize: finalizeFunction 
  });

 

Now the customer_sales output collection contains the following documents:

/* 1 */
{
  "_id" : "Mr. Andrew",
  "value" : {
    "count" : 1.0000000000000000,
    "price" : 45.0000000000000000,
    "avg" : 45.0000000000000000
  }
}

/* 2 */
{
  "_id" : "Mr. John",
  "value" : {
    "count" : 2.0000000000000000,
    "price" : 62.0000000000000000,
    "avg" : 31.0000000000000000
  }
}

Now we will see how MongoDB performs an incremental map-reduce. For that, we insert the following document into the customer_orders collection:

{
  "_id" : ObjectId("59b8a94324194e5c63fae587"),
  "cust_name" : "Mr. Andrew",
  "ord_date" : ISODate("2017-09-14T14:17:00.000Z"),
  "status" : "A",
  "price" : 40
}

Then we change our map-reduce command as follows:

db.customer_orders.mapReduce(
  mapFunction,
  reduceFunction,
  {
    out: {reduce: "customer_sales"},
    query: { ord_date: { $gt: ISODate('2017-09-13 00:00:00') } },
    finalize: finalizeFunction
  });

 

Here, map-reduce processes only the recent documents that match the query. It writes the result to the customer_sales collection, but instead of replacing the existing contents it reduces them together with the results of the incremental map-reduce.

The output collection is now as follows:

/* 1 */
{
  "_id" : "Mr. Andrew",
  "value" : {
    "count" : 2.0000000000000000,
    "price" : 85.0000000000000000,
    "avg" : 42.5000000000000000
  }
}

/* 2 */
{
  "_id" : "Mr. John",
  "value" : {
    "count" : 2.0000000000000000,
    "price" : 62.0000000000000000,
    "avg" : 31.0000000000000000
  }
}
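
What out: {reduce: "customer_sales"} does with an existing collection can be approximated in plain JavaScript. This is a sketch of the behavior as described in this article, not MongoDB's actual implementation: mergeWithReduce is a hypothetical helper, the collection is modeled as a plain object keyed by _id, and the existing and incoming values are taken from the example.

```javascript
// The article's reduce and finalize logic, repeated so the sketch is self-contained.
var reduceFunction = function (keyCustName, valuesPrices) {
  var reducedVal = { count: 0, price: 0 };
  for (var idx = 0; idx < valuesPrices.length; idx++) {
    reducedVal.count += valuesPrices[idx].count;
    reducedVal.price += valuesPrices[idx].price;
  }
  return reducedVal;
};
var finalizeFunction = function (key, reducedVal) {
  reducedVal.avg = reducedVal.price / reducedVal.count;
  return reducedVal;
};

// Hypothetical helper approximating out: {reduce: "customer_sales"}.
// existing: key -> stored value; incoming: key -> newly reduced value (before finalize).
function mergeWithReduce(existing, incoming) {
  var merged = {};
  Object.keys(existing).forEach(function (key) {
    merged[key] = existing[key]; // keys without new data keep their stored value
  });
  Object.keys(incoming).forEach(function (key) {
    var value = incoming[key];
    if (Object.prototype.hasOwnProperty.call(merged, key)) {
      // Key already stored: reduce the old and the new value together.
      value = reduceFunction(key, [merged[key], value]);
    }
    merged[key] = finalizeFunction(key, value);
  });
  return merged;
}

var existing = {
  "Mr. Andrew": { count: 1, price: 45, avg: 45 },
  "Mr. John": { count: 2, price: 62, avg: 31 }
};
// Only the new order (price 40) matches the date query.
var incoming = { "Mr. Andrew": { count: 1, price: 40 } };

var result = mergeWithReduce(existing, incoming);
console.log(result["Mr. Andrew"]); // { count: 2, price: 85, avg: 42.5 }
```

The stale avg field on the stored value is ignored by the reduce function and recomputed by finalize, which is why the stored and the newly emitted values can be combined safely.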

 

If the data is constantly growing, we can perform an incremental map-reduce rather than running the map-reduce operation over the entire data set each time.

About the Author: Md. Delwar Hossain

He has 11 years of experience in developing standalone software and web applications for multiple database platforms. He is passionate about new tools and technologies. He is positive and trustworthy. He is able to learn and adapt quickly to different situations. He is a great team player and enjoys leading and mentoring. He specializes in architecting and building complex web and mobile applications. He has strong skills in automating POS, inventory, supply chain, trading export/import, human resource management, manufacturing and production, distribution management, and hospital management systems.
