Node.js: Array looping with async

To be honest, the callback-based asynchronous pattern in Node.js makes me sick! It clashes badly with the synchronous style of coding you have been taught since your first year of computer school. One especially nasty case is using a plain array loop (which is synchronous by default) to process your data item by item asynchronously and then do something else once everything is done. Imagine you have code like this:

// data = [{_id: "1", childData: "1"}, {_id: "2", childData: "2"}, {_id: "3", childData: "2"}]
// childData = [{_id: "1", sum: 500}, {_id: "2", sum: 600}]
db.data.find({}, function(err, datas) {
    var count = 0;
    for (var idx in datas) {
        // Do MongoDB's relation query somewhere else
        db.childData.findOne({_id: datas[idx].childData}, function(err, childData) {
            count += childData.sum;
        });
    }

    // Show data output (this runs before any findOne callback has fired)
    console.log(count);
});


That example is based on using a MongoDB database from Node.js. If you read this code as synchronous, you would expect it to print "1700" (500 + 600 + 600). In Node.js, it won't. It prints "0", because console.log runs right after the loop has started the queries, before any findOne callback has had a chance to add to count.
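If you want to see this ordering without a database, here is a minimal sketch. Note that fakeFindOne, the sums table and the 10 ms delay are my own stand-ins for db.childData.findOne, not part of any MongoDB driver:

```javascript
// Hypothetical stand-in for db.childData.findOne: answers after 10 ms.
var sums = { "1": 500, "2": 600 };
function fakeFindOne(id, callback) {
    setTimeout(function () {
        callback(null, { _id: id, sum: sums[id] });
    }, 10);
}

var count = 0;
["1", "2", "2"].forEach(function (id) {
    // Each call only schedules work; the callback runs later.
    fakeFindOne(id, function (err, childData) {
        count += childData.sum;
    });
});

// Runs immediately, before any callback above has fired.
console.log(count); // prints 0

// Long after the 10 ms lookups have finished.
setTimeout(function () {
    console.log(count); // prints 1700
}, 100);
```

The loop itself finishes instantly. All it does is hand three callbacks to the event loop, and none of them can run until the current code has returned.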

To cope with this, I usually use a library called async. It provides many functions for simple control flow and for iterating over a collection (or array). For the example above, you can use each or eachSeries. Both take the same arguments: an array, a function called for every element, and a function called after all elements have been processed.

var async = require('async');
db.data.find({}, function(err, datas) {
    var count = 0;
    async.eachSeries(datas, function(data, callback) {
        // Do MongoDB's relation query somewhere else
        db.childData.findOne({_id: data.childData}, function(err, childData) {
            if (err) return callback(err);   // If err != null, iteration will be stopped
            count += childData.sum;
            callback();                      // Done with this element, go to the next one
        });
    }, function(err) {
        // Show data output (runs after every element has been processed)
        console.log(count);
    });
});

So how does async work? Simple: it waits for callback() to be called for an element before it considers that element done. With eachSeries, elements are processed serially (each element starts only after the previous one has finished); if you want parallel iteration, use each instead. Iteration can be halted in the middle by calling callback with a first argument (called err), in which case async stops starting new elements and passes that error to the final function.
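To make the waiting part concrete, here is a rough sketch of the idea behind eachSeries. eachSeriesSketch is a hypothetical name of mine and this is not the library's actual source, just the smallest thing I could write that behaves the same way for the cases discussed here:

```javascript
// Hypothetical stand-in for async.eachSeries, for illustration only.
function eachSeriesSketch(items, iteratee, done) {
    var index = 0;
    var finished = false;

    // next is the callback handed to the iteratee for every element.
    function next(err) {
        if (finished) { return; }            // ignore calls after the end
        if (err || index >= items.length) {
            finished = true;
            return done(err || null);        // error or end of array: stop
        }
        var item = items[index++];
        iteratee(item, next);                // wait here until next() is called
    }

    next();                                  // start with the first element
}

// Usage: each timer starts only after the previous one has called back.
var order = [];
eachSeriesSketch([30, 10, 20], function (ms, callback) {
    setTimeout(function () {
        order.push(ms);
        callback();
    }, ms);
}, function (err) {
    console.log(order); // elements appear in array order, not timer order
});
```

Nothing moves forward unless the iteratee calls its callback, which is exactly why a forgotten callback() freezes the whole iteration.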

But remember to double-check that callback() is definitely called at some point, because if that does not happen, your program gets stuck: the iteration never finishes and the final function is never called. Here is an example of this scenario:

var async = require('async');
db.data.find({}, function(err, datas) {
    var count = 0;
    async.eachSeries(datas, function(data, callback) {
        // Do MongoDB's relation query somewhere else
        db.childData.findOne({_id: data.childData}, function(err, childData) {
            // Check if sum is defined so it can be added to count
            if(childData.sum) {
                count += childData.sum;
                callback(); // Bad position for a callback
            }
        });
    }, function(err) {
       // Show data output
       console.log(count)
    });
});

This program is fine as long as every childData document contains a "sum" field. When one doesn't, the code gets stuck because callback() is never called for that element, so the iteration stops in the middle. The simple rule is: callback must be called exactly once for every element, no less, no more.
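One way to follow that rule is to make sure every branch ends in a callback call. The sketch below shows the iterator on its own. findChild and the children table are hypothetical in-memory stand-ins for db.childData.findOne, so that it runs without a database:

```javascript
// Hypothetical in-memory stand-in for db.childData.findOne.
var children = {
    "1": { _id: "1", sum: 500 },
    "2": { _id: "2" }              // no "sum" field on purpose
};
function findChild(id, callback) {
    setImmediate(function () {
        callback(null, children[id] || null);
    });
}

var count = 0;

// Iterator in the (item, callback) shape that eachSeries expects.
function addChildSum(data, callback) {
    findChild(data.childData, function (err, childData) {
        if (err) { return callback(err); }   // report the error, once
        if (childData && childData.sum) {
            count += childData.sum;
        }
        callback();                          // reached on every other path
    });
}

// Driving it by hand for the element that used to hang:
addChildSum({ _id: "2", childData: "2" }, function (err) {
    console.log("callback fired, count =", count); // callback fired, count = 0
});
```

With the real library, the same function would be passed as the second argument: async.eachSeries(datas, addChildSum, ...). The return in front of callback(err) matters too, because without it callback would be called twice for the same element.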
