MongoDB (and nosql generally) is an appealing idea. The words written about it, though, are problematic: too much hype, too little documentation. That’ll change soon; we’re over the peak of the nosql hype cycle, into the trough. People are looking at the nosql systems they’ve eagerly implemented in recent months, noticing that they won’t solve every problem imaginable. For now, though, every blogpost with mongodb instructions is prefaced with grumbles about the lack of information.
So, i spend a ridiculous amount of time figuring out how to do grouping. Have a bunch of download logs, want to break them down by country.
The simplest way I could find of doing this is:
db.loglines.group({ ‘cond’ : {}, initial: {count: 0}, reduce: function(doc, out){out.count++;if(out[doc.country] == undefined){out[doc.country] = 0;};out[doc.country] += 1;}});
Or, the version in pymongo:
> reduce_func = """function(doc, out){
out.total++;
if(out[doc.country] == undefined){
out[doc.country] = 0;};
out[doc.country] += 1;};
"""
> l.group(key = {},
condition = {},
initial = {'total':0},
reduce = reduce_func)
[{
u'AE': 215.0,
u'AG': 23.0,
u'AM': 140.0,
u'AN': 58.0,
u'AO': 56.0,
...
u'total' : 87901;
}]
[apologies for formatting; I’ve not really figured out how to edit js within a python repl]