Open Data

We’re in the midst of a data explosion. Then again, we’re always in the midst of a data explosion. It’s been developing, wave by wave, since the first Sumerian scribe pushed his wedge into clay. Maybe it feels different this time; maybe it’s always felt different.

The past two centuries saw the gradual triumph of ordered data collection: the regimented and expensive process of the census, the time-motion study, the economic indicator. The province of powerful behemoths (government, military, corporate or the omnipresent RAND Corporation), such projects were rigorously planned at the top, then executed by a small army of functionaries.

In the last 15 years, something has changed. Quantitative change, initially: more data, faster computers, easier transmission of information. But also a change in quality. Now we’ve moved into the era of data as by-product. Our clicks and our purchases are tracked because watching us is cheap and easy, not as part of a pre-planned technocratic project. Such cheapness brings us into the age of data abundance, and we’re only beginning to appreciate the consequences and the possibilities.

Enter the Open Data movement. Bubbling with geekish idealism, this is a loose grouping of campaigners trying to prize large datasets out of government and corporate hands, bringing them into the agora. Knowledge here may be measured in SQL dumps, linked data and gigabytes of official transcripts, but the idealism fits into the standard pattern: the Truth will set you free.

Freebase

Getting country population data from Freebase:



from freebase.api import HTTPMetawebSession, MetawebError

mss = HTTPMetawebSession('www.freebase.com')

# Ask for every country's name, ISO 3166-1 alpha-2 code and population figures
list(mss.mqlread([{
    'name': None,
    'type': '/location/country',
    '/location/country/iso3166_1_alpha2': None,
    '/location/statistical_region/population': [{'number': None}],
}]))
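
MQL results come back in the same shape as the query, so the list above is a pile of nested dicts. Here is a rough sketch of flattening it into a code-to-population lookup. The helper name populations_by_country is my own invention, and the sample rows are made-up placeholders rather than real Freebase data:

```python
CODE_KEY = '/location/country/iso3166_1_alpha2'
POP_KEY = '/location/statistical_region/population'


def populations_by_country(results):
    """Map each country code to the list of population figures returned for it."""
    table = {}
    for row in results:
        code = row.get(CODE_KEY)
        if code is None:
            # Skip entries that have no ISO code attached
            continue
        figures = row.get(POP_KEY) or []
        table[code] = [p['number'] for p in figures if p.get('number') is not None]
    return table


# Hypothetical rows in the shape the query asks for (placeholder values only)
sample = [
    {'name': 'Exampleland', 'type': '/location/country',
     CODE_KEY: 'XX', POP_KEY: [{'number': 1000}, {'number': 1200}]},
    {'name': 'Nowhere', 'type': '/location/country',
     CODE_KEY: None, POP_KEY: []},
]

print(populations_by_country(sample))
```

The query doesn't ask for a year alongside each figure, so the sketch keeps every number rather than guessing which one is the most recent.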

Kyrgyzstan: NYT beats WaPo

I’ve been shamefully ignoring Kyrgyzstan. Well, I’ve been ignoring politics in general, busy on vodo and other technical work. But it’s particularly hideous to start ignoring an area you care about, just as a great many things (good and bad) start happening. Not reading about what’s happening over there makes me feel complicit in the near-total lack of attention in the Western media.

And Kyrgyzstan really is being overlooked to an incredible extent. Washington Post: nothing worth mentioning. New York Times is doing noticeably better, though — in fact, their coverage is pretty decent considering the distance, and the lack of much domestic political significance within the US.

Kyrgyzstan: new interior minister

Kyrgyzstan has a new interior minister. Probably no bad thing, given that the accomplishments of previous acting interior minister Bolot Sher consisted of:

* Pursuing Bakiyev’s relatives

* Making the supremely reassuring statement that “I am in command of 80 percent of the Ministry of Interior…The other 20 percent is still waffling.”

On the other hand, his replacement, Kubat Baibolov, is coming straight from an oh-so-successful stint running things in Jalal-Abad.

Kyrgyz political biographies

Ran into this who’s who of Kyrgyz politics while looking up the new Interior Minister; it seems well worth paying attention to in general.

GANTT

Embedded in a project that’s floundering a little as it expands beyond the size that the devs can keep in their heads. So, looking for some relatively lightweight way of visualizing the moving parts and the work that needs to be done. And, as every other time I’ve looked in this area, finding most solutions to be too feature-light, too complicated, or sometimes both.

First are the project scheduling systems. Whatever they focus on, it’s hard to think of them except as tools for generating GANTT charts. I can imagine these being useful for, say, a big construction project with complex interdependencies of people and machines. For coding, not so much. Particularly not Taskjuggler, which seems to delight in being non-user-friendly. That is, it is complicated and does a bad job of explaining itself, but then tries to use this as evidence of how sophisticated it is. I ran away before finding out; complexity is not what I want!

Gnome planner is quite possibly much inferior for large projects, but at least lets me add a task without hours of grepping through the docs. If I ever need a gantt chart, I’ll certainly head there rather than taskjuggler. I honestly believe that coding extra features into planner as required would be easier than making sense of taskjuggler.

So, I think I’ll do without!

BP oil spill

I often avoid certain news stories: not because they’re unimportant, but because I doubt I’ll learn much by discovering them in the day-by-day dribble of the daily press.

The BP Oil Spill is one: I’m not going to bother with short articles on it, but I’d really love to follow the long ones. I’ve idly watched the speculation ramp up to biblical proportions, but have no idea how to interpret it.

[no content here, as you can see, just a stick in the ground to note how shameful it is that I know nothing about this]

MongoDB

MongoDB (and nosql generally) is an appealing idea. The words written about it, though, are problematic: too much hype, too little documentation. That’ll change soon; we’re over the peak of the nosql hype cycle, into the trough. People are looking at the nosql systems they’ve eagerly implemented in recent months, noticing that they won’t solve every problem imaginable. For now, though, every blogpost with mongodb instructions is prefaced with grumbles about the lack of information.

So, I spent a ridiculous amount of time figuring out how to do grouping. I have a bunch of download logs and want to break them down by country.

The simplest way I could find of doing this is:

db.loglines.group({
    cond: {},
    initial: {count: 0},
    // Keep a running total plus one counter per country code
    reduce: function(doc, out) {
        out.count++;
        if (out[doc.country] == undefined) {
            out[doc.country] = 0;
        }
        out[doc.country] += 1;
    }
});

Or, the version in pymongo:


> reduce_func = """function(doc, out) {
    out.total++;
    if (out[doc.country] == undefined) {
        out[doc.country] = 0;
    }
    out[doc.country] += 1;
}
"""

> l.group(key = {},
          condition = {},
          initial = {'total': 0},
          reduce = reduce_func)
[{
u'AE': 215.0,
u'AG': 23.0,
u'AM': 140.0,
u'AN': 58.0,
u'AO': 56.0,
...
u'total': 87901
}]

[apologies for formatting; I’ve not really figured out how to edit js within a python repl]
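
For anyone without a database to hand: the reduce function above is nothing more than a running total plus one counter per country. Here is a minimal sketch of the same logic in plain Python, assuming each log line is a dict with a 'country' field. The function name count_by_country and the sample rows are made up for illustration. The pipeline constant at the end is how I understand the same grouping would be written with MongoDB's aggregation framework ($group with $sum); treat it as an assumption and check the docs for your server version.

```python
def count_by_country(loglines):
    """Mimic the reduce function: a running total plus one counter per country."""
    out = {'total': 0}
    for doc in loglines:
        out['total'] += 1
        country = doc.get('country')
        # Start each country at zero the first time it appears
        out[country] = out.get(country, 0) + 1
    return out


# Hypothetical log lines; the real ones would come from the loglines collection
sample_logs = [
    {'country': 'AE'},
    {'country': 'AM'},
    {'country': 'AE'},
]

print(count_by_country(sample_logs))

# Assumed aggregation-framework equivalent: one output document per country.
# It would be passed to the collection's aggregate() method; not run here.
pipeline = [{'$group': {'_id': '$country', 'count': {'$sum': 1}}}]
```

Unlike the group() call, the pipeline returns one document per country rather than a single dict holding everything, so the overall total has to be summed separately.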

more gaga

More on Alejandro:

– Bad Romance may have been similarly intricate

– The dreamscape reminds me strongly of Gaiman, although that probably means no more than that Gaiman’s been on my mind lately

– Everybody seems to have seen the religious elements as a homage to Madonna, with “Like a Prayer”. Fair enough, but it surely also has some connection to Derek Jarman’s video for the Pet Shop Boys’ It’s a Sin.