
Oxford comma

“For my parents, God and Damien Katz.”

-- Noah Slater’s dedication, in the O’Reilly CouchDB book

git pull --rebase

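Useful post on git by Yehuda Katz. Inter alia, it strongly suggests using the --rebase flag when pulling, so that local commits are replayed on top of the upstream branch instead of producing a merge commit.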

Kyrgyz political biographies

Ran into this who’s who of Kyrgyz politics while looking up the new Interior Minister; it seems generally worth paying attention to.

Kyrgyzstan: new interior minister

Kyrgyzstan has a new interior minister. Probably no bad thing, given that the accomplishments of previous acting interior minister Bolot Sher consisted of:

* Pursuing Bakiyev’s relatives

* Making the supremely reassuring statement that “I am in command of 80 percent of the Ministry of Interior…The other 20 percent is still waffling.”

On the other hand, his replacement, Kubat Baibolov, is coming straight from an oh-so-successful stint running things in Jalal-Abad.

Kyrgyzstan: NYT beats WaPo

I’ve been shamefully ignoring Kyrgyzstan. Well, I’ve been ignoring politics in general, busy on vodo and other technical work. But it’s particularly hideous to start ignoring an area you care about, just as a great many things (good and bad) start happening. Not reading about what’s happening over there makes me feel complicit in the near-total lack of attention in the Western media.

And Kyrgyzstan really is being overlooked to an incredible extent. Washington Post: nothing worth mentioning. New York Times is doing noticeably better, though — in fact, their coverage is pretty decent considering the distance, and the lack of much domestic political significance within the US.

Freebase

Getting country population data from Freebase:

from freebase.api import HTTPMetawebSession, MetawebError

mss = HTTPMetawebSession('www.freebase.com')
list(mss.mqlread([{
    'name': None,
    'type': '/location/country',
    '/location/country/iso3166_1_alpha2': None,
    '/location/statistical_region/population': [{'number': None}],
}]))
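The query returns one nested dictionary per country, which is awkward to work with directly. Here is a minimal sketch of flattening it into a code-to-population mapping. It assumes the result mirrors the shape of the query, and the sample rows are made-up placeholders, not real Freebase output. The query does not ask for a year, so taking the largest figure is only a crude stand-in for "most recent". The function name is my own.

```python
# Keys as used in the MQL query.
ISO_KEY = "/location/country/iso3166_1_alpha2"
POP_KEY = "/location/statistical_region/population"

# Hypothetical rows shaped like the query; names and numbers are placeholders.
sample_result = [
    {
        "name": "Examplia",
        "type": "/location/country",
        ISO_KEY: "EX",
        POP_KEY: [{"number": 1000}, {"number": 1200}],
    },
    {
        "name": "Nodataland",
        "type": "/location/country",
        ISO_KEY: "ND",
        POP_KEY: [],
    },
]


def largest_population_by_country(rows):
    """Map ISO code -> largest reported figure, skipping rows without data."""
    out = {}
    for row in rows:
        figures = [
            entry["number"]
            for entry in (row.get(POP_KEY) or [])
            if entry.get("number") is not None
        ]
        code = row.get(ISO_KEY)
        if code and figures:
            out[code] = max(figures)
    return out


populations = largest_population_by_country(sample_result)
```

Countries with no population entries are simply left out of the mapping rather than being given a zero.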

Open Data

We’re in the midst of a data explosion. Then again, we’re always in the midst of a data explosion. It’s been developing, wave by wave, since the first Sumerian scribe pushed his wedge into clay. Maybe it feels different this time; maybe it’s always felt different.

The past two centuries saw the gradual triumph of ordered data collection: the regimented and expensive process of the census, the time-motion study, the economic indicator. The province of powerful behemoths (government, military, corporate or the omnipresent RAND Corporation), such projects were rigorously planned at the top, then executed by a small army of functionaries.

In the last 15 years, something has changed. Quantitative change, initially: more data, faster computers, easier transmission of information. But also a change in quality. Now we’ve moved into the era of data as by-product. Our clicks and our purchases are tracked because watching us is cheap and easy, not as part of a pre-planned technocratic project. Such cheapness brings us into the age of data abundance, and we’re only beginning to appreciate the consequences and the possibilities.

Enter the Open Data movement. Bubbling with geekish idealism, this is a loose grouping of campaigners trying to prize large datasets out of government and corporate hands, bringing them into the agora. Knowledge here may be measured in SQL dumps, linked data and gigabytes of official transcripts, but the idealism fits into the standard pattern: the Truth will set you free.

Protected: Does reality pass the flowchart test?

MongoDB

MongoDB (and NoSQL generally) is an appealing idea. The words written about it, though, are problematic: too much hype, too little documentation. That’ll change soon; we’re over the peak of the NoSQL hype cycle, into the trough. People are looking at the NoSQL systems they’ve eagerly implemented in recent months, noticing that they won’t solve every problem imaginable. For now, though, every blog post with MongoDB instructions is prefaced with grumbles about the lack of information.

So, I spent a ridiculous amount of time figuring out how to do grouping. I have a bunch of download logs, and want to break them down by country.

The simplest way I could find of doing this is:

db.loglines.group({
    cond: {},
    initial: {count: 0},
    reduce: function(doc, out) {
        out.count++;
        if (out[doc.country] == undefined) {
            out[doc.country] = 0;
        }
        out[doc.country] += 1;
    }
});

Or, the version in pymongo:


> reduce_func = """function(doc, out){
out.total++;
if(out[doc.country] == undefined){
  out[doc.country] = 0;};
out[doc.country] += 1;};
"""

> l.group(key = {}, 
          condition = {}, 
          initial = {'total':0}, 
          reduce = reduce_func)
[{
  u'AE': 215.0,
  u'AG': 23.0,  
  u'AM': 140.0, 
  u'AN': 58.0,  
  u'AO': 56.0,  
...
  u'total': 87901
}]

[apologies for formatting; I’ve not really figured out how to edit JS within a Python REPL]
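To make it clearer what that reduce function is doing, here is a rough pure-Python mirror of it: one counter per country plus a running total, all in a single dictionary. The sample log lines are hypothetical, and the function and constant names are my own. The pipeline at the end is a sketch of how the same breakdown might be expressed with MongoDB's aggregation framework, which newer releases offer in place of group(). It is only defined here, not run, and it assumes the documents carry a `country` field as above.

```python
from collections import Counter


def count_by_country(docs):
    """Mirror of the JS reduce: per-country counts plus a 'total' key."""
    counts = Counter(doc["country"] for doc in docs)
    result = dict(counts)
    result["total"] = sum(counts.values())
    return result


# Hypothetical log lines; real documents would have many more fields.
sample_logs = [
    {"country": "AE"},
    {"country": "AM"},
    {"country": "AE"},
]

breakdown = count_by_country(sample_logs)

# Sketch of an equivalent aggregation pipeline (one output document per
# country). It would be passed to a pymongo collection's aggregate() method.
COUNTRY_PIPELINE = [
    {"$group": {"_id": "$country", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
```

The pure-Python version pulls every document to the client, so it is only sensible for small collections or for checking what the server-side grouping should return.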

BP oil spill

I often avoid certain news stories: not because they’re unimportant, but because I doubt I’ll learn much by discovering them in the day-by-day dribble of the daily press.

The BP Oil Spill is one: I’m not going to bother with short articles on it, but I’d really love to follow the long ones. I’ve idly watched the speculation ramp up to biblical proportions, but have no idea how to interpret it.

[no content here, as you can see, just a stick in the ground to note how shameful it is that I know nothing about this]