Saving firefox

Further to earlier grumbling about firefox, it seems the main culprit is the restore-session facility. This is something I hated anyway, even without realising that it was tying up my hard disk every 10 seconds to churn through all my tabs. Solution: go to about:config. browser.sessionstore.interval controls how often firefox stores its tab data. The value is in milliseconds, and the default works out to 10 seconds; setting it to a long string of nines has sped up my computer no end.

And so, firefox is saved for another day.

Also of note: the vimkeys plugin, providing j/k scrolling.

Webmontag 19.7.10

At Web Monday. Presentations:

  • First Trimester, blogging for doctors. Apparently while there are a lot of web projects targeting patients, there aren’t many blogs aimed at providing professional information for doctors.
  • Yourcent, a micropayments system
  • Feed Magazine, a free German-language (paper) magazine about the online world. Now at issue 0

Nested dictionaries in python

Python’s defaultdict is perfect for making nested dictionaries, which is especially useful if you’re doing any kind of work with JSON or NoSQL. It provides a dict which calls a factory to produce a default value when a key isn’t found. Set that factory to dict, so the default value is an empty dict, and you have a convenient dict of dicts:

>>> from collections import defaultdict
>>> foo = defaultdict(dict)
>>> foo['x']
{}

But it breaks down when you go more than one layer deep:

>>> foo['x']['y']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'y'

You can get another layer by passing in a defaultdict of dicts as the default:

>>> bar = defaultdict(lambda: defaultdict(dict))
>>> bar['x']['y']
{}

But suppose you want deeply-nesting dictionaries. This means you can refer as deeply into the hierarchy as you want, without needing to check whether the intermediate dictionaries have already been created. You do need to be sure that intervening levels aren’t anything other than a recursive defaultdict, mind. But if you know you’re going to have your content filed away inside, say, quadruple-nested dicts, this isn’t necessarily a problem.

One approach would be to extend the method above, with lambdas inside lambdas:

>>> baz = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
>>> baz[1][2][3]
{}
>>> baz[1][2][3][4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 4

It’s marginally more readable if we use partial rather than lambda:

>>> from functools import partial
>>> thud = defaultdict(partial(defaultdict, partial(defaultdict, dict)))
>>> thud[1][2][3]
{}

But still pretty ugly, and non-extending. Want infinite nesting instead? You can do it with a recursive function:

>>> def infinite_defaultdict():
...     return defaultdict(infinite_defaultdict)
>>> spam = infinite_defaultdict() #defaultdict(infinite_defaultdict) is equivalent
>>> spam['x']['y']['z']['l']['m']
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})

This works fine. The __repr__ output is annoyingly convoluted, though:

>>> spam = infinite_defaultdict()
>>> spam['x']['y']['z']['l']['m']
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})
>>> spam
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'x':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'y':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'z':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'l':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'m':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})})})})})})
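If the repr gets in the way, one option is to copy the structure into plain dicts before printing it. What follows is only a sketch: to_plain_dict is a name made up here, not anything from the standard library.

```python
from collections import defaultdict
import json


def infinite_defaultdict():
    # Same recursive factory as above.
    return defaultdict(infinite_defaultdict)


def to_plain_dict(d):
    # Hypothetical helper: recursively copy any dict (including a
    # defaultdict) into ordinary dicts, leaving other values untouched.
    if isinstance(d, dict):
        return {key: to_plain_dict(value) for key, value in d.items()}
    return d


spam = infinite_defaultdict()
spam['x']['y']['z'] = 1

plain = to_plain_dict(spam)
print(plain)             # {'x': {'y': {'z': 1}}}
print(json.dumps(spam))  # json serialises a defaultdict like any other dict
```

The copy is a snapshot: it no longer auto-creates missing keys, which is usually what you want once you are printing or serialising.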

A cleaner way of achieving the same effect is to ignore defaultdict entirely, and make a direct subclass of dict. This is based on Peter Norvig’s original implementation of defaultdict:

>>> class NestedDict(dict):
...     def __getitem__(self, key):
...         if key in self: return self.get(key)
...         return self.setdefault(key, NestedDict())
>>> eggs = NestedDict()
>>> eggs[1][2][3][4][5]
{}
>>> eggs
{1: {2: {3: {4: {5: {}}}}}}
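Dict subclasses can also define __missing__, which dict’s own __getitem__ calls whenever a key is absent. Here is a sketch of the same idea using that hook; the class name AutoDict is made up for illustration.

```python
class AutoDict(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is absent:
        # create a child of the same type, store it, and return it.
        child = self[key] = type(self)()
        return child


ham = AutoDict()
ham[1][2][3] = 'leaf'
print(ham)       # {1: {2: {3: 'leaf'}}}
print(4 in ham)  # False: membership tests do not create keys
```

Only square-bracket lookups create entries; `in` and `.get()` leave the dict untouched, so you can still check for a key without creating it.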

Kyrgyz militias

Not a good sign:

Amid the early April tumult that brought down former President Kurmanbek Bakiyev’s administration, young men in Bishkek and other cities began forming druzhiniki groups to patrol the streets and restore order. These groups were originally envisioned as a temporary solution to security challenges. But in the ongoing unrest that has plagued Kyrgyzstan since April, militia groups have kept on amassing influence. [Eurasianet]

Temperature and clothing: a little project I’ll never find time for

I hate hot weather passionately. Or more accurately, lethargically: when the temperature goes above 25, I find myself unable to concentrate on anything.

But now I find myself wondering: what makes people dress down in the heat? Do they choose their clothing based on today’s weather, yesterday’s weather, or some combination of the two?

Fortunately, we have the data and technology to answer this. I’m not going to do it (see: lethargy). But here’s what I would do, if I had the time/energy (and perhaps I will, when autumn comes and I start to wake up).

Skin-detection algorithms already exist. This is the only freely-available code I could find for the purpose; I haven’t tested it.

You’d also need a source of images, tagged by date and location. Flickr will probably give you that, if you choose the right tags to narrow it down to full-body portraits of people. You can get weather information from The US National Weather Service, although it’s not clear what historical data is available. Failing that, you could limit photos to a particular group of dates/locations, for which you manually look up the historical weather. Then just assemble the data, and run some regressions.
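The final regression step might look roughly like this. Everything in it is an assumption for illustration: the skin_fraction measure, the function names, and above all the numbers, which are synthetic values generated from a made-up formula purely to check that the fit recovers them. No real photos or weather data are involved.

```python
def solve_linear_system(a, b):
    # Gauss-Jordan elimination with partial pivoting on a small dense system.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, targets):
    # Ordinary least squares via the normal equations (X'X) beta = X'y,
    # with an intercept column prepended to each row of predictors.
    xs = [[1.0] + list(row) for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic daily temperatures (Celsius); NOT real measurements.
temps = [18, 21, 25, 30, 27, 22, 19, 24, 31, 28, 23, 20]
# Predictors per day: (today's temperature, yesterday's temperature).
rows = [(temps[i], temps[i - 1]) for i in range(1, len(temps))]
# Made-up "true" relationship used to generate the fake skin fractions.
skin_fraction = [0.05 + 0.010 * today + 0.004 * yesterday
                 for today, yesterday in rows]

intercept, b_today, b_yesterday = fit_ols(rows, skin_fraction)
print(round(intercept, 3), round(b_today, 3), round(b_yesterday, 3))
```

Because the fake data is exactly linear, the fit should hand back the generating coefficients up to floating-point error. With real photos, the interesting question would be the relative size of the today and yesterday coefficients.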

Oxford comma

“For my parents, God and Damien Katz.”

- Noah Slater’s dedication, in the O’Reilly CouchDB book

git pull --rebase

Useful post on git by Yehuda Katz. Inter alia, strongly suggests using the --rebase flag when merging

Open Data

We’re in the midst of a data explosion. Then again, we’re always in the midst of a data explosion. It’s been developing, wave by wave, since the first Sumerian scribe pushed his wedge into clay. Maybe it feels different this time; maybe it’s always felt different.

The past two centuries saw the gradual triumph of ordered data collection: the regimented and expensive process of the census, the time-motion study, the economic indicator. The province of powerful behemoths (government, military, corporate or the omnipresent RAND corporation), such projects were rigorously planned at the top, then executed by a small army of functionaries.

In the last 15 years, something has changed. Quantitative change, initially: more data, faster computers, easier transmission of information. But also a change in quality. Now we’ve moved into the era of data as by-product. Our clicks and our purchases are tracked because watching us is cheap and easy, not as part of a pre-planned technocratic project. Such cheapness brings us into the age of data abundance, and we’re only beginning to appreciate the consequences and the possibilities.

Enter the Open Data movement. Bubbling with geekish idealism, this is a loose grouping of campaigners trying to prize large datasets out of government and corporate hands, bringing them into the agora. Knowledge here may be measured in SQL dumps, linked data and gigabytes of official transcripts, but the idealism fits into the standard pattern: the Truth will set you free.


Getting country population data from Freebase:

from freebase.api import HTTPMetawebSession, MetawebError

mss = HTTPMetawebSession('')

# None marks a field for Freebase to fill in for each matching country
list(mss.mqlread([{'name': None, 'type': '/location/country',
                   '/location/country/iso3166_1_alpha2': None,
                   '/location/statistical_region/population': [{'number': None}]}]))

Kyrgyzstan: NYT beats WaPo

I’ve been shamefully ignoring Kyrgyzstan. Well, I’ve been ignoring politics in general, busy on vodo and other technical work. But it’s particularly hideous to start ignoring an area you care about, just as a great many things (good and bad) start happening. Not reading about what’s happening over there makes me feel complicit in the near-total lack of attention in the Western media.

And Kyrgyzstan really is being overlooked to an incredible extent. Washington Post: nothing worth mentioning. New York Times is doing noticeably better, though — in fact, their coverage is pretty decent considering the distance, and the lack of much domestic political significance within the US.