Nested dictionaries in python

Python’s defaultdict is perfect for making nested dictionaries — especially useful if you’re doing any kind of work with json or nosql. It provides a dict which returns a default value when a key isn’t found. Set that default value an empty dict, and you have a convenient dict of dicts:


>>> from collections import defaultdict
>>> foo = defaultdict(dict)
>>> foo['x']
{}

But it breaks down when you go more than one layer deep:


>>> foo['x']['y']
Traceback (most recent call last):
File "", line 1, in 
KeyError: 'y'

You can get another layer by passing in a defaultdict of dicts as the default:


>>> bar = defaultdict(lambda: defaultdict(dict))
>>> bar['x']['y']
{}

But suppose you want deeply-nesting dictionaries. This means you can refer as deeply into the hierarchy as you want, without needing to check whether the intermediate dictionaries have already been created. You do need to be sure that intervening levels aren’t anything other than a recursive defaultdict, mind. But if you know you’re going to have your content filed away inside, say, quadruple-nested dicts, this isn’t necessarily a problem.

One approach would be to extend the method above, with lambdas inside lambdas:


>>> baz = defaultdict(lambda: defaultdict(lambda:defaultdict(dict)))
>>> baz[1][2][3]
{}
>>> baz[1][2][3][4]
Traceback (most recent call last):
File "", line 1, in 
KeyError: 4
>>>

It’s marginally more readable if we use partial rather than lambda:


>>> thud = defaultdict(partial(defaultdict, partial(defaultdict, dict)))
>>> thud[1][2][3]
{}

But still pretty ugly, and non-extending. Want infinite nesting instead? You can do it with a recursive function:


>>> def infinite_defaultdict():
...     return defaultdict(infinite_defaultdict)
...
>>> spam = infinite_defaultdict() #defaultdict(infinite_defaultdict) is equivalent
>>> spam['x']['y']['z']['l']['m']
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})

This works fine. The __repr__ output is annoyingly convoluted, though:



>>> spam = infinite_defaultdict()
>>> spam['x']['y']['z']['l']['m']
defaultdict(, {})
>>> spam
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'x':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'y':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'z':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'l':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'m':
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})})})})})})




A cleaner way of achieving the same effect is to ignore defaultdict entirely, and make a direct subclass of dict. This is based on Peter Norvig’s original implementation of defaultdict:


>>> class NestedDict(dict):
...     def __getitem__(self, key):
...         if key in self: return self.get(key)
...         return self.setdefault(key, NestedDict())
>>> eggs = NestedDict()
>>> eggs[1][2][3][4][5]
{}
>>> eggs
{1: {2: {3: {4: {5: {}}}}}}

Leave a comment

Your email address will not be published. Required fields are marked *