OED, again

A little more on the OED. The idea of creating a publicly-accessible version has obviously been floating around for a few years. As well it might: not only would an open OED be fantastically useful, but there’s a certain justice in bringing it back to the community. As Kragen Sitaker writes, the original OED

is one of the earliest instances of what are now called “pro-am” or “commons-based peer production” projects. From 1857 to 1928, thousands of readers collected examples of uses of words their dictionaries didn’t define; they mailed these examples on slips of paper to a small number of editors, who undertook to collate them into a dictionary.

Kragen’s attempt to liberate the OED was the most effective: not only did he get one set of the OED scanned, he also cooked up some code making it possible to look up individual words. Alas, his system is now offline – such is the fate of one-man projects. Rufus Pollock’s attempt to revive it, within the framework of the Open Knowledge Foundation, seems not to have got anywhere.

More ambitious are the Distributed Proofreaders, a group who take OCR’ed books, edit and correct them by hand, and pass them on to Project Gutenberg. They’ve been contemplating the idea of tackling the OED for some time now. But it’s a pretty daunting project – both in scale, and in the complexity of the typography – and every attempt seems to peter out.

Which is all a bit of a disappointment. I’m not quite foolhardy enough to launch myself into digitising the OED just yet, but there must be at least some prospect of making those scans slightly more user-friendly.

The Oxford English Dictionary, free

[update: Here is a very rough interface, which will be improved whenever I next have some free time]

Using the OED online costs £200/year, which is silly. Fortunately the first edition is out of copyright, and available at the Internet Archive. Unfortunately, it’s a bit tricky to find the right volume in a format that doesn’t expect you to download 200MB to look up a word. Djvu seems the best option; you need to install a browser plugin first, but then you can look at individual pages quite easily. Here are links to each volume:

A-B, C, D-E, F-G (pdf only), H-K, L, M-N, O-P (flip-book only), Q-R, S-SH, SI-SU (flip-book only), SV-TH, TI-U, V-Z (flip-book only)

Other formats are at these links (yes, there are two separate scans, one from the University of Toronto and another from Kragen Sitaker):

Volume 1, A-B: Sitaker
Volume 2, C: Sitaker
Volume 3, D-E: Toronto (partial), Complete?
Volume 4, F-G: Sitaker, Toronto (no djvu for either)
Volume 5, H-K: Sitaker
Volume 6A, L: Sitaker
Volume 6B, M-N: Sitaker
Volume 7, O-P: Toronto (flip-book only), Unlabelled (flip-book/pdf only)
Volume 8A, Q-R: Sitaker
Volume 8B, S-SH: Sitaker
Volume 9A, SI-SU: Sitaker
Volume 9B, SV-TH: Sitaker, Toronto
Volume 10A, TI-U: Sitaker, Toronto
Volume 10B, V-Z: Toronto, Sitaker
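
Since finding the right volume is the fiddly part, here is a minimal Python sketch of picking one from the letter ranges in the DjVu list above. The name volume_for and the table layout are my own illustration, not part of either scan or of the rough interface mentioned in the update, and it assumes each volume begins at the first prefix of its label.

```python
from bisect import bisect_right

# (starting prefix, volume label), in alphabetical order.
# Labels are copied from the list of scans; the prefixes are an assumption
# that each volume begins at the first letters of its label.
VOLUMES = [
    ("A", "A-B"), ("C", "C"), ("D", "D-E"), ("F", "F-G"),
    ("H", "H-K"), ("L", "L"), ("M", "M-N"), ("O", "O-P"),
    ("Q", "Q-R"), ("S", "S-SH"), ("SI", "SI-SU"), ("SV", "SV-TH"),
    ("TI", "TI-U"), ("V", "V-Z"),
]
STARTS = [start for start, _ in VOLUMES]


def volume_for(word):
    """Return the label of the volume that should contain `word`."""
    key = word.strip().upper()
    if not key or not ("A" <= key[0] <= "Z"):
        raise ValueError("expected a word starting with a letter A-Z")
    # The right volume is the last one whose starting prefix sorts
    # at or before the word.
    return VOLUMES[bisect_right(STARTS, key) - 1][1]


if __name__ == "__main__":
    for w in ("dictionary", "shoe", "sit", "the", "time", "zebra"):
        print(w, "->", volume_for(w))
```

It only deals with plain A-Z headwords; anything else raises a ValueError rather than guessing.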

Lazyweb: printing

I don’t own a printer, nor do I want to. On the rare occasions when I need something on paper, I pop into a net cafe, or cajole a friend into printing it for me. This gets a bit pricy/cheeky when it involves more than a few pages. Also, it takes far too much time and faff.

Somewhere, there must exist a company that will accept documents by email, print them, and post me the results. But I’ve not been able to find any. The print shops I can find are geared either towards making glossy full-colour brochures, or to printing lots of copies of the same document.

Does this kind of print-and-post service exist? Where?

Resolver One

I’m in the frustrating position of having spent a fair chunk of last year doing cool stuff with Resolver One, a python-based spreadsheet app. Frustrating, because the manufacturers have started handing out prizes for interesting Resolver-based projects, and I’m in no position to talk about, let alone open-source, most of what I’ve been doing.

Instead, I’ll just repeat that I love Resolver: it’s good enough to impress me despite being closed-source, and Windows-only*, and mainly aimed at financial number-crunchers in the City.

The main appeal for me is having python tightly integrated into a spreadsheet. Lambdas and list comprehensions are a natural fit for spreadsheets, to the extent that it’s painful trying to re-learn how to do things in Excel or OpenOffice. Here’s a simple example for adding up wages:

    =SUM([row['Salary'] for row in <Employees>.ContentRows if row['job'] == 'Postman'])

But, since the expression in every cell can be an arbitrarily-complex python expression, you can do much more intricate calculations in the same idiom. Plus Resolver is built on IronPython, which gives you the ability to use .NET libraries as well as python ones. And you can export your spreadsheet as a python class, letting you plug it into any other application you fancy. And, and…
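
For anyone without Resolver to hand, here is a rough sketch of the same idiom in plain python. The list of dicts is a made-up stand-in for the Employees worksheet (the names and figures are invented for illustration), and the helpers total_salary and total_by_job are hypothetical, not anything Resolver provides.

```python
# Hypothetical stand-in for the Employees worksheet: one dict per content row.
employees = [
    {"name": "Ann", "job": "Postman", "Salary": 18000},
    {"name": "Bob", "job": "Clerk", "Salary": 16000},
    {"name": "Cat", "job": "Postman", "Salary": 19500},
]


def total_salary(rows, job):
    """Sum the Salary column over the rows whose job matches."""
    return sum(row["Salary"] for row in rows if row["job"] == job)


def total_by_job(rows):
    """A slightly more intricate calculation in the same style:
    one salary total per job title."""
    totals = {}
    for row in rows:
        totals[row["job"]] = totals.get(row["job"], 0) + row["Salary"]
    return totals


if __name__ == "__main__":
    print(total_salary(employees, "Postman"))
    print(total_by_job(employees))
```

In the spreadsheet the comprehension sits directly in a cell; here it is wrapped in a function only so that it can be reused.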

Anyway, that’s the end of enthusiasm. Back to grumbling about politics shortly, no doubt.

* I’m still a dedicated Ubuntu/Debian user. But Amazon, flexiscale or GoGrid will rent you a Windows server for ~10c/hr, so I’ve only been marginally inconvenienced by Resolver being Windows-only.

pycallgraph

[perhaps of faint interest to python programmers, certainly not to anybody else]

I’ve just found pycallgraph, which makes it easy to visualize the structure of function calls within python code.

This is immensely useful for debugging, and for getting a feel for the structure of a program. It’s something you could theoretically already do with epydoc and a profiler, but that’s always seemed like a lot of work for little return.

With pycallgraph, it takes a couple of lines:

    import pycallgraph
    pycallgraph.start_trace()
    #…code to be graphed goes here
    pycallgraph.make_dot_graph('test.png')

That’s still a lot of typing, though, when I really only want it for a quick picture of whatever is currently baffling me. pycallgraph has missed an opportunity here, so I’ve rewritten it as a decorator:

    import pycallgraph

    def callgraph(filename = '/tmp/callgraph.png'):
        def argwrapper(func):
            def callwrapper(*args, **kwargs):
                # after the first call, skip tracing and just run the function
                if callwrapper.already_called:
                    return func(*args, **kwargs)
                callwrapper.already_called = True
                pycallgraph.start_trace(reset = False)
                result = func(*args, **kwargs)
                pycallgraph.stop_trace()
                pycallgraph.make_dot_graph(filename)
                return result
            # flag stored on the wrapper itself
            callwrapper.already_called = False
            return callwrapper
        return argwrapper

This lets me generate a call-graph of any function, by popping a one-line decorator onto it:

    @callgraph('/tmp/callgraph.png')
    def myfunc():
        #code here
        pass

The other advantage here is that I generally only want to generate the graph once, when the function is first called. Hence the use of a function attribute, already_called, as a flag to short-circuit past the call-graph on subsequent runs.
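
The function-attribute trick is easier to see with pycallgraph taken out of the picture. Below is a minimal sketch of the same pattern on its own: the decorator name once_with_hook and the hook argument are hypothetical, invented for this illustration, and the hook simply stands in for the tracing and graph-drawing step. It also uses functools.wraps, which the decorator above does not, so the wrapped function keeps its name.

```python
import functools


def once_with_hook(hook):
    """Call hook(function_name) after the first call only.

    The hook is a stand-in for the expensive step (drawing the
    call graph); later calls go straight to the function.
    """
    def argwrapper(func):
        @functools.wraps(func)
        def callwrapper(*args, **kwargs):
            # Flag already set: short-circuit past the hook.
            if callwrapper.already_called:
                return func(*args, **kwargs)
            callwrapper.already_called = True
            result = func(*args, **kwargs)
            hook(func.__name__)
            return result
        # The flag lives on the wrapper function itself.
        callwrapper.already_called = False
        return callwrapper
    return argwrapper


traced = []


@once_with_hook(traced.append)
def double(x):
    return 2 * x


if __name__ == "__main__":
    double(2)
    double(3)
    print(traced)  # the hook ran for the first call only
```

Keeping the flag on the wrapper avoids a module-level global, at the cost of the flag being shared by every caller of the decorated function.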