thesaurus – Dan O'Huiginn

Take a large sample of text. Run it through NLTK, looking for passages where multiple adjectives describe the same noun. Or, to keep it simple, just passages like a *big*, *strong* man.

For each such coincidence, record a link between the two adjectives: *big* and *strong* go together.
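A rough sketch of that counting step, in Python. To keep it self-contained it skips the part-of-speech tagging entirely and assumes a tiny hand-written adjective list (`ADJECTIVES` is made up for illustration, as is the function name `adjective_links`); a real run would get the adjectives from a tagger instead.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy list; a real version would rely on a POS tagger instead.
ADJECTIVES = {"big", "strong", "tall", "quiet", "small"}

def adjective_links(text, adjectives=ADJECTIVES):
    """Count pairs of adjectives that occur in the same run, e.g. 'big, strong man'."""
    links = Counter()
    # Words plus punctuation, so that a full stop breaks a run but a comma does not.
    tokens = re.findall(r"[a-z']+|[,.;:!?]", text.lower())
    run = []
    for tok in tokens + ["<end>"]:
        if tok in adjectives:
            run.append(tok)
        elif tok in {",", "and"} and run:
            continue  # separators keep the current run of adjectives alive
        else:
            # The run has ended (ideally at the noun): link every pair in it.
            for a, b in combinations(sorted(set(run)), 2):
                links[(a, b)] += 1
            run = []
    return links

sample = "A big, strong man met a big and tall woman. The quiet man left."
print(adjective_links(sample))
```

Pairs are stored in alphabetical order so that (*big*, *strong*) and (*strong*, *big*) count as the same link.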

My initial thought was to do this geometrically. Imagine an n-dimensional space, where n is the number of adjectives in the English language. Place each word at 1 in its own dimension, and for every other dimension/word at the point given by some function of how often the two co-occur.

But that seems silly. It’s more like a standard regression, data-mining kind of thing.
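For what it's worth, the geometric version is easy enough to sketch. The "some function" of co-occurrence is left open above, so the weighting here (the share of a word's links that go to the other word) is purely an assumption, and `adjective_vectors` and `cosine` are illustrative names. The input is a dict of pair counts, however they were collected.

```python
import math

def adjective_vectors(links):
    """links maps (adj_a, adj_b) -> co-occurrence count. Returns (vocab, vectors)."""
    vocab = sorted({word for pair in links for word in pair})
    index = {word: i for i, word in enumerate(vocab)}
    totals = dict.fromkeys(vocab, 0)
    for (a, b), n in links.items():
        totals[a] += n
        totals[b] += n
    vectors = {word: [0.0] * len(vocab) for word in vocab}
    for word in vocab:
        vectors[word][index[word]] = 1.0  # each word sits at 1 in its own dimension
    for (a, b), n in links.items():
        # Assumed weighting: fraction of each word's links that go to the other word.
        vectors[a][index[b]] = n / totals[a]
        vectors[b][index[a]] = n / totals[b]
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms

links = {("big", "strong"): 3, ("big", "tall"): 1, ("quiet", "small"): 2}
vocab, vectors = adjective_vectors(links)
print(vocab)
print(cosine(vectors["big"], vectors["strong"]))
```

Words that share links end up pointing in similar directions, which is about as much thesaurus as this picture gives you.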

Anyway, a project for a rainy day. And there’s still a need for some usable dictionary/thesaurus based on data-mining.
