Ten years ago, Alexandre Patry, Jean Schurger, George Peristerakis and I participated in a Hack Reduce hackathon sponsored by Hopper travel and organized by Alexis Smirnov.

We put together quote-me-if-you-can, a project using Map/Reduce and Apache OpenNLP to do part-of-speech tagging and shallow parsing (chunking) over a full dump of https://openparliament.ca/ graciously provided by Michael Mulley.

We extracted quoted text and constructs of the form "adjective noun". For a given noun, all the adjectives that appear together with that noun form a probability distribution. Taken over all the speakers in Parliament, that gives a "background" probability distribution. For a given speaker, the adjectives they use form a speaker-specific distribution. Comparing the two (using, for example, the Kullback-Leibler divergence) lets us find the nouns for which a speaker uses adjectives differently from the rest of the speakers. Fiddling with the formulas a little more shows which adjectives contribute most to the difference.
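Since the original code is half-lost, here is a minimal sketch of that comparison in plain Python, not the hackathon implementation. It assumes the (speaker, adjective, noun) triples have already been extracted. The function names, the add-alpha smoothing (used only to avoid zero probabilities) and the toy data are all made up for illustration.

```python
import math
from collections import Counter


def adjective_distribution(counts, vocabulary, alpha=0.5):
    """Turn raw adjective counts into a smoothed probability distribution.

    The add-alpha smoothing is an assumption of this sketch; it keeps
    every probability strictly positive so the logarithm is defined.
    """
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {adj: (counts[adj] + alpha) / total for adj in vocabulary}


def kl_contributions(p, q):
    """Per-adjective terms p(a) * log(p(a) / q(a)); their sum is KL(p || q)."""
    return {adj: p[adj] * math.log(p[adj] / q[adj]) for adj in p}


def speaker_divergence(pairs, speaker, noun):
    """Compare one speaker's adjectives for a noun against all speakers."""
    background = Counter()  # adjectives used with the noun by everybody
    own = Counter()         # adjectives used with the noun by this speaker
    for who, adjective, n in pairs:
        if n != noun:
            continue
        background[adjective] += 1
        if who == speaker:
            own[adjective] += 1
    vocabulary = sorted(background)
    p = adjective_distribution(own, vocabulary)
    q = adjective_distribution(background, vocabulary)
    terms = kl_contributions(p, q)
    return sum(terms.values()), terms


# Made-up toy triples (speaker, adjective, noun), not real Hansard data.
toy_pairs = [
    ("Speaker A", "fancy", "budget"),
    ("Speaker A", "fancy", "budget"),
    ("Speaker B", "federal", "budget"),
    ("Speaker C", "federal", "budget"),
    ("Speaker C", "balanced", "budget"),
]

divergence, terms = speaker_divergence(toy_pairs, "Speaker A", "budget")
top_adjective = max(terms, key=terms.get)
print(f"KL divergence: {divergence:.3f}, top adjective: {top_adjective}")
```

Ranking nouns by the divergence points at where a speaker stands out, and ranking the individual terms points at the adjective responsible. With real data, nouns that a speaker uses only a handful of times would probably need to be filtered out first, since small counts make the estimate noisy.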

This surfaces interesting quotes such as, for example, Mr. John Reynolds using the adjective 'fancy' to refer to 'toilets' in a speech 20 years ago:

The government can have fancy toilets in its jets, but Canadian soldiers do not get porta-potties over in Afghanistan.


The above quote gave its name to the final presentation for the hackathon. The code for that project is half-lost, sadly. It would be interesting to reproduce it and to expand it. There is no need for map/reduce anymore: the part-of-speech tagging and chunking can easily be done using spaCy on a laptop these days.