Improving Statistical NLG using Unsupervised Grammars

(This is primarily a student project proposal, suitable for an undergraduate diploma thesis, to be carried out jointly with Grupo PLN FaMAF.)

One technique for statistical surface realization in Natural Language Generation (NLG) uses a small grammar to over-generate candidate sentences, which are then ranked by a statistical model (e.g., an n-gram language model). In this project, the student will explore using grammars induced from training data in an unsupervised fashion, as done by Grupo PLN FaMAF, for the over-generation step. For a recent discussion of trainable surface realizers, see this paper (PDF).
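The over-generate-and-rank idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the toy `GRAMMAR` is hand-written as a flat list of slots with alternative phrasings (not an induced grammar), the three-sentence `CORPUS` stands in for training data, and the ranker is an add-one (Laplace) smoothed bigram model, which is just one simple choice of statistical model.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy grammar: each slot lists alternative phrasings.
# In the project, this hand-written stand-in would be replaced by an induced grammar.
GRAMMAR = [
    ["the cat", "cat the"],
    ["sat", "sits", "sat down"],
    ["on the mat", "the mat on"],
]

# Hypothetical training corpus used only to train the n-gram ranker.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sits on the mat",
]


def overgenerate(grammar):
    """Yield every sentence the grammar licenses (the over-generation step)."""
    for choice in itertools.product(*grammar):
        yield " ".join(choice)


def train_bigrams(corpus):
    """Count bigram contexts and bigrams, padding sentences with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])  # every token that precedes another one
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {tok for s in corpus for tok in s.split()} | {"</s>"}
    return contexts, bigrams, vocab


def log_prob(sentence, contexts, bigrams, vocab):
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
        for prev, cur in zip(tokens, tokens[1:])
    )


def rank(grammar, corpus):
    """Over-generate candidates, then sort them from most to least probable."""
    contexts, bigrams, vocab = train_bigrams(corpus)
    candidates = list(overgenerate(grammar))
    return sorted(
        candidates,
        key=lambda s: log_prob(s, contexts, bigrams, vocab),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(GRAMMAR, CORPUS)[0])  # prints "the cat sat on the mat"
```

The grammar here licenses 12 candidates, including ungrammatical ones such as "cat the sits the mat on"; the bigram model pushes those to the bottom because their word pairs never occur in the corpus. In the setting the project describes, only the grammar would change: swapping the hand-written slots for an unsupervised grammar alters what gets over-generated, while the ranking step stays the same.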