(This was a student project proposal, which ended up as the undergraduate diploma thesis of Fabian Pacheco, published in NAACL 2012.)

Over the years, the field of Natural Language Generation (NLG) has put forward a number of algorithms for the generation of referring expressions (GRE, see Wikipedia for a discussion on the topic).

Similarly, the field of Natural Language Understanding (NLU) has studied the problem of anaphora resolution. This problem is tackled through a number of techniques, some of which rely on trainable systems; to train such systems, there have been efforts to annotate coreference chains.

In this project, the student will seek out existing resources for training anaphora resolution systems and extend them with ontological information obtained from large-scale ontologies. Tools similar to COREFDRAW (Postscript GZipped) can be used to annotate the links between mentions and ontology entries. The extended resources will then be used to test existing algorithms for GRE from the literature.
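
To make the testing step more concrete, here is a minimal Python sketch of how one GRE algorithm could be run on an entity described by ontology-style attributes. It is a simplified version of the Incremental Algorithm of Dale and Reiter, chosen only as a well-known example from the GRE literature; the proposal does not say which algorithms are to be tested. The function name `incremental_gre`, the attribute names, and the sample entities are all hypothetical and do not come from any of the resources mentioned here.

```python
from typing import Dict, List, Optional, Tuple

# An entity is a flat attribute -> value mapping, e.g. as it might be
# extracted from an ontology (hypothetical representation).
Entity = Dict[str, str]


def incremental_gre(
    target: Entity,
    distractors: List[Entity],
    preferred_attributes: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Simplified Incremental Algorithm: walk the attributes in a fixed
    preference order and keep each one that rules out at least one
    remaining distractor. Returns None if the target cannot be
    distinguished from all distractors."""
    description: List[Tuple[str, str]] = []
    remaining = list(distractors)
    for attr in preferred_attributes:
        if not remaining:
            break  # every distractor has already been ruled out
        if attr not in target:
            continue
        value = target[attr]
        # Distractors that share this value are NOT ruled out by it.
        still_confusable = [d for d in remaining if d.get(attr) == value]
        if len(still_confusable) < len(remaining):
            description.append((attr, value))
            remaining = still_confusable
    return description if not remaining else None


# Hypothetical sample data: the referent and the other entities in context.
target = {"type": "Person", "occupation": "Politician", "country": "Argentina"}
distractors = [
    {"type": "Person", "occupation": "Writer", "country": "Argentina"},
    {"type": "Organisation", "country": "Argentina"},
]
result = incremental_gre(target, distractors, ["type", "occupation", "country"])
print(result)
```

In an evaluation along the lines proposed above, the attributes selected by such an algorithm could be compared against the referring expressions that human authors actually used in the annotated coreference chains. The preference order of the attributes is a design choice of this algorithm and would have to be set or learned for each domain.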

A good starting point for training material could be the data from the Anaphora Resolution Exercise, combined with the DBpedia Ontology, but the student is of course free to experiment and seek out whatever is best for the proposed goals. The SemEval2 data is another option.