Natural Language Generation (NLG) deals with the
construction of text (pure text, or annotated text later used for speech
synthesis) from data sources, or from other textual sources, as in
summarization or the translation of a document written in a different
language. As I focus mostly on generation from data, that is also
the focus of this page.
The process of starting from certain data to be communicated and arriving at a piece of intelligible, natural-sounding text is one of iterative expansion of the source data through a series of decisions about the many ways the information can be conveyed in text. To guide such decisions, a communicative intention is usually supplied along with the data, at least for non-trivial texts.
The generation process is usually split into three stages: the first one ("content planning") deals with what to say and the last two deal with how to say it ("tactical generation"). The output of the first stage is a sequence of smaller chunk-like units, each conveying an idea in the textual domain of choice. These units are called messages. The second stage ("sentence planning") is the focus of active research and involves choosing words for the different concepts in the messages, including referring expressions such as pronouns, plus joining messages into more complex sentence structures (like subordinate clauses and coordinated subjects). The output of the sentence planner is a mostly specified sentence (subject, object, verb, etc.). But even when the sentence is fully specified, there are still a number of decisions left for the last stage (the "surface realizer"), such as the ordering of constituents, conjugation of verbs, different types of agreement, etc.
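The three stages can be sketched in ordinary code (Python here, since Pd patches do not paste well into text). Every function, message, and place name below is invented for illustration; this is just the shape of the pipeline, not an actual implementation:

```python
# A toy three-stage NLG pipeline: content planning -> sentence planning
# -> surface realization.  All names are made up for this sketch.

def content_planner(data, intention):
    """What to say: turn raw data plus a communicative intention
    into a sequence of messages (small, idea-sized chunks)."""
    return [{"type": "MSG-MOTION", "THEME": "HEARER",
             "SOURCE": "SPEAKER", "GOAL": "MARKET1"}]

def sentence_planner(messages):
    """How to say it, part 1: choose words and referring expressions,
    and group messages into sentence-sized structures."""
    return [{"subject": "you", "verb": "go",
             "source": "me", "goal": "Jean Talon Market"}
            for m in messages]

def surface_realizer(sentences):
    """How to say it, part 2: order constituents, conjugate verbs,
    enforce agreement, and emit the final string."""
    return " ".join(
        f"{s['subject']} {s['verb']} from {s['source']} to {s['goal']}."
        for s in sentences)

text = surface_realizer(sentence_planner(content_planner(None, "inform")))
# -> "you go from me to Jean Talon Market."
```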
Bear in mind these are my opinions and the nomenclature here is just the one I use myself; other NLG researchers refer to similar concepts by different names (and some might challenge the idea of a generation pipeline altogether).
Why NLG within Pd? Well, I would like to see (and hear!) projects using text that is dynamically constructed by a Pd performer using the ample supply of Pd inputs (MIDI, microphones, movement sensors, etc, etc), plus the possibilities of dynamic music criticism or dynamic lyrics creation. Moreover, the possibility of having a text generator that can run in real-time allows for ideas such as semantic echo (see below) to come into existence.
Furthermore, Pd can also be of interest to NLG practitioners: some of the classic NLG systems (such as KPML) work with a graphical metaphor. There is a long way to go before a Pd descendant can be of use to NLG, though, as per my analysis below.
Last but not least, many NLG systems are written in Lisp or Scheme. Having a working NLG framework in Pd can help in teaching NLG to audiences outside AI.
Language is inherently recursive:
- The dog ate the bone.
- The dog that came yesterday ate the bone.
- The dog that came yesterday and that I told you about ate the bone.
- The dog that came yesterday and that I told you about when we talked on the phone ate the bone.
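The recursion in those sentences can be made explicit with a small recursive data structure. This Python sketch is my own illustration (not part of any Pd patch): a noun phrase is a head plus optional relative clauses, and a clause may itself embed further noun phrases, which is exactly the self-reference Pd data structures cannot express:

```python
def realize(phrase):
    """Recursively realize a noun phrase: a head plus zero or more
    relative clauses, where each clause is a list of words and
    embedded noun phrases (the recursive step)."""
    text = phrase["head"]
    for clause in phrase.get("rels", []):
        words = [realize(w) if isinstance(w, dict) else w for w in clause]
        text += " that " + " ".join(words)
    return text

# 'the phone' embedded inside the second relative clause shows the nesting.
dog = {"head": "the dog",
       "rels": [["came yesterday"],
                ["I told you about when we talked on",
                 {"head": "the phone"}]]}
print(realize(dog) + " ate the bone.")
# -> the dog that came yesterday that I told you about when we talked
#    on the phone ate the bone.
```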
Sadly, Pd has no support for recursion, neither at the level of data structures nor at the level of patches. For example, the following patch, named 'rec-test1.pd', won't execute (Pd will say "rec-test1: can't load abstraction within itself"). The reasons behind this are quite involved and have to do with Pd's depth-first execution scheduler (I think).
This lack of recursion (particularly at the data-structure level) restricts the complexity of the NLG that can be done in Pd, but I have managed to verbalize simple messages, and even that allows for interesting work.
A message in traditional NLG is a hierarchical attribute-value matrix (technically, a directed acyclic graph, or DAG). For example, something like this, from my thesis:
Due to the lack of recursion, I am at the moment representing each message as a flat attribute-value table. This situation is of course bad. Even though there are ways to deal with tree-like structures through linearization (for an example of such techniques, see Chapter 4 of my thesis), that would make the Pd patch completely unfathomable to its users. I want an NLG patch that makes sense and can be fine-tuned in Pd.
To represent NLG messages in Pd, I'm using one Pd message for each attribute-value pair, where the attribute is the first atom in the Pd message and the value is the remainder of it. Doing this allows for list values (the alternative would be to use a single list for the whole table, but then we would be restricted to fixed-length values).
Therefore, an NLG message such as MSG-MOTION, which involves the displacement of something (the THEME of the message) from a given place (the SOURCE of the message) to a destination (the GOAL of the message), can be represented as:
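As a sketch (in Python, since the real thing lives in Pd messages), the flat attribute-value table for such a MSG-MOTION can be built exactly as described above, treating the first atom as the attribute and the remaining atoms as a possibly list-valued value:

```python
# Each string below stands in for one Pd message of the MSG-MOTION:
# first atom = attribute, remaining atoms = value.

pd_messages = [
    "THEME HEARER",
    "SOURCE SPEAKER",
    "GOAL MARKET1",
]

def to_table(messages):
    """Collect attribute-value Pd-style messages into a flat table."""
    table = {}
    for msg in messages:
        attr, *value = msg.split()
        table[attr] = value  # value is a list, so multi-atom values work too
    return table

to_table(pd_messages)
# -> {'THEME': ['HEARER'], 'SOURCE': ['SPEAKER'], 'GOAL': ['MARKET1']}
```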
The semantics of a MOTION action can be consulted in existing ontological resources, such as FrameNet.
The values for THEME, SOURCE and GOAL are symbols that have a meaning within the NLG patches, so that the patches can reason about the text and adjust it as appropriate. For example, a symbol like HEARER as THEME can normally be verbalized as you, while SPEAKER as THEME becomes I. However, SPEAKER as GOAL will normally be verbalized as me. Nor is MSG-MOTION necessarily verbalized all the time with the verb to go: for example, /MSG-MOTION THEME HEARER SOURCE SPEAKER GOAL OTHER-PERSON/, which can be verbalized as you are going from me to him, would be better verbalized as you are leaving me for him. We still have some way to go before reaching that level of finesse in NLG for Pd, but at least we know where we want to go (pun very much intended!).
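The role-sensitive choice of pronoun can be captured by a small table keyed on both the symbol and the slot it fills. A toy sketch of that idea (the grammatical-role names are mine, for illustration):

```python
# The same discourse symbol surfaces as a different pronoun depending
# on the slot it fills: SPEAKER is "I" as subject but "me" as an
# oblique (from/to) object.

PRONOUNS = {
    ("SPEAKER", "subject"): "I",
    ("SPEAKER", "oblique"): "me",
    ("HEARER",  "subject"): "you",
    ("HEARER",  "oblique"): "you",
}

def pronoun(symbol, grammatical_role):
    return PRONOUNS[(symbol, grammatical_role)]

# In MSG-MOTION, THEME ends up as subject; SOURCE and GOAL are oblique.
msg = {"THEME": "HEARER", "SOURCE": "SPEAKER"}
subject = pronoun(msg["THEME"], "subject")   # "you"
source  = pronoun(msg["SOURCE"], "oblique")  # "me"
```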
Now, how to implement NLG messages in Pd? My current solution is to have two inlets in each message patch: a hot one, used just to produce output through a bang, and another receiving NLG messages split into Pd messages as described above. The different attribute-value pairs are dispatched through a Pd [route] object into the lexical patches (patches that verbalize a symbol).
The message patch reacts to the different slots by sending them to a lexicon patch and then concatenating the text verbalizing the different slots, adding function words (like prepositions) and information not available in the slots (e.g., the verb for motion, 'to go').
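Outside Pd, the behavior of such a message patch can be sketched in a few lines of Python. The lexicon entries and the fixed "are going" inflection are assumptions made for this sketch, not what the actual patches do:

```python
# How a MSG-MOTION message patch assembles its output: each slot is
# verbalized through the lexicon, then the pieces are glued together
# with function words ('from', 'to') and the verb ('to go') that the
# slots themselves do not carry.

LEXICON = {"HEARER": "you", "SPEAKER": "me", "MARKET1": "Jean Talon Market"}

def verbalize_motion(msg):
    theme  = LEXICON[msg["THEME"]]
    source = LEXICON[msg["SOURCE"]]
    goal   = LEXICON[msg["GOAL"]]
    # 'are going', 'from' and 'to' come from the message patch itself.
    return f"{theme} are going from {source} to {goal}"

verbalize_motion({"THEME": "HEARER", "SOURCE": "SPEAKER", "GOAL": "MARKET1"})
# -> "you are going from me to Jean Talon Market"
```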
The lexicon is by far the ugliest component so far, and help is needed to automate its creation from textual sources. At a conceptual level, it takes a symbol (such as MARKET1 or MARKET2) and produces a verbalization in the target language (e.g., "Jean Talon Market" for MARKET1 or "Mercado Norte" for MARKET2).
However, here we start having to deal with the tricky aspects of NLG: the exceptions and subtleties of language. For example, a concept such as COUPLE1 can be verbalized as John and Mary but also as the couple, with the first verbalization being plural and the second singular. The choice of one over the other can depend on a number of factors (whether the concept has been mentioned before, for example) and it will affect the verb used to verbalize the full message. To capture this extra information, the lexicon has two outlets: one with the generated string and another with any further information to be passed along to other patches. The message patch should route that extra output as required by the grammar (for example, in a particular verbalization of MOTION, the THEME is the subject and as such has to agree in number with the verb, so its extra outlet is connected as a message input on the lexicon patch for the verb 'to go').
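The two-outlet idea can be mimicked in Python by having the lexicon return a pair: the string plus a feature bundle that downstream code uses for agreement. The entries, the `prefer_short` flag, and the feature names are all invented for this sketch:

```python
# A lexicon "entry" with two outlets: the verbalization string and the
# extra grammatical features (here, number) needed for agreement.

LEXICON = {
    "COUPLE1": [("John and Mary", {"number": "plural"}),
                ("the couple",    {"number": "singular"})],
}

def lookup(symbol, prefer_short=False):
    """Return (string, features); the choice between variants would
    depend on context, e.g. prior mention."""
    choices = LEXICON[symbol]
    return choices[1] if prefer_short else choices[0]

def conjugate_go(features):
    # Subject-verb agreement driven by the lexicon's second outlet.
    return "goes" if features["number"] == "singular" else "go"

text, feats = lookup("COUPLE1")
print(f"{text} {conjugate_go(feats)} to the market")
# -> John and Mary go to the market

text, feats = lookup("COUPLE1", prefer_short=True)
print(f"{text} {conjugate_go(feats)} to the market")
# -> the couple goes to the market
```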
(In case you haven't noticed, this patch is ugly, ugly, ugly. Things will improve ;-)
The first application of this technique for doing NLG with Pd is Semantic Echo, as presented at PDMTL#44.