Summaries for GIVE 1

This GIVE challenge used tile-based movement. The best systems were Madrid and Union, with Madrid using fewer instructions. Austin was the quickest in time needed to reach the goal.

The GIVE-1 Austin System

by David Chen and Igor Karpov.

The step-by-step system provided by the organizers, made less repetitive. They replaced the planner with A*, aggregated movement instructions, and added a timer to avoid "silence on the line".
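
The instruction aggregation can be sketched roughly like this (a minimal illustration with assumed instruction names; not the actual Austin code):

```python
def aggregate_moves(plan):
    """Collapse runs of "forward" steps in a low-level GIVE plan into
    single aggregated movement instructions (turns pass through as-is)."""
    aggregated, i = [], 0
    while i < len(plan):
        if plan[i] == "forward":
            # measure the length of this run of consecutive forward moves
            j = i
            while j < len(plan) and plan[j] == "forward":
                j += 1
            n = j - i
            aggregated.append("walk forward" if n == 1 else f"walk {n} steps forward")
            i = j
        else:
            aggregated.append(f"turn {plan[i]}")
            i += 1
    return aggregated
```

For example, `["left", "forward", "forward", "forward"]` becomes `["turn left", "walk 3 steps forward"]`, cutting down on the repetitiveness of the raw plan.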


The GIVE-1 Madrid System

by Daniel Dionne, Salvador de la Puente, Carlos León, Raquel Hervás, Pablo Gervás from Universidad Complutense de Madrid, Madrid, Spain.

It is an addendum to "A Model for Human Readable Instruction Generation Using Level-Based Discourse Planning and Dynamic Inference of Attributes" by Daniel Dionne, Salvador de la Puente, Carlos León, Pablo Gervás, and Raquel Hervás, presented at ENLG'09. I summarize both papers here.

Their focus is on NLG: "human"-quality expressions for the directions. They treat virtual guides in virtual environments as a first stage toward virtual guides in real environments.

They mention that guides intervene when they realize the person is lost or at risk.

They model the world as a hierarchy of spatial levels; these levels help their message generation (using the Dale & Reiter book nomenclature). Their concept of hidden references (e.g., "corners") is also in line with that.

Kelleher and Kruijff (2005) present a version of the work of Dale and Reiter (1992) adapted to spatial contexts. CORAL (Dale and Geldof, 2003) contains an architecture for giving instructions.

  • How to build a higher-level representation of the world
  • How to generate higher-level instructions
  • References using reference agents

Discourse planning is done at two levels of detail:

  • Full plan
  • Next turn

Their architecture has the following components:

  1. world analysis (expanding the raw world data with concepts like hallways)
  2. instruction tree (with levels of abstraction)
  3. instruction planner (issuing instructions, drawing on the next two components)
  4. disambiguation (odd name for GRE)
  5. alerts (not only alarm tiles, but also distractor buttons and wrong doors)

Their expanded world contains:

  • list of rooms
    • type
    • perimeter
    • number of walls
    • corners
  • interconnected graph of rooms
  • list of computed objects
    • doors
    • intersections
  • current user room
  • past user room history
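
Their expanded world could be represented along these lines (a hedged sketch in Python; the field names are mine, not the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class Room:
    type: str            # e.g. "hallway" vs. an ordinary room
    perimeter: float
    num_walls: int
    corners: list        # corner coordinates

@dataclass
class World:
    rooms: dict = field(default_factory=dict)             # room id -> Room
    adjacency: dict = field(default_factory=dict)         # room id -> neighbouring room ids
    computed_objects: list = field(default_factory=list)  # doors, intersections
    current_room: str = ""                                # where the user is now
    room_history: list = field(default_factory=list)      # rooms visited so far
```

The point is that everything beyond the raw tile data (hallways, doors, intersections) is computed once, up front, and then queried during instruction generation.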

Their plan: "do something in this room, then move to another room". The GIVE plan: "left, left, forward, forward, etc.". They basically ignored the GIVE plan for the most part, using only the switch-manipulation information.

Their plan is a tree, where each level has a different degree of abstraction. Each instruction has pre- and postconditions (instructions whose postcondition already holds are removed from the tree).
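
The pruning of already-satisfied instructions can be sketched as follows (my own minimal rendering, not their code):

```python
class Node:
    """A node in the instruction tree; `post` is a predicate over the
    world state that says whether this instruction is already satisfied."""
    def __init__(self, post, children=()):
        self.post = post
        self.children = list(children)

def prune(node, state):
    """Remove subtrees whose postcondition already holds in `state`:
    those instructions no longer need to be given."""
    node.children = [c for c in node.children if not c.post(state)]
    for c in node.children:
        prune(c, state)
    return node
```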

GIVE Submission Algorithm 1 is what they use to define rooms.

To generate richer messages, focusing on higher level constructs (e.g., "center of the room"), some of which are dynamic in nature ("the table across the pillars"), they used reference agents. These agents produce new attributes about objects in the world that can later be used for GRE.

They have four instruction levels:

  1. only one node: "take the trophy"
  2. changing rooms and manipulating objects
  3. directional changes (for special situations, such as U-shaped rooms)
  4. original GIVE plan (hardly used)

(if the system can't find a good instruction after descending to the lowest possible level, it replans)
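The descent through abstraction levels, with replanning as the fallback, might look like this (names, the state representation, and the replan hook are all assumptions):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Instruction:
    text: str
    pre: Callable   # state -> bool: can this instruction be given now?
    post: Callable  # state -> bool: is it already satisfied?

def next_instruction(tree_levels, state, replan):
    """Try abstraction levels from most abstract (level 1, the goal)
    down to least abstract (level 4, the raw GIVE plan)."""
    for level in tree_levels:
        for instr in level:
            # usable = applicable now and not already accomplished
            if instr.pre(state) and not instr.post(state):
                return instr
    return replan(state)  # no level produced a good instruction
```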

They define five types of instructions:

  1. Movement (from tile to tile) pre: being in the starting tile, post: being in the end tile
  2. CheckPoint (from here to a checkpoint; extra points in U-shaped rooms) pre: the checkpoint is reachable, post: the next checkpoint is reachable
  3. Room2Room (from room to room, using a particular entrance) pre: being in the starting room, post: being in the end room
  4. Action (interact with an element) pre: seeing the element, post: interaction fulfilled
  5. Goal (special action, the final goal)
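
As an illustration of the pre-/postcondition pattern, a Room2Room instruction could be modeled like this (a sketch; the state representation is assumed):

```python
class Room2Room:
    """Move from one room to another through a particular entrance.
    pre: user is in the start room; post: user is in the end room."""
    def __init__(self, start, end, entrance):
        self.start, self.end, self.entrance = start, end, entrance

    def pre(self, state):
        return state["room"] == self.start

    def post(self, state):
        return state["room"] == self.end

    def text(self):
        return f"go to the {self.end} through the {self.entrance}"
```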

They introduce the concept of guiding agents to balance warnings, high-level instructions, and low-level instructions. Each agent returns a value in [0, 1] specifying how important its communication is (0 means the situation is highly unlikely).

They define three types of agents:

  1. information agents (interesting spots, not implemented)
  2. status agents (user status, e.g., is the user lost due to inactivity? is the user so far beyond where s/he should be that s/he can be considered lost?)
  3. area agents (special areas, including warnings)
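
A simple way to arbitrate among such agents is to let the highest-importance one speak (a sketch of the idea; the paper does not give this code):

```python
def select_message(agents, state):
    """Each agent scores its message's importance in [0, 1]
    (0 = situation highly unlikely); the highest-scoring one speaks."""
    scored = [(agent.importance(state), agent) for agent in agents]
    importance, best = max(scored, key=lambda pair: pair[0])
    # stay silent when no agent considers its situation likely
    return best.message(state) if importance > 0 else None
```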

The concept of a security area (the area where the user should be; if s/he is outside of it, s/he is considered lost) seems useful and interesting.

The authors mention the need for feedback in their system, for example, to stop using a certain instructional strategy if the user fails to accomplish the goals when it is used.

References of note

  • Robert Dale and Sabine Geldof. 2003. Coral: Using natural language generation for navigational assistance. In Proceedings of the 26th Australasian Computer Science Conference.
  • Laura Stoia, Donna Byron, Darla Shockley, and Eric Fosler-Lussier. 2006. Sentence planning for realtime navigational instructions. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL.
  • Sebastian Varges. 2005. Spatial descriptions as referring expressions in the maptask domain. In Proc. of the 10th European Workshop on Natural Language Generation.

Two approaches to GIVE: dynamic level adaptation versus playfulness

by Roan Boer Rookhuiszen, Michel Obbink, and Mariët Theune, Human Media Interaction, University of Twente, Enschede, The Netherlands

Three levels of generated instructions:

  1. one instruction at a time
  2. combination of a walk and action
  3. combination, but referring to objects not currently visible

Instructions are generated using templates.

GRE uses color, or color plus location if color alone is not enough (clearly that has shortcomings).
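
That strategy, color first and location only when needed, can be sketched as follows (attribute names and phrasing are assumptions, not their templates):

```python
def describe(target, distractors):
    """Refer to a button by color; add location only if color alone
    does not distinguish the target from the distractors."""
    same_color = [d for d in distractors if d["color"] == target["color"]]
    if not same_color:
        return f"the {target['color']} button"
    # color is ambiguous: fall back to color + location
    return f"the {target['color']} button {target['location']}"
```

The noted shortcoming is visible here: if color plus location is still ambiguous, the expression fails anyway.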

The system chooses which level to use dynamically. It checks the number of actions successfully performed in the last 5 seconds; if they exceed a threshold, it goes up a level. It switches down a level if the number of actions is low or the H button has been pressed. Table 1 contains their thresholds. Their system was the only one that showed no performance difference by language skill, thanks to this adaptive capability.
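
The level-switching policy can be sketched as follows (the threshold values here are placeholders; the real ones are in their Table 1):

```python
UP_THRESHOLD = 3    # placeholder values, not the paper's
DOWN_THRESHOLD = 1

def adapt_level(level, actions_last_5s, help_pressed, max_level=3):
    """Move up a level when the user performs many successful actions,
    down when progress stalls or help is requested."""
    if help_pressed or actions_last_5s < DOWN_THRESHOLD:
        return max(1, level - 1)
    if actions_last_5s > UP_THRESHOLD:
        return min(max_level, level + 1)
    return level
```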

Player satisfaction is related to game challenge, user level of control and freedom (Sweetser and Wyeth, 2005).

They try to approximate "entertainment value" for their approaches using cancellation frequency from the logs and the "play again" question in the exit interviews. No difference was found between their "playful" submission and their "serious" submission.

References of note

  • Stefan Kopp, Bernhard Jung, Nadine Leßmann and Ipke Wachsmuth. 2003. Max - A Multimodal Assistant in Virtual Reality Construction. KI Künstliche Intelligenz, 4(3): 11–17.
  • Ken Newman. 2005. Albert in Africa: Online Role-Playing and Lessons from Improvisational Theatre. ACM Computers in Entertainment, 3(3).
  • Penelope Sweetser and Peta Wyeth. 2005. Game-Flow: A Model for Evaluating Player Enjoyment in Games. ACM Computers in Entertainment, 3(3).

Landmarks in Navigation Instructions for a Virtual Environment

by Kristina Striegnitz (Union College, Schenectady, NY, USA) and Filip Majda (Czech Technical University, Prague, Czech Republic)

Focused on using landmarks to give directions.

The system dynamically switches between landmark mode and dummy mode depending on the user's behavior.

In landmark mode, the system looks at the next object to manipulate; if it is visible, it gives the instruction directly, otherwise it finds a visible object to use as a landmark for the move.
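
The landmark-mode decision can be sketched like this (a minimal rendering; how the best landmark is ranked is glossed over):

```python
def movement_instruction(next_target, visible_objects):
    """Landmark mode: if the next object to manipulate is visible,
    instruct directly; otherwise pick a visible object as a landmark."""
    if next_target in visible_objects:
        return f"go to the {next_target}"
    if visible_objects:
        landmark = visible_objects[0]  # a real system would rank candidates
        return f"walk towards the {landmark}"
    return "turn around"  # nothing visible: get a new view first
```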

GRE uses the work of Areces et al. (2008).

Realization uses templates.

No differences were found with respect to level of English or familiarity with video games.

The system received low scores for clarity, navigation instructions, informativity, and timing.

Error analysis shows that in some cases the target landmark was not distinguishable. Also, leading the user toward a button that should not be pressed is a bad idea. Replanning on mistakes was done too hastily: users would correct themselves, but the system had already complained and gone into replanning mode.

Stoia et al. (2006): lead the user to a position where the referring expression becomes simple (instead of uttering a complex referring expression right away).

References of note

  • C. Areces, A. Koller, and K. Striegnitz. 2008. Referring expressions as formulas of description logic. In Proceedings of the 5th International Natural Language Generation Conference, Salt Fork, OH.
  • L. Stoia, D. Byron, D. Shockley, and E. Fosler-Lussier. 2006. Sentence planning for realtime navigational instruction. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NY, NY.