Referring expression generation

Referring expression generation is a subtask of natural language generation (NLG) that creates referring expressions (noun phrases) which identify specific entities to the reader. A variety of algorithms have been developed in the NLG community to generate different types of referring expressions.


For example, the following text contains several referring expressions:

He told the tourist that rain was expected tonight in Southern Scotland.

  • "He" is a pronoun which refers to a person who was previously mentioned in the conversation.
  • "the tourist" is a definite noun phrase which identifies another person.
  • "tonight" is a temporal reference which identifies a particular time period.
  • "Southern Scotland" is a spatial reference which identifies a particular spatial region.

Criteria for good referring expressions

Ideally, a good referring expression should satisfy a number of criteria:

  • Referential success: It should unambiguously identify the referent to the reader.
  • Ease of comprehension: The reader should be able to quickly read and understand it.
  • Computational complexity: The generation algorithm should be fast.
  • No false inferences: The expression should not confuse or mislead the reader by suggesting false implicatures or other pragmatic inferences. For example, a reader may be confused if told "Sit by the brown wooden table" in a context where there is only one table, because the extra modifiers suggest that there are other tables to be told apart.


The simplest type of referring expression is the pronoun, such as "he" and "it". The linguistics and natural language processing communities have developed various models for predicting anaphor referents, such as centering theory,[1] and ideally referring-expression generation would be based on such models. However, most NLG systems use much simpler algorithms, for example using a pronoun if the referent was mentioned in the previous sentence (or sentential clause) and no other entity of the same gender was mentioned in this sentence.
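The simple heuristic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Entity class, the coarse gender codes, and the function name choose_reference are hypothetical and do not come from any particular NLG system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str    # full form used when a pronoun is not safe, e.g. "John"
    gender: str  # hypothetical coarse feature: "m", "f" or "n"


# Assumed mapping from the gender feature to a subject pronoun.
PRONOUNS = {"m": "he", "f": "she", "n": "it"}


def choose_reference(target, previous_sentence, current_sentence):
    """Return a pronoun for `target` if the heuristic allows it, else its name.

    previous_sentence: entities mentioned in the previous sentence.
    current_sentence:  entities mentioned so far in the current sentence.
    """
    mentioned_recently = target in previous_sentence
    has_competitor = any(
        other != target and other.gender == target.gender
        for other in current_sentence
    )
    if mentioned_recently and not has_competitor:
        return PRONOUNS[target.gender]
    return target.name


john = Entity("John", "m")
mary = Entity("Mary", "f")
bill = Entity("Bill", "m")

print(choose_reference(john, [john, mary], [mary]))  # he
print(choose_reference(john, [john], [bill]))        # John (Bill competes)
print(choose_reference(john, [mary], []))            # John (not recently mentioned)
```

A model based on centering theory would replace the two boolean checks with a ranking of the entities in focus; the sketch deliberately does not attempt that.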

Definite Noun Phrases

There has been a considerable amount of research on generating definite noun phrases, such as "the big red book". Much of this builds on the model proposed by Dale and Reiter.[2] This model has been extended in various ways; for example, Krahmer et al.[3] present a graph-theoretic model of definite NP generation with many nice properties. In recent years a shared-task event has compared different algorithms for definite NP generation, using the TUNA corpus.
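As a rough illustration of attribute selection for definite noun phrases, the following Python sketch follows the general idea of the incremental approach described by Dale and Reiter [2]: walk through attributes in a fixed preference order and keep each one that rules out at least one remaining distractor. The dictionary representation of entities, the function name incremental_algorithm, and the preference order are assumptions made for this example, not details taken from the paper.

```python
def incremental_algorithm(target, distractors, preferred_attributes):
    """Pick attribute-value pairs that distinguish `target` from `distractors`.

    Entities are dicts mapping attribute names to values.
    Returns (description, success), where success is True if every
    distractor has been ruled out.
    """
    description = {}
    remaining = list(distractors)
    for attr in preferred_attributes:
        if attr not in target:
            continue
        value = target[attr]
        # Keep the attribute only if it removes at least one distractor.
        if any(d.get(attr) != value for d in remaining):
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break
    # The head noun (type) is always included so the result is a usable NP.
    if "type" in target and "type" not in description:
        description["type"] = target["type"]
    return description, not remaining


book = {"type": "book", "colour": "red", "size": "big"}
others = [
    {"type": "book", "colour": "red", "size": "small"},
    {"type": "book", "colour": "blue", "size": "big"},
    {"type": "cup", "colour": "red", "size": "big"},
]
print(incremental_algorithm(book, others, ["type", "colour", "size"]))
# ({'type': 'book', 'colour': 'red', 'size': 'big'}, True)

table = {"type": "table", "colour": "brown", "material": "wood"}
chair = {"type": "chair", "colour": "brown", "material": "wood"}
print(incremental_algorithm(table, [chair], ["type", "colour", "material"]))
# ({'type': 'table'}, True)
```

The second call shows why such a procedure avoids "the brown wooden table" when there is only one table: once the type alone rules out every distractor, no further modifiers are added.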

Spatial and Temporal Reference

Recently there has been more research on generating referring expressions for time and space. Such references tend to be imprecise (what is the exact meaning of "tonight"?), and they are also interpreted in different ways by different people.[4] Hence it may be necessary to reason explicitly about the tradeoff between false positives and false negatives, and even to calculate the utility of different possible referring expressions in a particular task context.[5]
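One way to picture this kind of reasoning is an expected-cost calculation over the possible interpretations of each candidate expression. The Python sketch below is purely illustrative: the hour sets, the interpretation probabilities, the cost weights, and the function names expected_cost and best_expression are all made-up assumptions, not values or methods from the cited work.

```python
def expected_cost(event_hours, interpretations, fp_cost, fn_cost):
    """Expected cost of an expression given how readers may interpret it.

    interpretations: list of (probability, set_of_hours) pairs.
    A false positive is an hour the reader expects but the event misses;
    a false negative is an event hour the reader does not expect.
    """
    total = 0.0
    for prob, hours in interpretations:
        false_pos = len(hours - event_hours)
        false_neg = len(event_hours - hours)
        total += prob * (fp_cost * false_pos + fn_cost * false_neg)
    return total


def best_expression(event_hours, candidates, fp_cost=1.0, fn_cost=1.0):
    """Return the candidate expression with the lowest expected cost."""
    return min(
        candidates,
        key=lambda name: expected_cost(
            event_hours, candidates[name], fp_cost, fn_cost
        ),
    )


# Hypothetical data: rain expected from 20:00 until 02:00.
rain_hours = {20, 21, 22, 23, 0, 1}
candidates = {
    "this evening": [(0.5, {18, 19, 20, 21}), (0.5, set(range(18, 24)))],
    "tonight": [
        (0.5, {20, 21, 22, 23, 0, 1, 2, 3, 4, 5}),
        (0.5, set(range(18, 24))),
    ],
}

print(best_expression(rain_hours, candidates))                # tonight
print(best_expression(rain_hours, candidates, fp_cost=3.0))   # this evening
```

With equal weights the broader expression wins, but when false positives are made three times as costly the narrower one is preferred, which is the kind of task-dependent tradeoff described above.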

Other Kinds of Referring Expressions

There are many other kinds of referring expressions in addition to those described above, such as one-anaphora and event references. Little research has been done on how best to generate them.


  1. M Poesio, R Stevenson, B di Eugenio, J Hitzeman (2004). Centering: A Parametric Theory and Its Instantiations. Computational Linguistics 30:309-363.
  2. R Dale and E Reiter (1995). Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions. Cognitive Science 19:233-263.
  3. E Krahmer, S van Erk, A Verleg (2003). Graph-Based Generation of Referring Expressions. Computational Linguistics 29:53-72.
  4. E Reiter, S Sripada, J Hunter, J Yu, and I Davy (2005). Choosing Words in Computer-Generated Weather Forecasts. Artificial Intelligence 167:137-169.
  5. R Turner, Y Sripada, E Reiter (2009). Generating Approximate Geographic Descriptions. Proceedings of ENLG-2009.