Aggregation in Natural Language Generation

Hercules Dalianis & Eduard Hovy

Abstract

In this paper we address the problem of redundancy in text generation. Redundancy typically occurs when the material selected for communication contains information that is duplicated in the text, or that is so closely related that the reader can automatically infer one piece when reading another. Such redundant material is invariably removed by people, and ought to be removed by generator systems, to produce better-quality text. We call the process of removing redundancy aggregation. In addressing the problem, three questions arise: Why do people object to redundancy? Which redundant portions are best removed? What mechanisms or rules are used to remove redundant information? Here we begin to answer the third question by identifying and describing the aggregation processes generators can use. We first survey the studies we have found on aspects of aggregation. We next outline a study we performed with human subjects. Finally, we define and describe eight aggregation strategies we identified, and discuss several associated issues and open questions.
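
To make the notion of aggregation concrete, the small Python sketch below illustrates one kind of rule a generator might apply: propositions that share a subject and verb are merged into a single sentence, so the shared material is expressed only once. The Proposition structure and the grouping rule are illustrative assumptions of ours, not the formalism or strategies defined in the paper.

from dataclasses import dataclass
from itertools import groupby


@dataclass
class Proposition:
    # A toy propositional input unit, assumed here for illustration.
    subject: str
    verb: str
    obj: str


def aggregate(props):
    """Group propositions sharing (subject, verb) and conjoin their objects,
    so the repeated subject and verb are generated only once."""
    sentences = []
    key = lambda p: (p.subject, p.verb)
    for (subject, verb), group in groupby(sorted(props, key=key), key=key):
        objects = [p.obj for p in group]
        if len(objects) == 1:
            conjoined = objects[0]
        else:
            conjoined = ", ".join(objects[:-1]) + " and " + objects[-1]
        sentences.append(f"{subject} {verb} {conjoined}.")
    return sentences


if __name__ == "__main__":
    props = [
        Proposition("The system", "removes", "duplicated facts"),
        Proposition("The system", "removes", "easily inferable facts"),
    ]
    # Without aggregation, these would surface as two near-identical sentences;
    # with it, one sentence carries both objects.
    print(aggregate(props))

Running the sketch prints a single aggregated sentence, "The system removes duplicated facts and easily inferable facts.", rather than two sentences that repeat the subject and verb.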
