Aggregation in Natural Language Generation


Hercules Dalianis

Abstract

The content of real-world databases, knowledge bases, database models, and formal specifications is often highly redundant and needs to be aggregated before these representations can be successfully paraphrased into natural language. Generating natural language from these representations requires a number of processes. One of them is sentence planning, during which aggregation is carried out. Aggregation, which has been called ellipsis or coordination in linguistics, is the process that removes redundancies during the generation of a natural language discourse without losing any information.
This article addresses various aspects of aggregation: When do we need it? What types of aggregation exist? Are there any general rules for aggregation? How can we resolve the ambiguities introduced by aggregation? How is aggregation related to other generation processes?
The article describes a set of corpus studies that focus on aggregation, provides a set of aggregation rules, and finally shows how these rules are implemented in a couple of prototype systems. We further develop the concept of aggregation and discuss it in connection with the growing literature on the subject. This work offers a new tool for the sentence planning phase of natural language generation systems.