Using Aggregation Makes Generated Text More Comprehensive

To generate natural language from computational representations, a number of processes must be carried out. One of them, aggregation, is part of the stage called sentence planning. Aggregation, which has been called ellipsis or coordination in linguistics, is the process that removes redundancies during the generation of a natural language discourse without losing any information, thus making the text easier to read. People perform aggregation all the time without thinking about it. The content of computational representations such as real-world databases, knowledge bases, database models, or formal specifications is often highly redundant and needs aggregation before these representations can be successfully paraphrased into natural language.
When we plan to say something in natural language, we organize the content in such a way that we do not need to repeat anything. Instead of saying something as repetitive as:

John's bicycle is red
Mary's bicycle is yellow
Tom's bicycle is blue
Lisa's bicycle is red

we aggregate the information and say instead:

John and Lisa have red bicycles.
Tom's and Mary's bicycles are blue and yellow respectively.
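
The grouping step in this example can be illustrated with a minimal Python sketch. It is not taken from ASTROGEN or from the thesis's aggregation rules; the function names aggregate and join_names and the (owner, colour) input format are assumptions made only for illustration. Owners whose bicycles share a colour are merged into one sentence, and the remaining single facts are combined with "respectively".

```python
def join_names(items):
    """Join a list of strings as 'a', 'a and b', or 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def aggregate(facts):
    """Aggregate (owner, colour) facts about bicycles into fewer sentences.

    Hypothetical helper: a toy illustration of aggregation,
    not the rule set implemented in ASTROGEN.
    """
    # Group owners by colour, keeping the input order of first appearance.
    by_colour = {}
    for owner, colour in facts:
        by_colour.setdefault(colour, []).append(owner)

    sentences = []
    singles = []  # facts whose colour is not shared with anyone else
    for colour, owners in by_colour.items():
        if len(owners) > 1:
            sentences.append(f"{join_names(owners)} have {colour} bicycles.")
        else:
            singles.append((owners[0], colour))

    if len(singles) == 1:
        owner, colour = singles[0]
        sentences.append(f"{owner}'s bicycle is {colour}.")
    elif singles:
        # Pair owners and colours position by position with "respectively".
        owners = join_names([f"{owner}'s" for owner, _ in singles])
        colours = join_names([colour for _, colour in singles])
        sentences.append(f"{owners} bicycles are {colours} respectively.")
    return sentences


facts = [("John", "red"), ("Mary", "yellow"), ("Tom", "blue"), ("Lisa", "red")]
for sentence in aggregate(facts):
    print(sentence)
```

For the four facts above, the sketch prints "John and Lisa have red bicycles." followed by "Mary's and Tom's bicycles are yellow and blue respectively." The order of the names follows the order of the input facts, so it differs slightly from the hand-written version, but no information is lost.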

In Dalianis' Ph.D. thesis, Concise Natural Language generation from Formal Specifications, a set of aggregation rules is formulated that avoids redundancies in generated text. The aggregation rules are implemented in a natural language generator called ASTROGEN - Aggregated deep and Surface naTuRal language GENerator.
This work was carried out within the project Concise Natural Language generation from Formal Specifications.