To generate natural language from computational representations, a number of processes
must be carried out. One of them, part of the stage called sentence planning, is
aggregation. Aggregation, which has been called ellipsis or coordination in linguistics,
is the process which removes redundancies during generation of a natural language
discourse without losing any information, thus making text easier to read. People
perform aggregation all the time without thinking about it. The content of computational
representations such as real-world databases, knowledge bases, database models, or
formal specifications is often highly redundant and needs aggregation before these
representations can be successfully paraphrased into natural language.
When we plan to say something in natural language, we organize the content in such
a way that we do not need to repeat anything. Instead of saying something as repetitive
as:
John's bicycle is red
Mary's bicycle is yellow
Tom's bicycle is blue
Lisa's bicycle is red
we aggregate the information and say instead:
John and Lisa have red bicycles.
Tom's and Mary's bicycles are blue and yellow respectively.
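The grouping behind this example can be sketched in a few lines of Python. This is only an illustration of the idea, not the aggregation rules from the thesis or the ASTROGEN implementation; the names FACTS, join_names and aggregate are hypothetical, and the sketch handles only this one sentence pattern. Owners sharing a colour are merged into one sentence, and the remaining single owners are combined in a "respectively" construction.

```python
# Illustrative sketch only: hypothetical names, one fixed sentence pattern.
FACTS = [
    ("John", "red"),
    ("Mary", "yellow"),
    ("Tom", "blue"),
    ("Lisa", "red"),
]


def join_names(items):
    """Join items as 'A', 'A and B', or 'A, B and C'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def aggregate(facts):
    """Turn (owner, colour) facts into aggregated sentences."""
    # Group owners by colour; dicts keep insertion order in Python 3.7+.
    groups = {}
    for owner, colour in facts:
        groups.setdefault(colour, []).append(owner)

    sentences = []
    singles = []  # colours that belong to exactly one owner
    for colour, owners in groups.items():
        if len(owners) > 1:
            # Shared predicate: coordinate the subjects.
            sentences.append(f"{join_names(owners)} have {colour} bicycles.")
        else:
            singles.append((owners[0], colour))

    if len(singles) == 1:
        owner, colour = singles[0]
        sentences.append(f"{owner}'s bicycle is {colour}.")
    elif singles:
        # Shared sentence frame: coordinate subjects and colours pairwise.
        owners = join_names([f"{owner}'s" for owner, _ in singles])
        colours = join_names([colour for _, colour in singles])
        sentences.append(f"{owners} bicycles are {colours} respectively.")
    return sentences


if __name__ == "__main__":
    for sentence in aggregate(FACTS):
        print(sentence)
```

The sketch lists the single owners in the order their facts were given, so Mary is mentioned before Tom; the information conveyed is the same as in the aggregated text above.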
In Dalianis' Ph.D. thesis, Concise Natural Language generation from Formal
Specifications, a set of aggregation rules is formulated that avoid redundancies
in generated text.
The aggregation rules are implemented in a natural language generator called ASTROGEN (Aggregated deep and Surface naTuRal language GENerator).
This work has been carried out in the project Concise Natural Language generation from Formal Specifications.