Concise Natural Language Generation from Formal Specifications

Hercules Dalianis

Abstract

In natural language generation, a computer automatically creates natural language text, e.g. in English, Chinese, or Greek, from a computational representation. One use of natural language generation is to describe software systems. Formal specification is a method for describing computer systems for software development purposes. Most people do not understand formal languages, but they do understand natural languages; it is therefore desirable to have a tool that automatically generates natural language from a formal specification. To generate natural language from computational representations, a number of processes must be carried out. One task within the process called sentence planning is aggregation. Aggregation, which has been called ellipsis or coordination in linguistics, is the process that removes redundancies during the generation of a natural language discourse without losing any information, making the text more fluent and easier to read. People aggregate all the time without thinking about it. The content of software engineering tools, databases, expert systems, etc., is often highly redundant and needs aggregation before it can be successfully paraphrased in natural language. This thesis addresses various aspects of aggregation. When do we need to carry out aggregation? What types of aggregation are there? Are there any general rules for how to aggregate? How are the rules related to each other? Aggregation may give rise to ambiguities: how can we resolve them? How is aggregation related to the other generation processes? In this thesis we develop the concept of aggregation, provide a number of answers to these questions, and refer to the growing literature on the problem. This work contributes to a novel part of the sentence planning phase of natural language generation.
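
As a rough illustration of what aggregation does, the following minimal sketch merges clauses that share a subject and a verb into one coordinated sentence. It is not one of the aggregation rules developed in this thesis: the triple representation, the function names aggregate_objects and join_with_and, and the example clauses are all assumptions made for this illustration only.

```python
def join_with_and(items):
    """Coordinate a list of phrases: 'a', 'a and b', 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return items[0] + " and " + items[1]
    return ", ".join(items[:-1]) + " and " + items[-1]


def aggregate_objects(clauses):
    """Merge (subject, verb, object) clauses that share subject and verb.

    Hypothetical illustration: each group of clauses becomes a single
    sentence whose objects are coordinated, so the repeated subject and
    verb are stated only once and no information is lost.
    """
    groups = {}  # dicts keep insertion order, so sentence order is stable
    for subject, verb, obj in clauses:
        objects = groups.setdefault((subject, verb), [])
        if obj not in objects:  # drop exact duplicates
            objects.append(obj)
    return [
        f"{subject} {verb} {join_with_and(objects)}."
        for (subject, verb), objects in groups.items()
    ]


# Invented example clauses, as they might come from a redundant specification.
clauses = [
    ("The client", "has", "a name"),
    ("The client", "has", "an address"),
    ("The client", "has", "a telephone number"),
    ("The server", "has", "a name"),
]

sentences = aggregate_objects(clauses)
for sentence in sentences:
    print(sentence)
```

Without aggregation, the four clauses would be rendered as four separate sentences, three of which repeat "The client has". The sketch covers only one simple case, grouping on a shared subject and verb; it does not address the ambiguities that aggregation may introduce.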