Last Compiled: August 11, 1998.

Project Title

VOLVEX - Validation Of Specifications by Natural Language Generation for VOLVO expressed in STEP/EXPRESS.

Funded by

Volvo Research Foundation, Volvo Educational Foundation and Dr Pehr G Gyllenhammar Research Foundation.

Project Leader

Dr. Hercules Dalianis

email: hercules@dsv.su.se

Staff

Anders Hedman

Maria Bergholtz

Dr. Paul Johannesson, (Docent), Scientific Advisor

Dr. Eduard Hovy, Scientific Advisor, (ISI/USC)

Duration of Project

Three and a half years.

Project period July, 1996 - December 31, 1999.

VOLVEX project first year 1996/97 report

VOLVEX project second year 1997/98 report

VOLVEX project third year 1998/99 final report and VOLVEX handbook

Summary of Project (aim-method-importance)

The aim of this project is to develop a natural language generation system for validation of specifications expressed in the EXPRESS/STEP standard (especially specifications using the AP 214 Core Data for Automotive Design Process).

The aim is to study formal specifications, schemas as well as instances, expressed in EXPRESS/STEP and to propose natural language descriptions for them. In order to validate our results we will interview automotive designers and constructors to obtain the "correct" natural language expressions describing the formal specifications.

We will implement a Text and Sentence Planner and a Natural Language Surface Grammar and Lexicon in Prolog for the generation of natural language (English). We will also investigate into which current EXPRESS/STEP tools our generation tool could be integrated.

We will investigate other domains, i.e. other Application Protocols (APs), e.g. for ships, electrotechnical plants etc., and propose guidelines for how to create a lexicon for other domains with minimal work by reusing the results from our work. The Text and Sentence Planner and the Natural Language Surface Grammar will be similar in other domains (APs).

The importance of this project is that we will create a support tool for the EXPRESS/STEP standard which will help designers and constructors of cars to understand their EXPRESS specifications by reading natural language output. This support tool will help not only designers and constructors but also other persons involved in the car design process to validate the formal specification by reading it in natural language. The natural language generation is important since not everyone involved is knowledgeable in the EXPRESS language.

One strength of this project is a synergy effect: other domains (other APs) in the STEP/EXPRESS world could make use of our results and our guidelines to easily create natural language generation systems.

1. Previous research in Natural Language Generation from Formal Specifications and the STEP/EXPRESS world

Some work has been carried out in this area previously but there has not been any connection between the Natural Language Generation (NLG) community and the Formal Language community. The two communities have to a great extent worked independently.

One of the first attempts at sentence generation from a conceptual representation is described in [Goldm75]. The next effort was the translation of first order logic formulas to natural language by [Chest76]. Arguments for using natural language generation for validation of formal specifications are presented in [Swart82]. A set of translation rules for translating entity relationship diagrams to natural language (NL) was defined in [Chen83]. Other approaches to natural language generation for validation of formal specifications expressed in conceptual models are proposed in [Rolla92]. A suggestion to generate a whole NL-discourse built on Hobbs' coherence relations [Hobbs85,90] for validation of a conceptual model has been made in [Dalia92a,92b], and a refinement of the generation by using aggregation rules has been suggested in [Dalia93, 95c,95d,96b].

Examples of complete support tools for validation and specification, with graphics and the possibility of executing the formal specification, include AMADEUS [Black87], which uses a combination of graphics and single sentence parsing and generation. Other tools are MGI (MOLOC Graphical Interface) and MOLOC [Johan91], which are used at Stockholm University for educational purposes. WATSON [Kelly91] is used for formal specification of telephone switches. WATSON can read informal natural language scenarios, create a formal specification from them, and execute the specification with both simulation and theorem proving techniques. Yet another tool is AT&T's Visionnaire [Henju91]. However, none of the support tools mentioned above has a natural language generation component, except for AMADEUS and VINST [Breta95]. VINST's NL-generator is described in [Dalia95a].

Within the NUTEK (Swedish National Board for Industrial and Technical Development) supported project Concise Natural Language Generation from Formal Specifications, under the contract P3672-1 and in collaboration with Ericsson Utvecklings AB (formerly Ellemtel Utvecklings AB), we have developed methods [Dalia95b] for paraphrasing the instances expressed in the Delphi formal language [Höök93,Ridle94]. The Delphi language is a conceptual modelling language extended with First Order Predicate Logic. The Delphi formal language is used for expressing the functionality of telephone services.

It has been shown within Ericsson, that in order to reduce lead-times in the sales and production process, it is necessary to comprehend early in the requirements engineering process the requirements of a customer. These requirements can be elicited by means of a tool, where the customer and the salesman together specify the functionality of a telecom service and translate it to a formal language, Delphi. The natural language generation from the Delphi specification can be used during the whole sales, requirements engineering and constructing process and even in the tutorial process to inform users at various levels about the functionality of the specification [Engst92].

EXPRESS is currently a static data modelling language [ISO-91], and provides constructs such as entities, relations, attributes, etc. EXPRESS is part of STEP (STandard for the Exchange of Product model data) [ISO-94]. Within STEP there are Generic Resources, which are domain independent, and Application Protocols (APs), which are domain specific. The APs are expressed in the EXPRESS language. An AP describes the processes, information flow and functional requirements of a specific application; an example of an AP is AP-214 Core Data for Automotive Mechanical Design Process (ISO-10303-214). There are a lot of tools developed for the EXPRESS language, but none has yet tried to generate natural language from EXPRESS specifications [Gitti94].

2. Aim, Methodology and Preliminary Investigations and Findings

The aim of the project is to develop a tool for generation of natural language (English) from an EXPRESS specification in the automotive domain (Application Protocol AP214) and to develop general methods for how to make similar tools for other domains (other Application Protocols). We will use the Volvo PDMI_1 EXPRESS model of the requirements from Volvo Data Corporation on the AP214.

Methodology

The methodology of this project will combine empirical and theoretical approaches. As the first part of the project we will ask prospective users at car design departments about their need for natural language descriptions of their specifications. We will obtain EXPRESS specifications from the car design department at e.g. VOLVO, and try to paraphrase them manually into natural language as well as letting our users propose natural language texts describing the specifications. Per Brorson at Volvo Data Corporation will provide us with EXPRESS specifications. The proposed texts will be collected by sending out interview forms to our users and asking them to manually paraphrase the specifications into natural language.

The Application Protocols, specifically the AP 214 Core Data for Automotive Design Process within the STEP/EXPRESS world, will be a crucial component in this work.

The concepts used there will be implemented in our base dictionary, and each EXPRESS specification will then contribute new words and expressions to be implemented.

We will write a generation grammar and dictionary in DCG (Definite Clause Grammar) format [Clock80].
One possibility is to use an existing surface grammar, e.g. FUF (Functional Unification Formalism) and SURGE [Elhad92], which is implemented in LISP. A basic query interface will be designed as part of the content selection process of the natural language generation system.
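To illustrate what a generation grammar and dictionary amount to, the following minimal sketch mirrors DCG-style rules (s --> np, vp, and so on) as plain Python functions that return word lists. It is written in Python only for illustration and is not the project's Prolog grammar; the lexicon entries, the concept names (car, engine, has_part) and the function names are all hypothetical.

```python
# Hypothetical lexicon: maps model concepts and relations to English words.
LEXICON = {"car": "car", "engine": "engine", "has_part": "has"}


def det(noun):
    """det --> [a] | [an], chosen from the first letter of the noun (naive)."""
    return ["an"] if noun[0].lower() in "aeiou" else ["a"]


def np(concept):
    """np --> det, noun."""
    noun = LEXICON[concept]
    return det(noun) + [noun]


def vp(relation, obj):
    """vp --> verb, np."""
    return [LEXICON[relation]] + np(obj)


def s(subj, relation, obj):
    """s --> np, vp."""
    return np(subj) + vp(relation, obj)


def sentence(subj, relation, obj):
    """Join the generated words, capitalize the first letter, add a full stop."""
    text = " ".join(s(subj, relation, obj))
    return text[0].upper() + text[1:] + "."


print(sentence("car", "has_part", "engine"))
```

Keeping the lexicon separate from the rules is what would allow the same grammar to be reused with a different dictionary in another domain.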

The second part of the project will be to generalize our work to see if our approach is feasible for other application protocols (APs) for e.g. ships and electrotechnical plants and write guidelines of how to create natural language generation systems for the other APs, and also see if there are EXPRESS tools which could make use of a natural language generation system.

Theories

The proposed texts will then be analysed with both discourse [Mann84,88] and aggregation [Dalia93,95c,96a,b] analysis methods.

Similar studies on proposed texts of conceptual models and formal languages have been made in [Dalia92a,92b,93,95c,96a,b], where prototype tools were designed in [Dalia92a, 92b,95a,95b,95c] respectively. A discourse is a set of coherent natural language sentences (a text), and a discourse theory is a theory of how a set of sentences are related to each other in a discourse.

The aim of aggregation rules [Dalia93, 95c, 96a,b] is to remove redundant and repeated text in discourses while keeping the content of the discourse.

A simple example of aggregation:

John has a car.

Mary has a car.

John's car is red.

Mary's car is red.

Aggregation =>

John and Mary have red cars.

We will use this technique on the EXPRESS domain but also try to extract new aggregation rules from the answers of the interview forms.
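The example above can be sketched in code. The following is a minimal illustration, assuming a hypothetical representation in which ownership facts are (owner, thing) pairs and colours are stored in a dictionary. It is not the set of aggregation rules from [Dalia93]; it only shows the idea of grouping propositions that share a predicate and an object, and the pluralisation and article handling are deliberately naive.

```python
def aggregate(has_facts, colour_facts):
    """Aggregate ownership and colour facts into concise sentences.

    has_facts: list of (owner, thing) pairs, e.g. ("John", "car").
    colour_facts: dict mapping (owner, thing) to a colour word.
    Owners whose things share the same colour and type are merged
    into a single sentence with a coordinated subject.
    """
    groups = {}  # (colour, thing) -> owners, in first-seen order
    for owner, thing in has_facts:
        colour = colour_facts.get((owner, thing))
        groups.setdefault((colour, thing), []).append(owner)

    sentences = []
    for (colour, thing), owners in groups.items():
        noun = f"{colour} {thing}" if colour else thing
        if len(owners) == 1:
            # Naive article: always "a" (assumption for this sketch).
            sentences.append(f"{owners[0]} has a {noun}.")
        else:
            subject = ", ".join(owners[:-1]) + " and " + owners[-1]
            # Naive plural: append "s" (assumption for this sketch).
            sentences.append(f"{subject} have {noun}s.")
    return sentences


facts = [("John", "car"), ("Mary", "car")]
colours = {("John", "car"): "red", ("Mary", "car"): "red"}
print(aggregate(facts, colours))
```

Four input propositions are reduced to one sentence without losing content, which is the effect the aggregation rules aim for.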

The tool will be designed according to the technique with text and sentence planners in [Dalia95a,95b,96a,b] and [McKeo88].

We take a slightly simplified view of the text generation process as a pipeline of three stages. Text planning determines the content and overall discourse structure of the text material. It is followed by sentence planning, which decides on the sentence structure and scope. This in turn is followed by surface generation, which performs the surface form realization (based on syntax) and lexical selection.
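The three stages can be sketched as three functions chained together. This is only an illustrative sketch under simplifying assumptions: the model is a hypothetical dictionary from entity names to attribute names, and the function names (plan_text, plan_sentences, realize) are invented here and do not come from the cited planners.

```python
def plan_text(model, entity):
    """Text planning: select the content (one message per attribute) and order it."""
    return [(entity, attribute) for attribute in model[entity]]


def plan_sentences(messages):
    """Sentence planning: group messages about the same entity into one sentence."""
    specs = {}
    for entity, attribute in messages:
        specs.setdefault(entity, []).append(attribute)
    return list(specs.items())


def with_article(noun):
    """Pick 'a' or 'an' from the first letter of the noun (naive)."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun


def realize(spec, lexicon):
    """Surface generation: lexical selection and surface form realization."""
    entity, attributes = spec
    subject = with_article(lexicon.get(entity, entity))
    objects = [with_article(lexicon.get(a, a)) for a in attributes]
    if len(objects) > 1:
        object_text = ", ".join(objects[:-1]) + " and " + objects[-1]
    else:
        object_text = objects[0]
    text = f"{subject} has {object_text}."
    return text[0].upper() + text[1:]


def generate(model, entity, lexicon):
    """Run the pipeline: text planning, sentence planning, surface generation."""
    messages = plan_text(model, entity)
    specs = plan_sentences(messages)
    return " ".join(realize(spec, lexicon) for spec in specs)


model = {"car": ["colour", "weight"]}  # hypothetical toy model
print(generate(model, "car", {}))
```

Because only realize consults the lexicon, the two planning stages stay independent of the vocabulary of a particular domain.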

Validation of results

Validation of the results will be carried out by publishing the results in relevant international conferences and journals, and by presenting our results to the STEP committees and to the users of the tool at design departments.

Implementation platforms

The implementation platforms will be UNIX and SICStus Prolog. We will use already developed tools within the STEP/EXPRESS world, e.g. the Xpress/TNO parser, to translate the content of an EXPRESS specification from the EXPRESS format to Prolog format.
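As a rough illustration of what translating EXPRESS to Prolog format could involve, the sketch below reads a very small subset of EXPRESS (ENTITY declarations with simple single-word attribute types) and emits Prolog-style facts as strings. It is written in Python for illustration only. The fact layout entity/1 and attribute/3 is an assumption made here; it is not the output format of the Xpress/TNO parser, and constructs such as SUBTYPE, OPTIONAL, aggregates and WHERE rules are not handled.

```python
import re

# Matches "ENTITY name; ... END_ENTITY;" and captures the name and the body.
ENTITY_RE = re.compile(r"ENTITY\s+(\w+)\s*;(.*?)END_ENTITY\s*;", re.S | re.I)
# Matches simple attribute declarations of the form "name : TYPE;".
ATTR_RE = re.compile(r"(\w+)\s*:\s*(\w+)\s*;")


def express_to_facts(source):
    """Translate simple EXPRESS entities into Prolog-style fact strings.

    Names are lowercased so that they are valid Prolog atoms.
    The fact format is hypothetical and chosen for this sketch.
    """
    facts = []
    for name, body in ENTITY_RE.findall(source):
        entity = name.lower()
        facts.append(f"entity({entity}).")
        for attribute, type_name in ATTR_RE.findall(body):
            facts.append(
                f"attribute({entity}, {attribute.lower()}, {type_name.lower()})."
            )
    return facts


schema = """
ENTITY car;
  colour : STRING;
  weight : REAL;
END_ENTITY;
"""
for fact in express_to_facts(schema):
    print(fact)
```

Facts of this kind could serve as the input from which a text planner selects content.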

Hypothesis and questions

Our hypothesis is that the AP 214 will be of great help in designing a natural language generation tool for EXPRESS, and that the work and methodology carried out here will be easy to apply to other Application Protocols in other domains, e.g. ships, electrotechnical plants and electronic circuits. Today there are about 30 Application Protocols developed or under development. However, not all the domains are relevant for generation of natural language.

Questions which will be posed and hopefully answered are: How can EXPRESS be used for Natural Language Generation (NLG)? What is lacking in the EXPRESS language for it to be used for NLG? Are the concepts used in the AP 214 easy to express in natural language? What can be added to the AP 214 and to other APs to make NLG easier? Which EXPRESS tools could be integrated with a natural language generation component? What about parsing of natural language to EXPRESS?

Project member and Scientific Advisor

Dr. Paul Johannesson, (Docent).

International Contacts

Dr. Eduard Hovy, email: hovy@isi.edu, of the Natural Language Group at the Information Sciences Institute (ISI), University of Southern California. Dr. Hovy is an internationally recognized researcher in the field of natural language generation, and he will be one of our scientific advisors.

Industrial Contacts

Per Brorson, Volvo Data Corporation, Göteborg, Sweden, from whom we will obtain EXPRESS specifications and other relevant input for user requirements on the natural language generation tools. Per Brorson is also chairman of the MMS (Material och MekanStandardiseringen) 2169 Swedish Committee for the standard format for the communication of product data definitions. The Committee communicates directly with the STEP committees world-wide.

Duration of Project

Continuation for three and a half years.

Project period July, 1996 - December 31, 1999.

3. Time plan (Time plan revised. Please check the revised time plan.)

1 July 96 - 30 September 96:

Collecting relevant material regarding EXPRESS

Initial contact with design departments at car company Volvo via Volvo Data Corporation.

Try to find relevant EXPRESS parsing tools.

1 October 96 - 31 December 96:

Study specifications expressed in EXPRESS.

Literature search of methods and tools.

Proposing natural language texts describing EXPRESS specifications.

Initial tests of translating EXPRESS to Prolog format.

1 January 97 - 31 March 97:

Constructing interview forms with extract of EXPRESS specifications to be paraphrased to natural language by the users and sending them out to the users.

Writing first draft prototype for generation of natural language from EXPRESS

1 April 97 - 30 June 97:

Getting answers from the interviews and starting to analyze them.

Finding out general aggregation rules which can be used when generating.

1 July 97 - 30 September 97:

Using the analyses from the interviews to design the generation grammar and lexicon.

Write scientific papers for publications and to obtain input about our research at conferences.

1 October 97 - 31 December 97:

Design the text and sentence planner and a simple query interface for the content selection process.

Putting together a stable prototype of all components

1 January 98 - 31 March 98:

Testing the prototype to find out what is missing.

Demonstrate the prototype to design departments at car companies to obtain comments.

1 April 98 - 30 June 98:

Adding comments from users to the prototype and bug fixing.

Writing scientific papers about the prototype and the preliminary findings of the project

1 July 98 - 30 September 98:

Second part of research project. Investigate other APs to see if the technique of generating natural language is applicable. Which parts of the interviews can be removed, and is it possible to use the text and sentence planner, the surface grammar and part of the lexicon in the new domain (AP)?

1 October 98 - 31 December 98:

Writing scientific papers for publication of the results from 1 July 98 - 30 September 98.

Carry out a study of all tools around STEP/EXPRESS to see if our generation tool will fit in.

1 January 99 - 31 March 99:

Write guidelines for how to create lexicons and to (re)use our natural language generation tool in other domains (APs).

1 April 99 - 30 June 99 :

Is there any system that parses natural language into EXPRESS, and if so, what is needed to add a generation component to it?

What will be needed to create a natural language to EXPRESS tool?

1 July 99 - 30 September 99 :

This period is for all the delayed work, adding the final comments and doing a last literature search. Writing scientific papers for publication.

1 October 99 - 31 December 99 :

Writing final reports of the project.

This time schedule is a rough one and may be revised.

4. References

Black87 W.J.Black: Acquisition of Conceptual Data Models from Natural Language Descriptions, In The Proceedings of The Third Conference of the European Chapter of Computational Linguistics, Copenhagen, Denmark 1987.

Breta95 I. Bretan et al.: A Multimodal Environment for Telecommunication Specifications. In Proceedings of the 1st International Conference on Recent Advances in Natural Language Processing, pp. 191-198, Tzigov Chark, Bulgaria, September, 1995.

Chen83 P. P-S. Chen: English Sentence Structure and Entity Relationship Diagrams, Information Sciences 29, pp. 127-149, 1983.

Chest76 D. Chester: The Translation of Formal Proofs into English, Journal of Artificial Intelligence, no 7, pp. 261-278, 1976.

Clock84 W.F. Clocksin & C.S. Mellish: Programming in Prolog, Springer Verlag 1984.

Dalia92a H. Dalianis: A method for validating a conceptual model by natural language discourse generation. CAISE-92 Int. Conf. on Advanced Information Systems Engineering, Loucopoulos P. (Ed.), Springer Verlag Lecture Notes in Computer Science, no 593, pp. 425-444, 1992.

Dalia92b H. Dalianis: User adapted natural language discourse generation for validation of conceptual models. Licentiate Thesis (SYSLAB Report No. 5). Dept. of Computer and Systems Sciences, The Royal Institute of Technology and Stockholm University, Sweden.

Dalia93 H. Dalianis & E. Hovy: Aggregation in Natural Language Generation. EWNLG-93, Proceedings of the 4th European Workshop on Natural Language Generation, Pisa, Italy 1993. Also in Trends in Natural Language Generation: an Artificial Intelligence Perspective, Springer Verlag Lecture Notes in Computer Science (forthcoming 1995).

Dalia95a H. Dalianis: Aggregation in the NL-generator of the VIsual and Natural language Specification Tool. In Proceedings of The Seventh International Conference of the European Chapter of the Association for Computational Linguistics (EACL-95), Student Session, pp 286-290, Dublin, Ireland, March 27-31, 1995.

Dalia95b H. Dalianis: Aggregation, Formal Specification and Natural Language Generation. In Proceedings of the NLDB'95, First International Workshop on the Applications of Natural Language to Data Bases, pp 135-149, Versailles, France, June 28-29, 1995.

Dalia96a H. Dalianis: Concise Natural Language Generation from Formal Specifications., Ph.D. dissertation, (Teknologie Doktorsavhandling), Department of Computer and Systems Sciences, Royal Institute of Technology/Stockholm University, June 1996, Report Series No. 96-008, ISSN 1101-8526, SRN SU-KTH/DSV/R--96/8--SE.

Dalia96b H. Dalianis & E. Hovy: On Lexical Aggregation and Ordering. In the Proceedings of the 8th International Workshop on Natural Language Generation, INLG-96, Herstmonceux, Sussex, UK, June 13-15, 1996.

Elhad92 M. Elhadad & J. Robin: Controlling Content Realization with Functional Unification Grammars, in Aspects of Automated Natural Language Generation, R. Dale, E. Hovy, D. Rosner and O. Stock eds, Springer Verlag, pp 89-104, 1992.

Engst92 M.Engstedt & S. Preifelt: Results from the user tests of VINST, Ellemtel Utvecklings AB, (F92 0684), 1992.

Gitti94 A. Gittinger et al: EXPRESS Tools: Esprit Project Kactus P8145 Working paper WT1, April 5 1994.

Goldm75 N. Goldman: Conceptual generation, in Conceptual Information Processing, Ed. R.C. Schank, North Holland Publishing Company, pp. 289-374, 1975.

Henju91 O. I. Henjum & O.B.H. Clarisse: Confirming Customer Expectations, in Proceedings of the National Communications Forum pp. 657-664, vol 45, 1991.

Hobbs85 J.R Hobbs: On the Coherence and Structure of Discourse, Center for the Study of Language and Information, Report No.CSLI-85-37, October 1985.

Hobbs90 J.R. Hobbs: Literature and Cognition, CSLI Lecture Notes Number 21, Center for the Study of Language and Information, 1990.

Höök93 H. Höök: A General Description of the Delphi Language. Ellemtel internal report, 1993.

ISO-91 The EXPRESS Language Reference Manual, ISO TC184/SC4/WG5, N14, Leeds, April 29, 1991.

ISO-94 Product Data Representation and Exchange. Overview and fundamental principles, ISO TC184/SC4, ISO 10303-1, 1994.

Johan91 P. Johannesson: MOLOC: Using Prolog for conceptual Modeling, In the Proceedings of the 9th International Conference on Entity-Relationship Approach, Ed. H. Kangassalo, pp. 289-302, North Holland 1991.

Kelly91 V. E. Kelly & U. Nonnenmann: Reducing the Complexity of Formal Specification Acquisition, in Automating Software Design, ed by M. R. Lowry & R.D. McCartney, AAAI Press, Menlo Park, California, 1991.

Mann84 W. C. Mann: Discourse Structures for Text Generation, Proceedings of the 22nd Annual Meeting of the Association for Computational Linguistics, Stanford, CA, June 1984.

Mann88 W.C. Mann et al: Rhetorical Structure Theory: Towards a Functional Theory of Text Organization, In TEXT Vol 8:3, 1988.

McKeo88 K. McKeown & W.R. Swartout: Language generation and explanation, in Advances in Natural Language Generation, edited by M. Zock & G. Sabah, Pinter Publishers Ltd, 1988.

Perei80 F.C.N Pereira & D.H.D. Warren: Definite Clause Grammars for Language Analysis - A Survey of the Formalism and a Comparison with Augmented Transition Networks. J. of Artificial Intelligence 13, 1980, pp 231-278.

Ridle94 G.Ridley: Formal Methods for Requirement Specification - A Practical Approach using the EUA-Delphi Technology, Ellemtel Utvecklings AB, 1994

Rolla92 C.Rolland & C.Proix: A Natural Language approach for Requirements Engineering, CAISE-92 Int. Conf. on Advanced Information Systems Engineering, (Ed.) P. Loucopoulos, Springer Verlag Lecture Notes in Computer Science, no 593, pp. 257 - 277, 1992.

Swart82 B. Swartout: GIST English Generator, In Proceedings of AAAI-82, American Association of Artificial Intelligence, Carnegie-Mellon University and University of Pittsburgh, Pittsburgh, Pennsylvania, 1982.