ASTROGEN - Aggregated deep and Surface naTuRal language GENerator 

Hercules Dalianis


DSV-KTH-Stockholm University
Forum 100, S-164 40 Kista
SWEDEN

ph (+46) 8 674 75 47
mob. ph. (+46) 70 568 13 59
fax. (+46) 8 703 90 25
Email: hercules@dsv.su.se

Introduction

ASTROGEN is a Natural Language Generator written in Prolog that can hopefully be used by almost anybody. ASTROGEN has been used to generate natural language (English) from formal specifications and STEP/EXPRESS specifications. ASTROGEN basically consists of two modules: the Deep generator and the Surface generator.

The Surface generator is a DCG surface grammar (File: grammar) whose terminals are the lexical items (File: lexicon).
The Deep generator consists of a number of modules that carry out the aggregation:

Syntactic aggregation (File: aggrules)

Bounded Lexical aggregation (File: bl_rules)

Unbounded Lexical aggregation (File: ub_rules)

Pronominalisation (File: pronoun)

The control, or top-level loop, of ASTROGEN is in (File: tools).

The remaining files will be described later:

:- reconsult(op).

:- reconsult(library).

:- reconsult(sorting).

:- reconsult(permut).

ASTROGEN Architecture


Figure 1 ASTROGEN Architecture

The modules within the green border (see Figure 1) are contained in the ASTROGEN system. To adapt ASTROGEN to another system, a translation has to be made from the other system's representation to ASTROGEN's f-structures. A basic lexicon is available within ASTROGEN, but for domain-specific terms a new one has to be written. ASTROGEN does not contain a real text planner but a sentence planner that is specialized in aggregation.
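As an illustration of such a translation, the following is a minimal sketch that maps facts from an imagined host system to f-structures joined with &. The names host_fact/1, translate/2, join/2 and host_to_astrogen/1 are hypothetical and not part of ASTROGEN. The op/3 declaration is only a guess that lets the sketch load on its own; ASTROGEN declares its own operators in (File: op).

```prolog
% Assumption: priority and type of & are guessed here so that the sketch
% loads standalone. Drop this directive when ASTROGEN's (File: op) is loaded.
:- op(500, xfy, &).

% Hypothetical facts as they might exist in a host system.
host_fact(type(john, subscriber)).
host_fact(type(mary, subscriber)).
host_fact(status(john, busy)).

% translate(+HostFact, -FStructure): one rule per kind of host fact.
translate(type(X, T),   f(pres, isa, X, T)).
translate(status(X, S), f(pres, state, X, S)).

% join(+FStructures, -IN): join a non-empty list into IN1 & IN2 & ... & INn.
join([F], F).
join([F|Fs], F & Rest) :-
    Fs = [_|_],
    join(Fs, Rest).

% host_to_astrogen(-IN): collect and translate all host facts.
host_to_astrogen(IN) :-
    findall(F, (host_fact(H), translate(H, F)), Fs),
    join(Fs, IN).
```

With ASTROGEN loaded, the term bound to IN could then be passed to paraphrase/1.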

Download files

astrogen.zip
astrogen_unix.zip

Copyright Hercules Dalianis. ASTROGEN may not be used in commercial applications without a licence.

(15 files in total)

Loading

To load ASTROGEN, start Prolog and consult (File: astrogen).
If you use SICStus Prolog, check (File: astrogen), change some of the directives there, and also
consult the file sicslib.

?- consult(astrogen).

Loading ASTROGEN...

ASTROGEN loaded!

?-

Control

Toplevel predicates in the (File: tools)

paraphrase(+IN) where IN is an input frame; the answer is printed as an NL string

paraphrase(+IN, -NL) where IN is an input frame; the answer NL is a natural language list

Input IN = f(T,P,Arg1,Arg2,Arg3)

IN can have a variable number of arguments Arg.

T = {past, pres, fut} is the tense

P is a predicate (relation), i.e. a verb, e.g. the lexical items are, have, dials; see (File: lexicon)

Arg1, Arg2 and Arg3 are subjects (entities), i.e. nouns or pronouns, e.g. the lexical items John, subscriber; see (File: lexicon).
/* This does not work

Arg3 can be a cardinality relation represented by Arg3 = card(entity,[n,m]).

E.g. f(pres,poss,person,card(child,[1,3])).
*/

IN can also be a conjunction of the form IN1 & IN2 & ... & INn
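The input format described above can be checked before generation. The following is a minimal sketch, not part of ASTROGEN: valid_input/1 and tense/1 are hypothetical names, the op/3 declaration is a guess (ASTROGEN's own is in (File: op)), and only the plain form f(T,P,Arg1,...) with at least one argument is checked.

```prolog
% Assumption: standalone declaration of &; ASTROGEN has its own in (File: op).
:- op(500, xfy, &).

% The three tense values listed in the text.
tense(past).
tense(pres).
tense(fut).

% valid_input(+IN): IN is a frame or a conjunction IN1 & IN2 & ... & INn.
valid_input(IN) :-
    nonvar(IN),
    valid_(IN).

valid_(A & B) :- !,
    valid_input(A),
    valid_input(B).
valid_(F) :-
    F =.. [f, T, P | Args],   % functor f, tense, predicate, arguments
    atom(T),
    tense(T),
    nonvar(P),
    Args = [_|_].             % at least one argument
```

For example, valid_input(f(pres,isa,john,subscriber) & f(pres,state,john,busy)) succeeds, while a frame with an unknown tense such as f(now,isa,john,subscriber) fails.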

The top-level predicate for deep generation is

deep(+IN,-OUT)

Deep generation is controlled by the switches described in the sections below.

Example

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) & f(pres,state,john,busy) & f(pres,state,mary,idle)).

John is a subscriber and
Mary is a subscriber and
John is busy and
Mary is idle.
yes
?-

Switches (File: tools)

all_rules/0 switches on all aggregation rules

normal/0 switches off all aggregation rules

user_help/0 lists the available switches

?- user_help.

These are all the switches for aggregation
One can also use set(SWITCH(no)) to remove it

  subject_pred
 predicate_do
 subject
 predicate
 sym_rel
 pronoun
 bound_lex
 un_bound_lex
 clause_comma
 canned_text
 canned_example
yes
?-

Syntactic aggregation

subject_pred/0 switches on subject and predicate aggregation (grouping)

predicate_do/0 switches on predicate and direct object aggregation

predicate/0 switches on predicate aggregation

sym_rel/0 switches on symmetric relation aggregation

?- all_rules.

yes

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) & f(pres,state,john,busy) & f(pres,state,mary,idle) ).

John and Mary are subscribers and
John is busy and
Mary is idle.
yes
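The grouping seen in this example ("John and Mary are subscribers") can be pictured as merging clauses that share tense, predicate and object. The following is a minimal sketch under that assumption; aggregate_subjects/2 and collect/6 are hypothetical names, the list-of-subjects representation is invented for illustration, and ASTROGEN's real rules are in (File: aggrules).

```prolog
% aggregate_subjects(+Frames, -Aggregated)
% Frames is a list of f(T,P,Subject,Object) terms. Frames that share
% T, P and Object are merged into one frame whose third argument is the
% list of their subjects, in order of first appearance.
aggregate_subjects([], []).
aggregate_subjects([f(T,P,S,O)|Rest], [f(T,P,[S|More],O)|Out]) :-
    collect(T, P, O, Rest, More, Remaining),
    aggregate_subjects(Remaining, Out).

% collect(+T, +P, +O, +Frames, -Subjects, -Remaining)
% Pick out the subjects of all frames matching T, P and O.
collect(_, _, _, [], [], []).
collect(T, P, O, [f(T1,P1,S,O1)|Fs], [S|Ss], Rem) :-
    T1 == T, P1 == P, O1 == O, !,
    collect(T, P, O, Fs, Ss, Rem).
collect(T, P, O, [F|Fs], Ss, [F|Rem]) :-
    collect(T, P, O, Fs, Ss, Rem).
```

Applied to the four frames of the example, the two isa frames collapse into one frame with the subjects [john,mary], while the two state frames stay separate because their objects differ.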

Lexical aggregation

bound_lex/0 switches on bounded lexical aggregation (File: tools)

un_bound_lex/0 switches on unbounded lexical aggregation

surface(+IN, -NL).

IN = (T,P,Arg1,Arg2,Arg3) is a frame structure, which may have been aggregated.

NL is a natural language list.

?- clause_comma.
yes

?- paraphrase(f(pres, work_action,john, monday) &  f(pres, work_action,john, tuesday) & f(pres, work_action,john, wednesday) &  f(pres,work_action,john,thursday) & f(pres, work_action,john, friday)).
John works on Monday.
John works on Tuesday.
John works on Wednesday.
John works on Thursday.
John works on Friday.
yes

?- subject_pred.
yes

?- paraphrase(f(pres, work_action,john, monday) &  f(pres, work_action,john, tuesday) & f(pres, work_action,john, wednesday) &  f(pres,work_action,john,thursday) & f(pres, work_action,john, friday)).
John works on Monday, Tuesday, Wednesday, Thursday and Friday.
yes

?- bound_lex.
yes

?- paraphrase(f(pres, work_action,john, monday) &  f(pres, work_action,john, tuesday) & f(pres, work_action,john, wednesday) &  f(pres,work_action,john,thursday) & f(pres, work_action,john, friday)).
John works on weekdays.
yes
?-
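The last step of this example, where the five days become "weekdays", can be read as replacing a complete set of items by one covering word. The following is a minimal sketch of that idea; bounded_set/2 and bounded_lex_sketch/2 are hypothetical names and the rule table is invented. ASTROGEN's actual rules are in (File: bl_rules).

```prolog
% Hypothetical rule table: bounded_set(CoverWord, Members).
bounded_set(weekdays, [monday, tuesday, wednesday, thursday, friday]).

% bounded_lex_sketch(+Items, -Result)
% If Items contains exactly the members of a bounded set (in any order),
% Result is the covering word; otherwise Items is returned unchanged.
bounded_lex_sketch(Items, Cover) :-
    bounded_set(Cover, Members),
    sort(Items, Sorted),      % sort/2 also removes duplicates
    sort(Members, Sorted),
    !.
bounded_lex_sketch(Items, Items).
```

An incomplete set such as [monday,tuesday] is left as it is, which matches the idea that a bounded rule only fires when all members are present.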



Pronominalization

pronoun/0 switches on pronominalization (File: tools)

E.g.

?- normal.
yes

?- predicate_do.
yes

?- pronoun.
yes

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) &
f(pres,state,john,idle) & f(pres,state,mary,idle) ).

John and Mary are subscribers and
they are idle.
yes
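One way to picture what happens in this example is that a subject repeated from the preceding clause is replaced by a pronoun. The following is a minimal sketch under that assumption. All names (gender/2, pronoun_form/2, subject_pronoun/2, pronominalise/2, pro/1) and the list-of-subjects representation are hypothetical; ASTROGEN's own implementation is in (File: pronoun).

```prolog
% Hypothetical mini lexicon, using the gender codes of the lexicon syntax.
gender(john, mask).
gender(mary, fem).
pronoun_form(mask, he).
pronoun_form(fem, she).

% subject_pronoun(+Subjects, -Pronoun): Subjects is a list of entities.
subject_pronoun(Subjects, they) :-
    Subjects = [_,_|_], !.
subject_pronoun([S], Pron) :-
    gender(S, G),
    pronoun_form(G, Pron).

% pronominalise(+Frames, -Frames1)
% A subject identical to the subject of the preceding frame is replaced
% by pro(Pronoun); the first frame is always kept as it is.
pronominalise([], []).
pronominalise([F|Fs], [F|Out]) :-
    F = f(_, _, S, _),
    pron_rest(S, Fs, Out).

pron_rest(_, [], []).
pron_rest(Prev, [f(T,P,S,O)|Fs], [f(T,P,S1,O)|Out]) :-
    (   S == Prev, subject_pronoun(S, Pron)
    ->  S1 = pro(Pron)
    ;   S1 = S
    ),
    pron_rest(S, Fs, Out).
```

For the two aggregated clauses of the example, the second subject [john,mary] becomes pro(they), corresponding to "they are idle".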

Sentence delimitation

clause_comma/0 switches on generation of comma delimiters of clauses

I.e. instead of generating 'and' between clauses (f-structures), comma delimiters are generated.

?- clause_comma.
yes

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) & f(pres,state,john,busy) & f(pres,state,mary,idle)).

John and Mary are subscribers.
John is busy.
Mary is idle.
yes
?-


?- paraphrase(f(pres,poss,john,f(pres,state,car,red))& f(pres,poss,mary,f(pres,state,car,red))). 
John and Mary have a red car.
yes
?-

Hybrid text generation (only to be used with interface)

 canned_text/0
 canned_example/0

The two predicates above switch on hybrid text generation.
Hybrid text generation is a mixture of normal text generation (from f-structures) and canned text generation
(ready-made text), i.e. information that is not available as an f-structure but only as canned text.
To perform hybrid text generation one needs to create an extra predicate.
Canned text cannot be processed by the paraphrase predicate.

Sorting /* This does not work correctly due to introduced bugs */

Because of the way the aggregation rules are implemented, a sorting predicate for the clauses was necessary. (This implementation could have been done differently and will eventually be changed to use a stack to keep track of which clauses have been aggregated.) The sorting predicate keeps track of the clauses in the discourse and decides in which order to aggregate as well as in which order to generate. However, the user can only control the order in which clauses are generated.

The clause syntax is:

f(pres, Predicate, Subject, Object).

The clauses can now be ordered according to the keys Predicate (key 1), Subject (key 2) and Object (key 3) by giving them various priorities. The ordering rule orders the clauses in a text plan according to the weights of that rule. The weights correspond to the predicate, subject and object of the clause. E.g. the 3,2,1 ordering means that the predicate has the highest priority in the ordering, followed by the subject and finally the object.

Different sorting orders could, for example, be 1,2,3 or 2,1,3 or 1,3,2.

e.g. (File: tools)

:- sort(1,2,3).

:- sort(2,1,3).

:- sort( 1,3,2).

To remove all sorting:

:- sort(n,n,n).

?- all_rules.

yes

?- sort(1,2,3).
yes

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) & f(pres,state,john,idle) & f(pres,state,mary,busy) ).

john and mary are subscribers and
john is idle and
mary is busy.
yes

?- sort(2,1,3).
yes

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) &
f(pres,state,john,idle) & f(pres,state,mary,busy) ).

john is idle and
mary is busy and
john and mary are subscribers.
yes

?- sort( 1,3,2).
yes

?- paraphrase(f(pres,isa,john,subscriber) & f(pres,isa,mary,subscriber) &
f(pres,state,john,idle) & f(pres,state,mary,busy) ).

john and mary are subscribers and
mary is busy and
john is idle.
yes
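The outputs above are consistent with one possible reading of the sort switch, in which the three numbers list the keys in priority order (1 = predicate, 2 = subject, 3 = object). The following is a minimal sketch under that assumption only; order_clauses/3, key_value/3 and frame_keys/3 are hypothetical names, the sketch works on unaggregated ground clauses, and ASTROGEN's own code is in (File: sorting).

```prolog
:- use_module(library(lists)).   % member/2

% key_value(+Key, +Frame, -Value): 1 = predicate, 2 = subject, 3 = object.
key_value(1, f(_,P,_,_), P).
key_value(2, f(_,_,S,_), S).
key_value(3, f(_,_,_,O), O).

% frame_keys(+Keys, +Frame, -Values): the sort key of a frame.
frame_keys([], _, []).
frame_keys([K|Ks], F, [V|Vs]) :-
    key_value(K, F, V),
    frame_keys(Ks, F, Vs).

% order_clauses(+Keys, +Frames, -Sorted)
% Keys lists the key numbers in priority order, e.g. [1,3,2].
% Frames are assumed to be ground f/4 terms.
order_clauses(Keys, Frames, Sorted) :-
    findall(Ks-F, (member(F, Frames), frame_keys(Keys, F, Ks)), Pairs),
    keysort(Pairs, SortedPairs),            % stable, standard order of terms
    findall(F, member(_-F, SortedPairs), Sorted).
```

With Keys = [1,3,2] the four example clauses come out in the order isa/john, isa/mary, state/mary/busy, state/john/idle, which agrees with the clause order of the sort(1,3,2) example.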

Customization and new domains

To extend the system to a new domain, extend the lexicon and/or the grammar. The best way to do this is to construct a new domain lexicon in a separate file. This file must be consulted, not reconsulted, because reconsulting removes all the information stored in (File: lexicon).

For SICStus Prolog, do not forget to declare as multifile all new predicates that are defined across several files.
E.g. :- multifile noun/5, verb/5, adj/4,propernoun/5,conj/3,cue/3.

See (File: lexicon) for the syntax of the lexicon.

To add the noun product, add two new Prolog clauses (one for the singular and one for the plural form):

noun(sing,neut,product) --> [product].

noun(plur,neut,product) --> [products].

The second argument is the gender:

mask = masculine, fem = feminine and neut = neuter

Then reconsult (File: lexicon).
To add the verb belong to, add four new Prolog clauses (present singular, present plural, past and future tense):

verb(pres,sing,belong_rel) --> [belongs,to].

verb(pres,plur,belong_rel) --> [belong,to].

verb(past,_,belong_rel) --> [belonged,to].

verb(fut,_,belong_rel) --> [will, belong,to].
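To see how such entries behave as DCG rules, the following minimal sketch puts the noun and verb clauses above together with a toy sentence rule and can be loaded on its own. The rule toy_clause//4 and the noun catalogue are hypothetical and only serve the illustration; ASTROGEN's real grammar is in (File: grammar). In a separate domain lexicon file for SICStus Prolog, the multifile declaration mentioned above would also be needed.

```prolog
% Lexicon entries as given in the text.
noun(sing,neut,product) --> [product].
noun(plur,neut,product) --> [products].

verb(pres,sing,belong_rel) --> [belongs,to].
verb(pres,plur,belong_rel) --> [belong,to].
verb(past,_,belong_rel)    --> [belonged,to].
verb(fut,_,belong_rel)     --> [will,belong,to].

% Hypothetical extra noun, added only so the toy rule has an object.
noun(sing,neut,catalogue) --> [catalogue].

% Hypothetical toy rule: subject and verb must agree in number.
toy_clause(Tense, Rel, Subj, Obj) -->
    [the], noun(Num, _, Subj),
    verb(Tense, Num, Rel),
    [a], noun(sing, _, Obj).
```

The query ?- phrase(toy_clause(pres, belong_rel, product, catalogue), Ws). gives Ws = [the,product,belongs,to,a,catalogue] as its first answer; on backtracking the plural form is produced.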

Special cases for attributes or states

One can nest f-structures in some cases, specifically to generate attributes or states of entities.
?- paraphrase(f(pres,poss,john,f(pres,state,car,red))).
John has a red car.

yes

Cue word generation

Markers can be put on words to distinguish whether they refer to the same instance or not, e.g. subscriber/1 and subscriber/1. ASTROGEN will then augment the aggregated sentence with cue words, e.g. each, together, respectively.

These markers must be integers. If one puts the same number on two instances, they are the same instance; if one puts different numbers on them, they are different instances.

This feature can be used for other purposes as well, e.g. to control the generation of phrases like "the other subscriber". This is not implemented yet.

The examples below work only partly; there is a newly introduced bug.

?- paraphrase(f(pres,poss/1,john,book/1)&f(pres,poss/2,mary,book/2)).
John and Maryhave a book each.
yes

?- paraphrase(f(pres,poss/1,john,book/1)&f(pres,poss/1,mary,book/1)).
John and Maryhave a book together.
yes

?- paraphrase(f(pres,poss/1,john,book/1)&f(pres,poss/2,mary,pen/1)).
John and Maryhave a book and a pen respectively.

yes

?- paraphrase(f(pres,poss/1,john,f(pres,state,car,red))& f(pres,poss/1,mary,f(pres,state,car,red))).
John and Mary have a red car together.
yes
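The first three examples suggest a simple decision on the markers of the two objects: the same word with the same marker gives "together", the same word with different markers gives "each", and different words give "respectively". The following is a minimal sketch of that reading only; cue_word/3 is a hypothetical name, it looks at the object markers alone, and it is not ASTROGEN's actual rule.

```prolog
% cue_word(+Object1, +Object2, -Cue)
% Objects are Word/Marker terms such as book/1; both must be ground.
cue_word(Obj/N, Obj/N, together) :- !.    % same word, same instance
cue_word(Obj/_, Obj/_, each) :- !.        % same word, different instances
cue_word(_/_, _/_, respectively).         % different words
```

For example, cue_word(book/1, book/2, C) gives C = each, and cue_word(book/1, pen/1, C) gives C = respectively.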


Blocking generation of lexical objects

If one puts square brackets around an object in the f-structure, the generation of that object is blocked.

?- normal.
yes

?- paraphrase(f(pres,poss,john,ring_tone) & f(pres,call_action, [john],mary) &
f(pres,poss, mary,ring_signal)).

John has a ringtone and
Calls Mary and
Mary has a ringsignal.
yes
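The effect of the brackets can be pictured as a filter that drops bracketed arguments before the words are generated. The following is a minimal sketch of that idea; blocked/1, visible_args/2 and frame_visible/3 are hypothetical names and say nothing about how ASTROGEN implements blocking internally.

```prolog
% blocked(+Arg): an argument written in square brackets, e.g. [john].
blocked(A) :-
    nonvar(A),
    A = [_|_].

% visible_args(+Args, -Visible): drop all blocked arguments.
visible_args([], []).
visible_args([A|As], Vs) :-
    blocked(A), !,
    visible_args(As, Vs).
visible_args([A|As], [A|Vs]) :-
    visible_args(As, Vs).

% frame_visible(+Frame, -Predicate, -VisibleArgs)
frame_visible(F, P, Vs) :-
    F =.. [f, _Tense, P | Args],
    visible_args(Args, Vs).
```

For the middle clause of the example, frame_visible(f(pres,call_action,[john],mary), P, Vs) gives P = call_action and Vs = [mary], corresponding to "Calls Mary" without a subject.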

Dynamic or temporal control

Consecutive time points must block aggregation.

(Not ready yet) 


Latest update  February 14, 2005