Ghosts-Design and System Description

The base for this text is a draft written by Olle Palmgren and Daniel Pargman, dated September 27, 1993. Transcription, editing and comments by Fredrik Kilander [FK]. This version June 1, 1994.

Design

Introduction

Here we describe the design of the GHOSTS system. The GHOSTS system is intended to cut down on information overload when reading Usenet news or email. It does this by filtering. The filtering is designed to work on a sequential stream of messages, as is the case with email. With Usenet news the situation is slightly different: instead of a stream of messages, we have a stream of groups, where each group contains a stream of messages very much like the stream of email messages.

The actual GHOSTS system consists of four parts:

nnghost
The Usenet news filter. This program is invoked automatically each time the user starts reading news with the modified nn program distributed together with GHOSTS. It then monitors the stream of messages within each group and applies the defined actions to the messages that it recognizes.
mailghost
The email filter. It is invoked each time the user receives email. It filters the message stream before the user is notified that mail has arrived. It handles the messages in the same way as the news filter.
ruled
The rule editor. The user runs this program to tell the filters what to do and to which messages.
grouped
The group editor. This program is used to filter groups. It can also be used to subscribe and unsubscribe to groups and to get a general overview of the available newsgroups. The latter in particular can be of great use to a novice user, since the group structure in Usenet news is huge.

The user interacts only with the group editor and the rule editor in order to define the behaviour of the system. The filter (ghost) parts are completely transparent to the user once they are set up, and the only thing he notices of their presence is the effect of their work.

A possible advantage of GHOSTS is that its user can continue to work with the same email or news system that he is accustomed to. He doesn't have to learn a new set of commands, a new interface, and so on, in order to cut down the information flow to a more manageable level.

Considerations

The design has been influenced by a number of considerations. Some of them are:

We found that the object-oriented paradigm fit this bill to a high degree. We chose to implement GHOSTS in C++ [Str80]. Some of the factors that affected our decision were:

The only parts where we had to let go of the above-mentioned design considerations were in the coding of the visual user interface parts for the group and rule editors.

The interface was constructed in Motif [Hel91]. Motif is a widget set written in C for the construction of visual user interfaces under the X Window System. Even though it is designed in an object-oriented way, it was too much work to remodel that design in C++. Instead we encapsulate the interface part in a class and let the applications communicate with this class. All the interface-specific code is found in this class, which means that if another interface is wanted, only one module must be rewritten.

The Filter

The filter interacts with a sequential stream of messages. It reads and intercepts one message at a time from the stream. It then tries to classify the message and, if allowed, perform some actions upon the message. These actions could be to save the message in a folder, discard the message, forward the message etc. If the filter fails to classify the message it is passed unaltered to its original destination.

In order to be able to make any judgement at all about the messages the filter has a small expert system. The filter passes the message it reads from the input stream to the expert system and asks it to evaluate the message. The expert system evaluates the message according to a set of user-defined rules. The expert system is also responsible for applying actions in rules to the message.

The filter engine (the expert system at the moment) is constructed to work on a generalized model of rules and messages. This makes it possible to construct new kinds of rules and messages without changing the filter engine.

This is implemented in C++ using abstract base classes and inheritance. The actual rule classes and message classes inherit an interface of virtual functions from their base classes and have to provide their own specific behaviour for that interface. When the engine operates on a specific instance of a rule or message, it is the behaviour of that instance's actual class that is used. The engine sees the rules and messages as objects of the base class type. It doesn't care about which actual type of rule or message it is working with.

The base class implementation works fine as long as all the functionality required by the filter engine is provided by the generic rule and message classes.
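
The arrangement described above can be sketched roughly as follows. This is only an illustration in present-day C++, not the GHOSTS source: the class names Message, Rule, MailMessage and SaveIfFieldEquals, the function filter_message, and the idea of recording actions in a log are all assumptions made for the example.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic message interface (names are illustrative only).
class Message {
public:
    virtual ~Message() = default;
    // Contents of the named header field, or "" if the field is absent.
    virtual std::string field(const std::string& name) const = 0;
    virtual std::string body() const = 0;
};

// Hypothetical generic rule interface.
class Rule {
public:
    virtual ~Rule() = default;
    virtual bool matches(const Message& m) const = 0;
    // Records the action in `log` instead of touching real folders.
    virtual void apply(const Message& m, std::vector<std::string>& log) const = 0;
};

// One concrete message type: an email with already parsed header fields.
class MailMessage : public Message {
    std::map<std::string, std::string> fields_;
    std::string body_;
public:
    MailMessage(std::map<std::string, std::string> fields, std::string body)
        : fields_(std::move(fields)), body_(std::move(body)) {}
    std::string field(const std::string& name) const override {
        auto it = fields_.find(name);
        return it == fields_.end() ? std::string() : it->second;
    }
    std::string body() const override { return body_; }
};

// One concrete rule type: save the message when a field equals a value.
class SaveIfFieldEquals : public Rule {
    std::string name_, value_, folder_;
public:
    SaveIfFieldEquals(std::string name, std::string value, std::string folder)
        : name_(std::move(name)), value_(std::move(value)), folder_(std::move(folder)) {}
    bool matches(const Message& m) const override { return m.field(name_) == value_; }
    void apply(const Message&, std::vector<std::string>& log) const override {
        log.push_back("save " + folder_);
    }
};

// The engine sees only Rule and Message. It returns false when no rule
// classified the message, so the caller can pass the message on unaltered.
bool filter_message(const Message& m,
                    const std::vector<std::unique_ptr<Rule>>& rules,
                    std::vector<std::string>& log) {
    bool classified = false;
    for (const auto& rule : rules) {
        if (rule->matches(m)) {
            rule->apply(m, log);
            classified = true;
        }
    }
    return classified;
}
```

A new kind of rule or message is added by deriving another class; filter_message itself stays unchanged, which is the property the text relies on.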

The News Filter (nnghost)

The news filter exists in a modified version of the nn program, a reader for Usenet News. The nnghost program intercepts all of nn's requests for messages and filters them as described in section The Filter. The nnghost program is totally transparent to the user, the news server and the original parts of nn. The filter implements most of its services by using functions and message properties already provided in the news server and nn. For example:

  1. The user enters a new newsgroup (within nn).
  2. The nn program asks the news server for a message header in the group.
  3. The nnghost program intercepts the header, filters it and finds that the appropriate action is to mark the message for reading.
  4. The nnghost program gives the message to nn with the information that it was autoselected (i.e. nn believes that it was flagged by nn's original selection mechanism).
  5. Steps 2 through 4 are then repeated until all message headers in the newsgroup have been processed.
  6. The nn program displays all the message headers to the user.

What happens is that nn is kept in the belief that it interacts with the user and the news server, when in reality it interacts with the user and the nnghost process. The news server sees just another client, such as tin, mxrn, rn, gnus or nn, but the client in this case is nnghost, acting as an invisible intermediary between the news reader and the news server.
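
The intermediary role can be sketched as a wrapper that offers the reader the same interface as the server. The sketch below is an assumption-laden illustration, not the actual nn or nnghost code: Header, HeaderSource, FakeServer and GhostProxy are hypothetical names, and the real programs exchange far more than a subject line.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced message header.
struct Header {
    std::string subject;
    bool autoselected = false;  // the flag the reader believes it set itself
};

// What the reader expects from whatever delivers headers to it.
class HeaderSource {
public:
    virtual ~HeaderSource() = default;
    // Stores the next header in `out`; returns false when the group is exhausted.
    virtual bool next_header(Header& out) = 0;
};

// Stand-in for the news server: hands out a fixed list of headers.
class FakeServer : public HeaderSource {
    std::vector<Header> headers_;
    std::size_t pos_ = 0;
public:
    explicit FakeServer(std::vector<Header> headers) : headers_(std::move(headers)) {}
    bool next_header(Header& out) override {
        if (pos_ >= headers_.size()) return false;
        out = headers_[pos_++];
        return true;
    }
};

// Stand-in for nnghost: sits between reader and server, filters each
// header on its way through and marks the ones a rule selects.
class GhostProxy : public HeaderSource {
    HeaderSource& upstream_;
    std::function<bool(const Header&)> should_select_;
public:
    GhostProxy(HeaderSource& upstream, std::function<bool(const Header&)> should_select)
        : upstream_(upstream), should_select_(std::move(should_select)) {}
    bool next_header(Header& out) override {
        if (!upstream_.next_header(out)) return false;
        if (should_select_(out)) out.autoselected = true;
        return true;
    }
};
```

Because GhostProxy is itself a HeaderSource, a reader written against that interface cannot tell whether it talks to the server or to the proxy.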

The Email Filter (mailghost)

The email filter is notified as soon as mail arrives and a message is available. (The invocation mechanism is most likely the .forward file in the user's home directory. [FK]) It then reads and handles the message, as described in section The Filter.

The Rule Editor (ruled)

This program reads a rule file, displays its contents to the user as a set of rules and prepares them for editing. The user may browse through the rules, change or delete them, or add completely new rules to the set.

The interface part of ruled is written using the Motif widget set. All Motif-specific code is collected in a single interface class. This class is responsible for all user interaction. If the interface is to be changed, there is just one class to rewrite.

The rule editor is used to maintain rule sets for both the email filter mailghost and the Usenet News filter nnghost.

The Group Editor (grouped)

The group editor is an interactive, visually oriented editor for the user's personal .newsrc file. This file is used by almost all Usenet News readers to maintain the user's position in the flow of messages. The file contains entries which define the newsgroups the user is subscribing to and which messages the user has seen in each newsgroup.
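
As an illustration of what such an entry holds, the sketch below parses one line in the conventional .newsrc layout: a group name, then ':' for a subscribed or '!' for an unsubscribed group, then a comma-separated list of read article numbers and ranges. The struct NewsrcEntry and the function parse_newsrc_line are hypothetical names chosen for this example; how grouped itself reads the file is not described here.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One .newsrc entry (hypothetical representation).
struct NewsrcEntry {
    std::string group;
    bool subscribed = false;
    std::vector<std::pair<long, long>> read;  // inclusive article ranges
};

// Parses a single line such as "comp.lang.c++: 1-120,125".
// Returns false if the line has no ':' or '!' after a group name.
// Malformed numbers make std::stol throw std::invalid_argument.
bool parse_newsrc_line(const std::string& line, NewsrcEntry& out) {
    const std::size_t sep = line.find_first_of(":!");
    if (sep == std::string::npos || sep == 0) return false;

    out.group = line.substr(0, sep);
    out.subscribed = (line[sep] == ':');
    out.read.clear();

    std::istringstream ranges(line.substr(sep + 1));
    std::string item;
    while (std::getline(ranges, item, ',')) {
        // Trim surrounding blanks; skip items that are empty afterwards.
        const std::size_t b = item.find_first_not_of(" \t\r");
        if (b == std::string::npos) continue;
        const std::size_t e = item.find_last_not_of(" \t\r");
        item = item.substr(b, e - b + 1);

        // "125" is a single article, "1-120" a range.
        const std::size_t dash = item.find('-');
        const long lo = std::stol(item.substr(0, dash));
        const long hi = (dash == std::string::npos) ? lo : std::stol(item.substr(dash + 1));
        out.read.emplace_back(lo, hi);
    }
    return true;
}
```

Subscribing or unsubscribing then amounts to flipping the separator character when the entry is written back.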

The structure of the newsgroups forms a tree, not unlike Internet domain names or filename paths. The tree structure is a way to classify newsgroups from general to specialized topics. The grouped program visualizes this tree for the user and allows him to orient himself spatially as well as conceptually. In particular, grouped offers the possibility of hiding uninteresting groups from view, as well as providing visual cues about the properties of a particular newsgroup or class of newsgroups. The user interacts with the editor through the traditional means: the pointing device and the keyboard.
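
A minimal sketch of such a tree, built by splitting group names at the dots, is given below. GroupNode, insert_group and count_visible are hypothetical names, and the hidden flag is only one assumed way of modelling the hiding of a whole class of newsgroups.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// One node per name component, e.g. "comp" -> "lang" -> "c++".
struct GroupNode {
    bool is_group = false;  // true if a newsgroup name ends at this node
    bool hidden = false;    // a hidden node hides its whole subtree
    std::map<std::string, std::unique_ptr<GroupNode>> children;
};

// Inserts a dotted group name, creating intermediate nodes as needed.
GroupNode& insert_group(GroupNode& root, const std::string& name) {
    GroupNode* node = &root;
    std::istringstream parts(name);
    std::string part;
    while (std::getline(parts, part, '.')) {
        std::unique_ptr<GroupNode>& child = node->children[part];
        if (!child) child = std::make_unique<GroupNode>();
        node = child.get();
    }
    if (node != &root) node->is_group = true;  // ignore an empty name
    return *node;
}

// Counts the newsgroups that are not inside a hidden subtree.
std::size_t count_visible(const GroupNode& node) {
    if (node.hidden) return 0;
    std::size_t n = node.is_group ? 1 : 0;
    for (const auto& entry : node.children) n += count_visible(*entry.second);
    return n;
}
```

Hiding the node for "comp" removes every group below it from view in one step, which is the kind of operation a tree representation makes cheap.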

Syntax for Rules and Messages

Rule Syntax

The structure of the rules is presently very simple, but it can easily be extended by adding new production rules. We can probably use the same rules for both news and email, since the formats of a news message and an email message are so similar. The main point of difference between the two alternatives is the actions. The actions should reflect the actions that the user may perform manually.

The rules have the following structure:

RULE		-->	"rule" String
			"if" TVS "then" ACTIONS "end"

TVS		-->	"("  TVS  ")"
TVS		-->	TVS "and" TVS
TVS		-->	TVS "or"  TVS
TVS		-->	STATEMENT "==" STATEMENT
TVS		-->	STATEMENT "!=" STATEMENT

STATEMENT	-->	String
STATEMENT	-->	COMMAND

COMMAND		-->	"field" String
COMMAND		-->	"body"

ACTIONS		-->	epsilon (The empty string? [FK])
ACTIONS		-->	ACTION ACTIONS

ACTION		-->	"save" String
ACTION		-->	"forward" String

Here's an example of a rule in the above syntax:

rule example
if (field == "C++")
then
	save "c++.folder"
	forward "friend@student.docs.uu.se"
end
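
To show how the TVS part of a rule might be evaluated once it has been parsed, here is a small sketch of a syntax tree with an evaluator. It is not the GHOSTS expert system. The types Msg, Statement and Tvs and the helpers compare, combine and eval are invented for the example, and it assumes that "==" and "!=" mean exact string comparison, which the grammar itself does not specify.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, already parsed message.
struct Msg {
    std::map<std::string, std::string> fields;
    std::string body;
};

// STATEMENT: a literal String, "field" String, or "body".
struct Statement {
    enum Kind { Literal, Field, Body };
    Kind kind;
    std::string text;  // literal value or field name; unused for Body

    std::string value(const Msg& m) const {
        switch (kind) {
            case Literal: return text;
            case Field: {
                auto it = m.fields.find(text);
                return it == m.fields.end() ? std::string() : it->second;
            }
            case Body: return m.body;
        }
        return std::string();
    }
};

// TVS: a comparison of two statements, or two TVS joined by and/or.
struct Tvs {
    enum Op { Eq, Ne, And, Or };
    Op op;
    Statement lhs, rhs;                // used by Eq and Ne
    std::shared_ptr<Tvs> left, right;  // used by And and Or
};

std::shared_ptr<Tvs> compare(Tvs::Op op, Statement lhs, Statement rhs) {
    return std::make_shared<Tvs>(Tvs{op, std::move(lhs), std::move(rhs), nullptr, nullptr});
}

std::shared_ptr<Tvs> combine(Tvs::Op op, std::shared_ptr<Tvs> l, std::shared_ptr<Tvs> r) {
    const Statement unused{Statement::Literal, ""};
    return std::make_shared<Tvs>(Tvs{op, unused, unused, std::move(l), std::move(r)});
}

// Assumption: "==" and "!=" compare whole strings exactly.
bool eval(const Tvs& t, const Msg& m) {
    switch (t.op) {
        case Tvs::Eq:  return t.lhs.value(m) == t.rhs.value(m);
        case Tvs::Ne:  return t.lhs.value(m) != t.rhs.value(m);
        case Tvs::And: return eval(*t.left, m) && eval(*t.right, m);
        case Tvs::Or:  return eval(*t.left, m) || eval(*t.right, m);
    }
    return false;
}
```

For instance, a condition comparing a Subject field (a field name picked only for illustration) with "C++" would be built as compare(Tvs::Eq, Statement{Statement::Field, "Subject"}, Statement{Statement::Literal, "C++"}).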

Message Syntax

At the current development stage we regard the syntax of Usenet News messages as a subset of the syntax for an email message, the only significant difference being that a news message does not start with a from: line. The syntax used for email messages is simplified in accordance with [Cro81]:

"Some mail-reading software systems may wish to perform only minimal processing, ignoring the internal syntax of structured field-bodies and treating them the same as unstructured field-bodies. Such software will need only to distinguish:

The abbreviated set of syntactic rules which follows will suffice for this purpose. It describes a limited view of messages and is a subset of the syntactic rules provided in the main part of this specification. One small exception is that the contents of field-bodies consist only of text."

The syntax is as follows:

MESSAGE		-->	HEADER
MESSAGE		-->	HEADER CRLF BODY

HEADER		-->	epsilon
HEADER		-->	FIELD HEADER

FIELD		-->	FIELD-NAME : CRLF
FIELD		-->	FIELD-NAME : FIELD-BODY CRLF

FIELD-NAME	-->	Any chars except {CTLs, space and ':'}.

FIELD-BODY	-->	BODY
FIELD-BODY	-->	BODY CRLF LWSP FIELD-BODY

BODY		-->	epsilon
BODY		-->	TEXT BODY

TEXT		-->	Any chars except CR immediately followed by LF
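
A parser for this simplified view could look roughly like the sketch below. It is an illustration under assumptions, not the GHOSTS message class: ParsedMessage and parse_message are hypothetical names, lines without a colon are silently skipped, and blanks directly after the colon are dropped for convenience even though the grammar does not say so.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Result of the simplified parse: header fields in order, then the body.
struct ParsedMessage {
    std::vector<std::pair<std::string, std::string>> fields;
    std::string body;
};

ParsedMessage parse_message(const std::string& raw) {
    ParsedMessage msg;
    std::size_t pos = 0;
    while (pos < raw.size()) {
        // Cut out one CRLF-terminated line.
        const std::size_t eol = raw.find("\r\n", pos);
        const std::string line =
            raw.substr(pos, eol == std::string::npos ? std::string::npos : eol - pos);
        const std::size_t next = (eol == std::string::npos) ? raw.size() : eol + 2;

        if (line.empty()) {
            // The empty line separates HEADER from BODY.
            msg.body = raw.substr(next);
            break;
        }
        if ((line[0] == ' ' || line[0] == '\t') && !msg.fields.empty()) {
            // A line starting with LWSP continues the previous FIELD-BODY.
            msg.fields.back().second += line;
        } else {
            const std::size_t colon = line.find(':');
            if (colon != std::string::npos) {
                std::string value = line.substr(colon + 1);
                const std::size_t b = value.find_first_not_of(" \t");
                value = (b == std::string::npos) ? std::string() : value.substr(b);
                msg.fields.emplace_back(line.substr(0, colon), value);
            }
        }
        pos = next;
    }
    return msg;
}
```

The function distinguishes only header from body, new fields from continuation lines, and field names from field contents, which is all the grammar above asks for.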

References

Str80:missing

Hel91:missing

Cro81:missing