Working with PENG

Rolf Schwitter and Marc Tilbrook

14th July 2007

 

1 Overview 

This document gives a high-level introduction to the PENG system and is written so that an end user can immediately gain first-hand experience with the system. The document focuses on the usability of the system, illustrates various features of the user interface, exemplifies the grammatical coverage of the controlled language, and discusses some limitations of the current implementation.

2 The Interface

The text editor of the PENG system provides a standard mode and a web feed mode. The standard mode can be used to write specification texts in controlled natural language. The web feed mode is specially designed to annotate websites and individual web pages with controlled natural language and to augment these websites with ontological information. In this report we focus mainly on the standard mode of the text editor, discuss various interface aspects, and briefly sketch the functionality of the web feed mode. The user can switch between the standard mode and the web feed mode using the Mode menu of the text editor (see the menu bar of Figure 1). We recommend exploring the standard mode of the text editor first.

2.1 The Standard Mode

After starting up the PENG system, the user sees the text editor in its standard mode:

 

 

 

Figure 1: Text Editor in Standard Mode

 

This interface consists of a question window, a text window and a message window. The input to the text window and the question window is restricted by the controlled language processor of the PENG system. While the specification is being written, the controlled language processor generates lookahead information on the fly for each word form that the user enters. This lookahead information consists of syntactic categories which predict what kind of input can follow the current word form. The lookahead categories are implemented as hypertext links (see Figure 2).

 

 

Figure 2: Lookahead Categories
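
The idea of lookahead categories can be illustrated with a minimal sketch. The lexicon entries, the category names and the transition table below are hypothetical and far simpler than the grammar of PENG; the sketch only shows how a processor can predict, after each word form, which syntactic categories may follow.

```python
# Toy lexicon: word form -> syntactic category (hypothetical entries).
LEXICON = {
    "the": "determiner",
    "a": "determiner",
    "secretary": "noun",
    "Mary": "proper noun",
    "works": "intransitive verb",
    ".": "full stop",
}

# Toy transition table: categories that may follow a given category.
# "START" marks the beginning of a sentence. This is an assumption for
# illustration, not the grammar used by PENG.
FOLLOWERS = {
    "START": ["determiner", "proper noun"],
    "determiner": ["noun"],
    "noun": ["intransitive verb"],
    "proper noun": ["intransitive verb"],
    "intransitive verb": ["full stop"],
    "full stop": ["determiner", "proper noun"],
}


def lookahead(words):
    """Return the categories that may follow the given sequence of word forms.

    Raises ValueError if a word form is unknown or not permitted at its position.
    """
    state = "START"
    for word in words:
        category = LEXICON.get(word)
        if category is None:
            raise ValueError(f"unknown word form: {word!r}")
        if category not in FOLLOWERS[state]:
            raise ValueError(f"{word!r} ({category}) is not allowed here")
        state = category
    return FOLLOWERS[state]
```

In this sketch, an empty text yields the categories determiner and proper noun, and after the word form "the" only a noun is predicted.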

 

By clicking on a hypertext link the user can access help information about an approved word class (see Figure 3). 

 

 

Figure 3: Word Class Help

 

The available words of these approved word classes can be inserted into the text at the current cursor position. The lookahead categories are active by default, but the experienced user might want to switch them off in the View menu (see Figure 4) or display them in the message window instead.

 

 

Figure 4: Lookahead Categories

 

Instead of typing an approved word form into the text editor, the user can select a word form from a context menu. The user can launch the context menu by clicking on the left mouse button and select a suitable word form from a hierarchical menu (see Figure 5).

 

 

 

Figure 5: Context Menu

 

Please note that the same lookahead categories (in our case: proper noun, determiner, cardinal, connective) are available in the context menu as in the text window where they are displayed as hypertext links (see Figure 2 for details). Once an approved word form has been selected from the context menu, it will be inserted automatically into the text window and text processing is immediately resumed (see Figure 6).

 

 

 

Figure 6: Inserting a Word Form via the Context Menu

 

Not only can approved word forms be selected and inserted in this way, but also all noun phrases which are accessible in the specification text. Accessible noun phrases are collected during parsing and then displayed in the context menu (see Figure 7).

 

 

 

Figure 7: Accessible Noun Phrases

 

These noun phrases can be inserted in the same way as approved word forms (see Figure 8).

 

 

 

Figure 8: Inserting an Accessible Noun Phrase

 

The user can display various kinds of output information about the processed text in the message window. Using the View menu, the user can display a paraphrase which indicates anaphoric references and synonyms, a syntax tree for each sentence, the discourse representation structure for the emerging specification text and the equivalent first-order logic representation. Additionally, the output of the reasoning engine (i.e. the entire proof or model) can be displayed as well as the result of the reasoning process (for example the answer to a question). See Figure 9 for an overview of the available options:

 

 

 

Figure 9: View Options

 

Figure 10 shows the output in the message window for the simple sentence "The secretary works on Monday" with all the above-mentioned view options selected.

 

 

 

Figure 10: Message Window

 

The PENG system features two reasoning services, Otter/Mace and Satchmo (parts of these reasoning services are still under development). In the standard mode, question answering, consistency checking and informativity checking are fully implemented for Otter/Mace, while question answering and consistency checking are implemented for Satchmo. Please note that informativity checking is not yet implemented for Satchmo.

 

Figure 11 shows that automatic consistency checking is selected by default. This means that a consistency check is executed after each sentence that the user enters: the reasoning service checks whether the entire specification is consistent. This is a costly task, since the system creates a new model from scratch each time a new sentence is added to the specification text. Automatic consistency checking can be switched off, in which case the user executes the check manually whenever required. The user can select the preferred reasoning service via the Tools menu.

 

 

 

Figure 11: Reasoning Service
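
The cost of automatic consistency checking can be illustrated with a minimal sketch. It is not how Otter/Mace or Satchmo work: it assumes a purely propositional encoding (each sentence is a list of clauses, each clause a list of literals) and uses a brute-force model search. The function names and the encoding are hypothetical. What the sketch does share with the behaviour described above is that a model is searched for from scratch after every added sentence.

```python
from itertools import product


def find_model(clauses):
    """Search by brute force for a truth assignment that satisfies all clauses.

    A clause is a list of literals; a literal is a pair (atom, polarity).
    Returns a dict atom -> bool, or None if the clauses are inconsistent.
    """
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([True, False], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(any(assignment[atom] == polarity for atom, polarity in clause)
               for clause in clauses):
            return assignment
    return None


def check_after_each_sentence(sentences):
    """Rebuild a model from scratch after every added sentence.

    Returns one boolean per sentence: True while the specification
    entered so far is still consistent.
    """
    specification = []
    results = []
    for clauses in sentences:
        specification.extend(clauses)
        results.append(find_model(specification) is not None)
    return results


# Hypothetical propositional encoding of a three-sentence specification:
# "works", "if works then succeeds", "not succeeds".
sentences = [
    [[("works", True)]],
    [[("works", False), ("succeeds", True)]],
    [[("succeeds", False)]],
]
```

Because every call to `find_model` starts over with the full specification, the work grows with each sentence, which is why switching to manual checking can be attractive for longer texts.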

 

The best way to get a first idea about the coverage of the controlled natural language PENG is to have a closer look at the corpus of test sentences and questions which is available via the Tools menu. These sentences are classified into the following subcategories (see Figure 12).

 

 

 

Figure 12: Test Sentences

 

If the user chooses, for example, the subcategory "Consistency", the following set of sentences is displayed (see Figure 13). Once the user selects a sentence, the sentence is automatically copied into the text window and processed (depending on the selected option of the reasoning service).

 

 

 

Figure 13: Test Sentences (Consistency)

 

Part of the text editor is a lexical editor for adding user-specific content words. If the author enters a content word (i.e. proper noun, common noun, verb, adjective or adverb) into the text editor that is not yet available in the lexicon and is not in the list of illegal words, then this content word needs to be added to the user lexicon of the PENG system. The lexical editor is accessible via the Tools menu (see Figure 14).

 

 

 

Figure 14: Calling the Lexical Editor

 

The interface to the lexical editor has been designed in such a way that only minimal linguistic knowledge is required by the user to add a new content word to the lexicon (see Figure 15). As soon as a new content word is available in the lexicon, the parsing process is resumed. User-defined content words can also be deleted from the user lexicon. However, the user cannot delete words from the base lexicon of the PENG system, which contains (mainly) the 3000 most frequent content words of English as well as all predefined function words.

 

 

 

Figure 15: Lexical Editor
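
The division between a read-only base lexicon and an editable user lexicon can be sketched as follows. The class name, method names and error types are hypothetical and do not reflect the internal data structures of PENG; only the content word categories are taken from the description above.

```python
# Content word categories named in the text above.
CONTENT_CATEGORIES = {"proper noun", "common noun", "verb", "adjective", "adverb"}


class Lexicon:
    """Toy lexicon with a read-only base part and an editable user part."""

    def __init__(self, base_words):
        self._base = frozenset(base_words)  # cannot be changed by the user
        self._user = {}                     # user-defined word -> category

    def is_known(self, word):
        return word in self._base or word in self._user

    def add(self, word, category):
        """Add a new content word to the user lexicon."""
        if category not in CONTENT_CATEGORIES:
            raise ValueError(f"not a content word category: {category!r}")
        if word in self._base:
            raise ValueError(f"{word!r} is already in the base lexicon")
        self._user[word] = category

    def delete(self, word):
        """Delete a user-defined word; base lexicon words are protected."""
        if word in self._base:
            raise PermissionError(f"{word!r} belongs to the base lexicon")
        if word not in self._user:
            raise KeyError(word)
        del self._user[word]
```

A short usage example: after `lex = Lexicon({"secretary"})`, the call `lex.add("ontologist", "common noun")` makes the new word known, `lex.delete("ontologist")` removes it again, and `lex.delete("secretary")` is rejected.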

 

Please note that if the user enters an unknown word into the text editor, then the system tries to apply a number of simple spelling heuristics and makes suggestions for spelling correction (see Figure 16):

 

 

 

Figure 16: Spelling Errors

Note that it is up to the user to decide whether a word form is misspelled or not and to correct it - if necessary - or to add it to the user lexicon.
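
The spelling heuristics used by PENG are not specified in this document. As a stand-in, the following sketch uses the similarity matching of Python's standard `difflib` module to propose corrections for an unknown word form; the function name, the sample lexicon and the cutoff value are assumptions made for illustration only.

```python
import difflib


def suggest_corrections(word, lexicon_words, max_suggestions=3):
    """Return lexicon entries that closely resemble the unknown word form.

    This substitutes difflib's similarity ratio for the (unspecified)
    spelling heuristics of PENG. The cutoff of 0.8 is an arbitrary choice.
    """
    return difflib.get_close_matches(
        word.lower(), sorted(lexicon_words), n=max_suggestions, cutoff=0.8
    )


# Hypothetical sample lexicon (lower-case word forms).
sample_lexicon = {"secretary", "works", "monday"}
suggestions = suggest_corrections("secretery", sample_lexicon)
```

An empty list of suggestions does not mean that the word form is wrong; as noted above, the user decides whether to correct the word or to add it to the user lexicon.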

2.2 The Web Feed Mode (Work in Progress)

When the user selects the web feed mode, the text editor asks if the current user lexicon should be used for the new task or if a new user lexicon should be created. Once selected, the text editor displays the interface of the web feed mode (Figure 17).

 

 

 

Figure 17: Text Editor in Web Feed Mode

 

This interface has a tabbed pane containing an ontology pane for the specification of the ontological (background) knowledge about a website and one or more summary panes for the description of those individual web pages which are part of the website. Below the tabbed pane there is a message window for the system feedback and above the tabbed pane is a question window for asking questions about various aspects of a web feed specification.

 

The ontology pane contains a title field for the name of the web feed, a link field for the URL to the website that corresponds to the channel, and an additional lexicon field for the URL that points to the (exported) user lexicon of the controlled natural language. For example, the following conditional sentence expresses domain-specific ontological knowledge about a website and can be placed into the description window of the ontology pane:

 

If X works then X succeeds.

 

Each summary pane contains a title field for the name of the web page, a link field for the URL which points to the original web page and a description window for the summary of a web page. Let us illustrate the use of the web feed mode with the help of a very simple setting consisting of an ontology pane and two summary panes. The ontology pane contains background information such as the above-mentioned sentence, the first summary window contains - among other information - the sentence

 

Bill Smith works at Macquarie University.

 

and the second window contains the sentence

 

Mary works at the DSTO.

 

The question 

 

Who succeeds?

 

will result in two proofs, and two answers will be extracted from these proofs, as Figure 18 illustrates.

 

 

Figure 18: Question Answering in the Web Feed Mode
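
The example above can be reproduced with a minimal sketch of naive forward chaining. This is not the proof procedure of Otter/Mace or Satchmo; the representation of facts as (predicate, individual) pairs, the rule format and the function names are hypothetical simplifications that cover only rules of the form "If X works then X succeeds".

```python
def forward_chain(facts, rules):
    """Derive all facts that follow from the given facts and rules.

    facts: set of (predicate, individual) pairs.
    rules: list of (premise_predicate, conclusion_predicate) pairs, each
           read as "if X <premise> then X <conclusion>".
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                if predicate == premise and (conclusion, individual) not in derived:
                    derived.add((conclusion, individual))
                    changed = True
    return derived


def who(predicate, facts):
    """Answer a 'Who <predicate>?' question against a set of facts."""
    return sorted(individual for pred, individual in facts if pred == predicate)


# Ontology pane: If X works then X succeeds.
rules = [("works", "succeeds")]
# Summary panes: Bill Smith works ... / Mary works ...
facts = {("works", "Bill Smith"), ("works", "Mary")}

answers = who("succeeds", forward_chain(facts, rules))
```

In this sketch, `answers` contains one entry per summary pane, mirroring the two answers extracted for the question "Who succeeds?".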

 

Please note that the web feed mode is work in progress: question answering and consistency checking are currently only partially implemented in this mode for Otter/Mace and Satchmo. Informativity checking is currently not available in this mode.