Working with PENG
Rolf Schwitter and Marc Tilbrook
14th July 2007
1 Overview
This document gives a high-level introduction
to the PENG system and is written in such a way that an end user can gain
first-hand experience with the system immediately. The document focuses on the
usability of the system, illustrates various features of the user interface,
exemplifies the grammatical coverage of the controlled language, and discusses
some limitations of the current implementation.
2 The Interface
The text editor of the PENG system provides a standard mode
and a web feed mode. The standard mode can be used to write specification texts
in controlled natural language. The web feed mode is specially designed to
annotate websites and individual web pages with controlled natural language and
to augment these websites with ontological information. In this report we
focus mainly on the standard mode of the text editor,
discuss various interface aspects, and briefly sketch the functionality of the
web feed mode. The user can switch between the standard mode and web feed mode
using the Mode menu of the text editor (see the menu bar of Figure 1). We recommend exploring the standard mode of
the text editor first.
2.1 The Standard Mode
After starting up the PENG system, the user sees the text editor in its standard
mode:
Figure 1: Text Editor in Standard Mode
This interface consists of a question window, a text window
and a message window. The input to the text window and the question window is
restricted by the controlled language processor of the PENG system. The
controlled language processor generates lookahead information on the fly for
each word form that the user enters while the specification is written. This
lookahead information consists of syntactic categories which predict what kind
of input can follow the current word form. The lookahead categories are
implemented as hypertext links (see Figure 2).
Figure 2: Lookahead Categories
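The following minimal sketch illustrates how lookahead categories of this kind could be computed. It is not the implementation of the PENG system: the tiny lexicon, the two sentence patterns and the function name lookahead are hypothetical and chosen for illustration only.

```python
# Hypothetical toy lexicon: word form -> syntactic category.
LEXICON = {
    "the": "determiner",
    "a": "determiner",
    "secretary": "noun",
    "manager": "noun",
    "works": "intransitive verb",
    "sees": "transitive verb",
    ".": "full stop",
}

# Hypothetical sentence patterns standing in for a real grammar.
PATTERNS = [
    ("determiner", "noun", "intransitive verb", "full stop"),
    ("determiner", "noun", "transitive verb", "determiner", "noun", "full stop"),
]


def lookahead(words):
    """Return the categories that may follow the word forms entered so far.

    Raises KeyError for a word form that is not in the toy lexicon.
    """
    categories = tuple(LEXICON[word.lower()] for word in words)
    n = len(categories)
    return sorted(
        {
            pattern[n]
            for pattern in PATTERNS
            if len(pattern) > n and pattern[:n] == categories
        }
    )


print(lookahead([]))                    # ['determiner']
print(lookahead(["The", "secretary"]))  # ['intransitive verb', 'transitive verb']
```

In this sketch the set of predicted categories shrinks as more word forms are entered, which mirrors the way the hypertext links change while the user writes.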
By clicking on a hypertext link the user can access help
information about an approved word class (see Figure 3).
Figure 3: Word Class Help
The available words of these approved word classes can be
inserted into the text at the current cursor position. The lookahead categories
are active by default but the experienced user might want to switch them off in
the View menu (see Figure 4) or alternatively display them in the message
window.
Figure 4: Lookahead Categories
Instead of typing an approved word form into the text editor,
the user can select a word form from a context menu. The user can
launch the context menu by clicking on the left mouse button and select a
suitable word form from a hierarchical menu (see Figure 5).
Figure 5: Context Menu
Please note that the same lookahead categories (in our case: proper noun,
determiner, cardinal, connective) are available in the context menu as in the
text window, where they are displayed as hypertext
links (see Figure 2 for details). Once an approved word form has been selected
from the context menu, it will be inserted automatically into the text window
and text processing is immediately resumed (see Figure 6).
Figure 6: Inserting a Word Form via the Context Menu
Not only can approved word forms be selected and inserted in
this way, but also all noun phrases which are accessible in the specification
text. Accessible noun phrases are collected during parsing and then displayed
in the context menu (see Figure 7).
Figure 7: Accessible Noun Phrases
These noun phrases can be inserted in a similar way as
approved word forms (see Figure 8).
Figure 8: Inserting an Accessible Noun Phrase
The user can display various kinds of
output information about the processed text in the message window. Using the
View menu, the user can display a paraphrase which indicates anaphoric references
and synonyms, a syntax tree for each sentence, the discourse representation
structure for the emerging specification text and the equivalent first-order
logic representation. Additionally, the output (i.e. the entire proof or
model) of the reasoning engine can be displayed as well as the result of the
reasoning process (for example the answer to a question). See Figure 9 for an
overview of the available options:
Figure 9: View Options
Figure 10 shows the output in the message window for the
simple sentence "The secretary works on Monday" with all the
above-mentioned view options selected.
Figure 10: Message Window
The PENG system features two reasoning services (parts of
these reasoning services are still under development). In the standard mode,
question answering, consistency checking and informativity checking are fully
implemented for Otter/Mace, while question answering and consistency checking are
implemented for Satchmo. Please note that informativity checking is not
yet implemented for Satchmo.
Figure 11 shows that automatic consistency checking is
selected by default. That means that after each sentence which the user enters,
a consistency check is executed. The reasoning service checks whether the
entire specification is consistent or not. This is a costly task, since the
system creates a new model from scratch each time a new sentence is added to
the specification text. Automatic consistency checking can be switched off, in
which case the user can run a consistency check manually whenever required. The
user can select the preferred reasoning service via the Tools menu.
Figure 11: Reasoning Service
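To illustrate why rechecking the entire specification after each sentence is costly, the sketch below rebuilds a model from scratch whenever a sentence is added. It is a strong simplification and an assumption on our part: it works on propositional clauses with a brute-force search, whereas Otter/Mace and Satchmo operate on first-order representations. The names find_model and add_sentence and the atom secretary_works are hypothetical.

```python
from itertools import product


def find_model(clauses):
    """Search for a truth assignment that satisfies every clause.

    A clause is a list of literals; a literal is a pair (atom, polarity).
    Returns a dict mapping atoms to truth values, or None if none exists.
    """
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(
            any(model[atom] == polarity for atom, polarity in clause)
            for clause in clauses
        ):
            return model
    return None


def add_sentence(spec, clauses):
    """Add the clauses of one sentence and recheck the whole specification.

    The search starts from scratch on every call, so the cost grows with
    the size of the specification (exponentially in this naive sketch).
    """
    candidate = spec + clauses
    return candidate, find_model(candidate) is not None


spec = []
spec, consistent = add_sentence(spec, [[("secretary_works", True)]])
print(consistent)  # True
spec, consistent = add_sentence(spec, [[("secretary_works", False)]])
print(consistent)  # False
```

Switching off automatic checking corresponds, in this sketch, to collecting several sentences first and calling find_model only once on demand.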
The best way to get a first idea about the coverage of the
controlled natural language PENG is to have a closer look at the corpus of test
sentences and questions which is available via the Tools menu. These sentences
are classified into the following subcategories (see Figure 12).
Figure 12: Test Sentences
If the user chooses, for example, the subcategory
"Consistency", the following set of sentences is displayed (see
Figure 13). Once the user selects a sentence, the sentence is automatically
copied into the text window and processed (depending on the selected option of
the reasoning service).
Figure 13: Test Sentences (Consistency)
Part of the text editor is a lexical editor for adding user-specific
content words. If the author enters a content word (i.e. proper noun, common
noun, verb, adjective or adverb) into the text editor
which is not yet available in the lexicon and is not in the list of illegal
words, then this content word needs to be added to the user lexicon of the PENG
system. The lexical editor is accessible
via the Tools menu (see Figure 14).
Figure 14: Calling the Lexical Editor
The interface to the lexical editor has been designed in such
a way that only minimal linguistic knowledge is required by the user to add a
new content word to the lexicon (see Figure 15). As soon as a new content word
is available in the lexicon, the parsing process is resumed. User-defined content
words can also be deleted from the user lexicon. However, the user cannot
delete words in the base lexicon of the PENG system, which contains (mainly) the
3000 most frequent content words of English as well as all predefined function
words.
Figure 15: Lexical Editor
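The interplay between the base lexicon, the user lexicon and the list of illegal words can be sketched as follows. This is a minimal illustration under our own assumptions, not the data model of the PENG system: the word lists and the function names classify, add_word and delete_word are hypothetical.

```python
# Hypothetical stand-ins for the predefined resources of the system.
BASE_LEXICON = {"secretary", "works", "the"}
ILLEGAL_WORDS = {"thing"}


def classify(word, user_lexicon):
    """Decide how an entered word form is treated."""
    w = word.lower()
    if w in ILLEGAL_WORDS:
        return "illegal"
    if w in BASE_LEXICON or w in user_lexicon:
        return "known"
    return "needs_adding"  # must be added to the user lexicon first


def add_word(word, user_lexicon):
    """Add a new content word to the user lexicon."""
    user_lexicon.add(word.lower())


def delete_word(word, user_lexicon):
    """Delete a user-defined word; base lexicon entries are protected."""
    w = word.lower()
    if w in user_lexicon:
        user_lexicon.remove(w)
        return True
    return False


user_lexicon = set()
print(classify("DSTO", user_lexicon))          # needs_adding
add_word("DSTO", user_lexicon)
print(classify("DSTO", user_lexicon))          # known
print(delete_word("secretary", user_lexicon))  # False
```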
Please note that if the user enters an unknown word into the
text editor, then the system tries to apply a number of simple spelling
heuristics and makes suggestions for spelling correction (see Figure 16):
Figure 16: Spelling Errors
Note that it is up to the user to decide whether a word form is misspelled or not
and to correct it, if necessary, or to add it to the user lexicon.
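The spelling heuristics of the PENG system are not described in this document. As a rough illustration of how suggestions for an unknown word form could be produced, the sketch below uses the similarity matching of Python's standard difflib module as a substitute technique. The word list, the cutoff value and the function name suggest are assumptions made for this example.

```python
import difflib

# Hypothetical excerpt of the word forms known to the lexicon.
KNOWN_WORDS = ["secretary", "manager", "works", "monday"]


def suggest(word, n=3):
    """Return up to n known word forms that closely resemble the input."""
    return difflib.get_close_matches(word.lower(), KNOWN_WORDS, n=n, cutoff=0.8)


print(suggest("secretery"))
print(suggest("xyzzy"))
```

An empty result means that no similar word form was found. In that case the user would either correct the word by hand or add it to the user lexicon.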
2.2 The Web Feed Mode (Work in Progress)
When the user selects the web feed mode, the text editor asks
if the current user lexicon should be used for the new task or if a new user
lexicon should be created. Once the user has chosen, the text editor displays the
interface of the web feed mode (see Figure 17).
Figure 17: Text Editor in Web Feed Mode
This interface has a tabbed pane containing an ontology pane
for the specification of the ontological (background) knowledge about a website
and one or more summary panes for the description of those individual web pages
which are part of the website. Below the tabbed pane there is a message window
for the system feedback and above the tabbed pane is a question window for
asking questions about various aspects of a web feed specification.
The ontology pane contains a title field for the name of the
web feed, a link field for the URL to the website that corresponds to the
channel, and an additional lexicon field for the URL that points to the
(exported) user lexicon of the controlled natural language. For example, the
following conditional sentence expresses domain-specific ontological knowledge
about a website and can be placed into the description window of the ontology
pane:
If X works then X succeeds.
Each summary pane contains a title field for the name of the
web page, a link field for the URL which points to the original web page and a
description window for the summary of a web page. Let us illustrate the use of
the web feed mode with the help of a very simple setting consisting of an
ontology pane and two summary panes. The ontology pane contains background information
such as the above-mentioned sentence, the first summary window contains, among other
information, the sentence
Bill Smith works at
and the second window contains the sentence
Mary works at the DSTO.
The question
Who succeeds?
will result in two proofs, and two answers will be extracted
from these proofs, as Figure 18 illustrates.
Figure 18: Question Answering in the Web Feed Mode
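The reasoning behind the two answers can be sketched with a small forward-chaining procedure. This is only an illustration under simplifying assumptions and not the way Otter/Mace or Satchmo work internally: the conditional sentence is encoded as a rule over one-place predicates, the workplaces mentioned in the two summaries are left out, and the names forward_chain and who are hypothetical.

```python
# "If X works then X succeeds." as a (premise, conclusion) pair.
RULES = [("works", "succeeds")]

# Facts taken from the two summary windows, reduced to "X works".
FACTS = {("works", "Bill Smith"), ("works", "Mary")}


def forward_chain(facts, rules):
    """Apply the rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, argument in list(derived):
                if predicate == premise and (conclusion, argument) not in derived:
                    derived.add((conclusion, argument))
                    changed = True
    return derived


def who(predicate, facts, rules):
    """Answer a question of the form 'Who <predicate>?'."""
    closure = forward_chain(facts, rules)
    return sorted(argument for pred, argument in closure if pred == predicate)


print(who("succeeds", FACTS, RULES))  # ['Bill Smith', 'Mary']
```

Each answer is supported by its own derivation, one from each summary, which corresponds to the two proofs mentioned above.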
Please note that the web feed mode is a work in progress
and that question answering and consistency checking are currently only partially
implemented in this mode for Otter/Mace and Satchmo. Informativity
checking is currently not available in this mode.