[Ltg] LTG Seminars [13 + 20 December, E6A, 11am ]

Rolf Schwitter rolfs at ics.mq.edu.au
Wed Dec 1 20:40:10 EST 2004


----
LTG Seminar
 - see: http://www.clt.mq.edu.au/Events/Seminars.html

Monday, December 13, 2004 at 11am
Macquarie Uni, E6A 357
----

Speaker: Menno van Zannen
Title: How to get from A to ABL

If you want to structure sequences (such as natural language sentences), you 
could, for example, use a grammar and parse the sequences.  However, what do 
you do if such a grammar is not available?  This problem can occur, for 
example, when analyzing sentences written in an ancient natural language, or 
a natural language for which no grammar has been created yet.  To make 
things worse, it may also be a sentence from a ``language'' for which it is 
unclear what the grammar should look like (for example in music).

The Alignment-Based Learning framework is a grammatical inference framework 
(or more precisely a structure inference framework).  By analyzing the 
sentences, it tries to find regularities that may be used to assign 
structure to unstructured sequences.

In this talk, I will explain how the system works, how it arose, how it 
evolved and how it can be used in the future.  This will give a compact 
overview of the previous work I have done and some ideas on future work I 
would like to perform.



Monday, December 20, 2004 at 11am
Macquarie Uni, E6A 357
----

Speaker: Tanja Gaustad van Zannen
Title: Linguistic Knowledge and Word Sense Disambiguation

In this talk I will present the findings of my PhD research. The main 
research question I try to answer in my thesis is which linguistic knowledge 
sources are most useful for word sense disambiguation (WSD), more 
specifically word sense disambiguation of Dutch. Therefore, the structure of 
the thesis - and of this talk - is based on the various levels of linguistic 
information tested for WSD, including morphology, information on the 
syntactic class of a particular ambiguous word, and the syntactic structure 
of the entire sentence containing an ambiguous word.

The goal of the project was to develop a tool which is able to automatically 
determine the meaning of a particular ambiguous word in context, a so-called 
word sense disambiguation system.  I will first introduce the experimental 
setup of the WSD system, including a brief presentation of the corpus, the 
classification algorithm used, as well as the "features" (or sources of 
linguistic knowledge) integrated in the model. In a second step, I will 
present a novel approach to building a statistical WSD system which includes 
morphological information in the form of lemmas as the key element. This 
will be followed by a presentation of various results, highlighting the 
importance of linguistic information in the WSD system presented.

In this statistical WSD system, the addition of deep linguistic knowledge 
in particular greatly improves disambiguation accuracy. In combination with an 
approach taking advantage of morphological information, the best results for 
WSD of Dutch (on the Senseval-2 data set) are obtained. My system achieves 
significantly higher disambiguation accuracy than any results for Dutch that 
have been reported in the literature up to now and is thus state-of-the-art 
for Dutch WSD.




