Introduction
I'm interested in whether, and how, machines can handle natural languages the way we humans do. That is why I am a student at the Centre for Language Technology.
Because, unfortunately, I cannot study everything in this area, I have limited myself to machine translation for the moment. It is striking how poor current machine translations are. If you are bilingual and not familiar with machine translation, just visit an online machine translator and try to translate a random text; for example, you could try InterTran. As you will see, there is still a lot of room for research and improvement.
Quick history of machine translation
Machine translation started when British Intelligence broke the German Enigma codes and the U.S. military broke the Japanese Type A codes in the 1940s. This code breaking made people think that translating might be just the same as decoding one language into another. Warren Weaver famously wrote:
I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in
some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text.
This view of machine translation led to a decade of optimism in the field. There were predictions that a fully capable machine translator would be operational within a couple of years.
However, language turned out to be much more difficult. In 1966 the ALPAC (Automatic Language Processing Advisory Committee) concluded in a report that machine translation was slower, less accurate and twice as expensive as human translation. This report had a great impact on research in machine translation, which more or less stopped for a decade.
The Meteo system, developed in 1977, proved to be a success, however, and interest in machine translation slowly started to rise again.
In the 1980s interest in machine translation grew around the world. In the late 1980s and early 1990s IBM developed the Candide system. This was one of the first approaches to put statistical translation back on the map. Statistical language processing had been met with much scepticism after the 1940s and 1950s.
There was also a shift of interest from fully operational machine translation systems to systems that assist human translators. At the moment machine translation is a big research area again and is financially interesting. Speedups of two times are claimed. Even so, machine translation is still very poor compared to human translation. Fortunately for me, that leaves room for a lot of interesting research.
Money and activity for machine translation also seem to be related to the threat of enemies. Because knowing your enemy not only gives you an advantage in a war but can also prevent one, money is spent on machine translation as a way to understand other people each time there is a threat. There also seems to be a correlation between the languages chosen for machine translation and the languages of the potential enemy. During the Cold War a lot of money was spent on Russian-English research, while the focus now is on other language pairs, such as English-Arabic and English-Chinese.
If you are interested in the history of machine translation, read "Machine Translation: past, present, future".
Previous experience with Machine Translation
I believe that a good approach to machine translation should look both at the (lexical) meaning of sentences and at their syntax. I did my master's (equivalent) at the University of Twente, in the department for language technology and interaction. In this master's project I tried to apply the Case Based Reasoning approach to machine translation. I used Head Driven Phrase Grammars to generate the cases for the case-based reasoning. Because the dependency trees were stored hierarchically and were linked to their translated trees, utterances similar to previously seen utterances could be translated quite nicely. The disadvantage of this memory-based translation algorithm is that it needs a lot of memory, so only a small domain could be covered. For more information, go to the publication section and check out my master's thesis.
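As a rough illustration of the memory-based idea only (this is not the system from the thesis), the retrieval step can be sketched in Python. Everything here is an assumption made for the example: the names `Case`, `CaseMemory` and `similarity` are hypothetical, cases are flat token sequences rather than linked dependency trees, similarity is a crude word-overlap score, and the sentence pairs are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One stored example: a tokenised source utterance and its known translation."""
    source: tuple
    target: str


def similarity(a, b):
    """Jaccard word overlap between two token sequences (an assumed, crude measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


class CaseMemory:
    """Toy case base: every example ever seen is kept in memory."""

    def __init__(self):
        self.cases = []

    def add(self, source, target):
        # Store the source as lower-cased, whitespace-separated tokens.
        self.cases.append(Case(tuple(source.lower().split()), target))

    def retrieve(self, utterance):
        """Return (best_case, score), or (None, 0.0) if nothing overlaps."""
        tokens = tuple(utterance.lower().split())
        best, best_score = None, 0.0
        for case in self.cases:
            score = similarity(tokens, case.source)
            if score > best_score:
                best, best_score = case, score
        return best, best_score


memory = CaseMemory()
memory.add("the train leaves at nine", "de trein vertrekt om negen uur")
memory.add("where is the station", "waar is het station")

best, score = memory.retrieve("the train leaves at ten")
print(best.target, round(score, 2))
```

The sketch only retrieves the closest stored case; adapting that case to the new utterance (here, replacing "nine" with "ten") is the harder part and is left out. It does show the limitation mentioned above: every utterance that should be covered needs a similar case in memory, so the case base grows with the size of the domain.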
Evaluation
Finally, every theory in science needs evaluation to check the claims involved. Such evaluation is very difficult for natural language processing applications. I have dedicated a part of this website to the problem of Machine Translation Evaluation.
My PhD Project
My PhD project is supervised by Mark Dras (first supervisor) and Robert Dale.
The key focus of my PhD is how we can make use of structural knowledge in statistical machine translation. Kevin Knight has done some work in this area.
There is an ongoing debate about whether data about the syntax of a language is really needed by a statistical machine translation algorithm, and whether it will benefit the algorithm or actually make it worse. The top-ranking statistical machine translation algorithms nowadays do not use any syntactic information, but there is evidence that adding structure does indeed help.
In my research project I will be using structural information on only one side of the language pair. Tools that provide structural data are usually available only for major languages, while the majority of languages do not have such tools. However, when translating from (or to) a major language we might still benefit from the syntactic data we can get on that side.
Because most tools for natural language are still not flawless, the issue of error handling needs to be addressed. As part of my PhD I will be looking at how syntactic data can be used at a local level when the overall data is not correct but individual local pieces of it are. Some grammar formalisms are better than others at handling and generating partial data.