[Ltg] HCSNet/LTG Seminar: [Martin Volk, Monday, January 15, 2007, 11am, E6A357]
Rolf Schwitter
rolfs at ics.mq.edu.au
Sat Jan 13 11:07:04 EST 2007
HCSNet
The ARC Network in Human Communication Science
Seminar: Martin Volk, Stockholm University
Details
Title: Cross-Language Information Retrieval, Parallel Treebanks and
Machine Translation
Speaker: Martin Volk, Stockholm University
Locations and times
This seminar will take place at the following locations and dates:
a. Sydney
Date: 15 January 2007 at 11am
Location: Building E6A, Room 357, Macquarie University, North Ryde, Sydney
Contact: Rolf Schwitter, rolfs at ics.mq.edu.au
b. Brisbane
Date: 22 January 2007 at 2pm
Location: Queensland University of Technology, Gardens Point Campus, Room
S524
Contact: James Hogan, j.hogan at qut.edu.au
c. Melbourne
Date: 29 January 2007, 2pm-3:30pm
Location: University of Melbourne, Alan Gilbert Theatre 1
Contact: Tim Baldwin, tim at csse.unimelb.edu.au
Summary
This presentation deals with different aspects of multilingual language
technology. We start by summarizing our work in a project on Cross-Language
Information Retrieval in the Medical Domain. In this project we have
evaluated different means of bridging the gap between German queries and
English documents and vice versa. We worked with a parallel collection of
medical abstracts in the two languages.
The combined research on such parallel corpora and on treebanks has recently
led to parallel treebanks. A parallel treebank consists of syntactically
annotated sentences in two or more languages, taken from translated
documents. In addition, the syntax trees of two corresponding sentences are
aligned on a sub-sentential level (word and phrase level). Parallel
treebanks can be used as training or evaluation corpora for word and phrase
alignment, as input for example-based machine translation, as training
corpora for transfer rules, or for translation studies.
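To make the structure concrete, here is a minimal sketch of how one aligned sentence pair in a parallel treebank might be represented. It is an illustration only, not the speaker's tools or annotation format: the names Node, aligned_spans and all node ids are hypothetical, and the example sentences and tags are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a syntax tree: a word (leaf) or a phrase (inner node)."""
    node_id: str
    label: str                      # POS tag for words, category for phrases
    word: Optional[str] = None      # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def yield_words(self) -> List[str]:
        """Return the words covered by this node, left to right."""
        if self.word is not None:
            return [self.word]
        words: List[str] = []
        for child in self.children:
            words.extend(child.yield_words())
        return words

    def index(self) -> Dict[str, "Node"]:
        """Map every node id in this subtree to its node."""
        nodes = {self.node_id: self}
        for child in self.children:
            nodes.update(child.index())
        return nodes


def aligned_spans(src: Node, tgt: Node,
                  alignment: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Resolve each (source id, target id) link to the text it covers."""
    src_nodes, tgt_nodes = src.index(), tgt.index()
    return [(" ".join(src_nodes[s].yield_words()),
             " ".join(tgt_nodes[t].yield_words()))
            for s, t in alignment]


# Invented English and German trees for one translated sentence.
en = Node("e_s", "S", children=[
    Node("e_np", "NP", children=[Node("e1", "DT", "the"),
                                 Node("e2", "NN", "bank")]),
    Node("e3", "VBD", "grew"),
])
de = Node("d_s", "S", children=[
    Node("d_np", "NP", children=[Node("d1", "ART", "die"),
                                 Node("d2", "NN", "Bank")]),
    Node("d3", "VVFIN", "wuchs"),
])

# Sub-sentential alignment: links on the phrase level and on the word level.
alignment = [("e_s", "d_s"), ("e_np", "d_np"), ("e2", "d2"), ("e3", "d3")]

for src_text, tgt_text in aligned_spans(en, de, alignment):
    print(src_text, "<->", tgt_text)
```

Storing the alignment as links between node ids, rather than between word positions, is what lets a single list cover both word-level and phrase-level correspondences.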
We are developing a German-English-Swedish parallel treebank, with texts
from financial documents and from a novel. We will report on our methods and
tools for building the monolingual treebanks in the three languages and for
aligning the corresponding units on the word and phrase level.
In a related project the Computational Linguistics Group at Stockholm
University has joined forces with a leading subtitling company in building a
system for the automatic translation of film subtitles from Swedish to
Danish. The company has provided a wealth of already translated subtitles,
and our group builds a translation system to re-use and re-assemble the
previous translations at various levels of granularity.
A first prototype has been built and produces good results. The output will
be checked by a professional translator, but it is expected that at least a
third of the automatically translated subtitles need not be touched. We will
report on experiences with handling the large parallel corpus and the
current status of the project.
Bio
Martin Volk received his PhD from the University of Koblenz (Germany) in
1994. He has subsequently worked in Switzerland at the University of Zurich,
the Zurich University of Applied Sciences, and at Eurospider Information
Technology AG. Since 2003 he has been a professor of Computational
Linguistics at Stockholm University (Sweden). His main research interests
are in multilingual corpus annotation, cross-language information retrieval
and machine translation.