[Ltg] HCSNet/LTG Seminar: [Martin Volk, Monday, January 15, 2007, 11am, E6A357]

Rolf Schwitter rolfs at ics.mq.edu.au
Sat Jan 13 11:07:04 EST 2007


HCSNet
The ARC Network in Human Communication Science
Seminar: Martin Volk, Stockholm University
Details
Title: Cross-Language Information Retrieval, Parallel Treebanks and 
Machine Translation
Speaker: Martin Volk, Stockholm University

Locations and times
This seminar will take place at the following locations and dates:
  a. Sydney
  Date: 15 January 2007 at 11am
  Location: Building E6A, Room 357, Macquarie University, North Ryde, Sydney
  Contact: Rolf Schwitter, rolfs at ics.mq.edu.au

  b. Brisbane
  Date: 22 January 2007 at 2pm
  Location: Queensland University of Technology, Gardens Point Campus, Room S524
  Contact: James Hogan, j.hogan at qut.edu.au

  c. Melbourne
  Date: 29 January 2007, 2pm-3:30pm
  Location: University of Melbourne, Alan Gilbert Theatre 1
  Contact: Tim Baldwin, tim at csse.unimelb.edu.au

Summary
This presentation deals with different aspects of multilingual language 
technology. We start by summarizing our work in a project on Cross-Language 
Information Retrieval in the Medical Domain. In this project we have 
evaluated different means of bridging the gap between German queries and 
English documents and vice versa. We worked with a parallel collection of 
medical abstracts in the two languages.

The combined research on such parallel corpora and on treebanks has recently 
led to parallel treebanks. A parallel treebank consists of syntactically 
annotated sentences in two or more languages, taken from translated 
documents. In addition, the syntax trees of two corresponding sentences are 
aligned on a sub-sentential level (word and phrase level). Parallel 
treebanks can be used as training or evaluation corpora for word and phrase 
alignment, as input for example-based machine translation, as training 
corpora for transfer rules, or for translation studies.

We are developing a German-English-Swedish parallel treebank, with texts 
from financial documents and from a novel. We will report on our methods and 
tools for building the monolingual treebanks in the three languages and for 
aligning the corresponding units on the word and phrase level.

In a related project the Computational Linguistics Group at Stockholm 
University has joined forces with a leading subtitling company in building a 
system for the automatic translation of film subtitles from Swedish to 
Danish. The company has provided a wealth of already translated subtitles, 
and our group builds a translation system to re-use and re-assemble the 
previous translations at various levels of granularity.

A first prototype has been built and produces good results. The output will 
be checked by a professional translator, but we expect that at least a 
third of the automatically translated subtitles will need no correction. We will 
report on experiences with handling the large parallel corpus and the 
current status of the project.

Bio
Martin Volk received his PhD from the University of Koblenz (Germany) in 
1994. He subsequently worked in Switzerland at the University of Zurich, 
the Zurich University of Applied Sciences, and at Eurospider Information 
Technology AG. Since 2003 he has been a professor of Computational 
Linguistics at Stockholm University (Sweden). His main research interests 
are in multilingual corpus annotation, cross-language information retrieval 
and machine translation.



