Mark Dras's Research Page


A recap from my main page:

In general terms, what I'm interested in is language structure and transformation. I am interested in the ways that natural languages and their mathematical or formal representations can be transformed, and the contexts in which these transformations occur; this encompasses machine translation, paraphrase and models of language change. In the bigger picture, asking what kinds of structure are computationally amenable to transformation provides another view on what the structure of language may be.

More specifically, here's what I'm working on:

PARAPHRASE AND NEAR-SYNONYMY

One of the significant characteristics of language is that there are multiple ways of expressing the same idea. There may be slight differences of emphasis, nuances of meaning, that differentiate one expression of an idea from another, but the importance of this nuancing varies with the context. A paraphrase example is ``The investigators made a distinction between the alleged attackers and their supporters'' versus ``The investigators distinguished between the alleged attackers and their supporters''; a near-synonym one is the difference between ``frugal'' and ``stingy''.

One question of interest is knowing when to apply these paraphrases. Altering, say, a document to fit specifications -- for example, to compress a document by 25% while maintaining the same density of concepts and improving readability -- can be seen as applying paraphrases under a set of constraints. This work has involved characterising this as a mathematical optimisation problem, and also finding formal representations for paraphrase.
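As a rough illustration of paraphrasing under constraints, the toy sketch below picks the cheapest set of paraphrases that shortens a document by a target fraction. It is not the formulation used in this work: the `Paraphrase` fields, the single `cost` number standing in for loss of meaning or readability, and the candidate list are all assumptions made for the example, and the exhaustive search is only practical for a handful of candidates.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence, Tuple


class Paraphrase(NamedTuple):
    name: str         # hypothetical label for the rewrite
    words_saved: int  # reduction in document length if the rewrite is applied
    cost: float       # hypothetical penalty for the change in meaning or style


def select_paraphrases(
    candidates: Sequence[Paraphrase],
    doc_length: int,
    target_reduction: float,
) -> Optional[Tuple[Paraphrase, ...]]:
    """Return the lowest-cost subset that saves enough words, or None."""
    needed = doc_length * target_reduction
    best, best_cost = None, float("inf")
    # Try every subset; a real system would use an optimisation solver instead.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            saved = sum(p.words_saved for p in subset)
            cost = sum(p.cost for p in subset)
            if saved >= needed and cost < best_cost:
                best, best_cost = subset, cost
    return best


# Invented candidates for a 40-word passage that must shrink by 25%.
candidates = [
    Paraphrase("made a distinction -> distinguished", 2, 0.1),
    Paraphrase("drop relative clause", 5, 0.6),
    Paraphrase("merge two sentences", 3, 0.3),
    Paraphrase("delete adverbial", 4, 0.4),
]
chosen = select_paraphrases(candidates, doc_length=40, target_reduction=0.25)
for p in chosen or ():
    print(p.name)
```

Further requirements, such as a floor on concept density or a readability target, would enter as extra constraints alongside the length test.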

Another question is how to characterise and automatically acquire the differences between near-synonyms. People at the University of Toronto have taken one approach to this; I'm interested in how corpus-statistics approaches might work here.

Integrating this notion of multiple means of expression with action in the context of a virtual environment -- where software agents communicate with users in human-like ways -- has led to an ARC Discovery grant (2005-2007) for $362K with Debbie Richards and Manolya Kavakli.

MACHINE TRANSLATION

Here, I'm particularly interested in issues of syntax in MT.

One issue is in formal grammar, in constructions where there are structural difficulties in translating between languages. In this, I've been particularly interested in Synchronous Tree Adjoining Grammar (S-TAG); my work here has involved investigating the representations necessary for translation among a range of languages, including English, French, Spanish, Korean and, recently, Dutch; examining constructions that cause difficulties when paired, such as clitics or aspects of Korean word order; and working on efficient algorithms for dealing with these representations.

Questions to ask here include:

Another issue is how syntax can work with Statistical MT. Current work involves looking at syntax as preprocessing, as in the ACL 2005 work of Collins, Koehn and Kucerova, and seeing what it is about the use of syntactic reordering that leads to an improvement in translation quality.

EVALUATION OF GENERATED TEXT FLUENCY

In evaluating the output of language technology applications -- MT, natural language generation, summarisation -- automatic evaluation techniques generally conflate measurement of faithfulness to source content with fluency of the resulting text. I'm looking at developing automatic evaluation metrics to estimate fluency alone, by examining the use of parser outputs as metrics, and examining how they correlate with human judgements of generated text fluency; developing machine learners based on these; and examining how these are affected by different language models and different domains.
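The correlation step can be sketched as follows. This is a minimal illustration only: the parser-derived scores and the human ratings are invented numbers, the choice of Pearson's coefficient is an assumption (a rank correlation would be an equally reasonable choice), and `pearson` is a helper written for this example.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented data: one parser-derived score per generated sentence (for
# example, a per-word log-probability of the best parse) and a human
# fluency rating on a 1-5 scale for the same sentences.
parser_scores = [-2.1, -3.5, -1.8, -4.2, -2.9]
human_ratings = [4, 2, 5, 1, 3]

r = pearson(parser_scores, human_ratings)
print(f"correlation with human fluency judgements: {r:.3f}")
```

A candidate metric whose scores track the human ratings closely gives a coefficient near 1; the same comparison can be repeated per language model or per domain.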

[Mark's home page]


Last updated 9 April 2007