
How to install
Download and save FMeasure.cpp into a suitable directory.
Invoke the compiler (for example, in a Linux environment):
g++ -O3 FMeasure.cpp -o FMeasure


NAME
FMeasure - The FMeasure metric for evaluating (machine) translation quality on precision and recall

SYNOPSIS
FMeasure [options] <Candidate file> <Reference file 1> [Reference file 2..n]

DESCRIPTION
FMeasure calculates the FMeasure score over a candidate file and at least one reference file.
The score is the f-measure score over a calculated precision and recall score. The precision and recall scores are based on maximum matching sets of tokens between the candidate file and the different reference files.
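
As a rough illustration of how the three numbers relate, the sketch below assumes that precision is the match size divided by the candidate length, that recall is the match size divided by the reference length, and that the f-measure is their balanced harmonic mean. The names Score and scoreFromMatch are hypothetical and not taken from FMeasure.cpp, and the way the program combines several reference files is not covered here.

```cpp
#include <cassert>

// Hypothetical container for the three values; not the tool's own type.
struct Score {
    double precision;
    double recall;
    double fmeasure;
};

// matchSize:    size of the maximum matching set (candidate vs. reference)
// candidateLen: number of tokens in the candidate (assumed normaliser)
// referenceLen: number of tokens in the reference (assumed normaliser)
Score scoreFromMatch(double matchSize, double candidateLen, double referenceLen) {
    assert(candidateLen > 0.0 && referenceLen > 0.0);
    Score s;
    s.precision = matchSize / candidateLen;
    s.recall = matchSize / referenceLen;
    // Balanced harmonic mean; defined as 0 when nothing matched.
    const double sum = s.precision + s.recall;
    s.fmeasure = (sum > 0.0) ? 2.0 * s.precision * s.recall / sum : 0.0;
    return s;
}
```

For example, a match size of 4 with an 8-token candidate and a 5-token reference gives a precision of 0.5, a recall of 0.8 and an f-measure of about 0.615 under these assumptions.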

The advantage over the popular Bleu metric is that there are both a precision and a recall value. The arbitrary choice of the n-gram length is avoided by taking maximum matching sets. However, this introduces another arbitrary setting: the exponent used when calculating maximum matching sets. Usually an exponent of 2 (squaring) produces good, comparable results.
For more information on Bleu, read the paper on the Bleu metric published by IBM.

OPTIONS
-h Show a short help message and quit
-v Turn on verbose mode
-c Turn on case sensitivity
Bleu as defined by IBM treats tokens as equal when they match ignoring the case of the tokens. This is useful when a word in the candidate/reference text is at the start of a sentence but not in its counterpart.
However, in some languages case carries general (linguistic) information and attention to case is desired.
Use this switch to turn on case sensitivity; the default is case insensitive.
-e Must be followed by a positive nonzero integer
Set the exponent for calculating maximum matching sets in the following formula:
MMS = (run_1^e + run_2^e + ... + run_n^e)^{1/e}
This value controls the trade-off between fewer long runs and more short runs. For e=1, two adjacent runs of length 1 count the same as one run of length 2. For e=2, the single run of length 2 is preferred. This is usually the better option because it favours longer co-occurring stretches of text in both the candidate and the reference text(s).
The default value is e=2.
-i Ignore first token in every sentence
Sometimes the first token is used for sentence numbering and, for the purpose of calculating the FMeasure score, the first token must be ignored.
This switch discards the first token of every sentence, both in the candidate and in the reference file(s).
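
The effect of the -e exponent on the MMS formula can be sketched as follows. This is only an illustration of the formula given under -e, assuming the run lengths have already been found; the function name matchSize is hypothetical and does not come from FMeasure.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Computes (run_1^e + run_2^e + ... + run_n^e)^(1/e) for the given
// run lengths. Finding the runs themselves is outside this sketch.
double matchSize(const std::vector<int>& runLengths, int e) {
    assert(e >= 1);  // -e must be a positive nonzero integer
    double sum = 0.0;
    for (int len : runLengths) {
        sum += std::pow(static_cast<double>(len), e);
    }
    return std::pow(sum, 1.0 / e);
}
```

With e=1, the runs {1, 1} and the single run {2} both yield 2. With e=2, {1, 1} yields about 1.414 while {2} still yields 2, so the longer run is rewarded.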


File format
The program expects both the candidate file and the reference file in the same file format.
In this file format a single asterisk (the Kleene star, *) on an otherwise empty line is used as a sentence delimiter.
A file might look like this:
First sample line of our toy corpus
*
Second test line
*
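
A minimal sketch of reading this format is given below. It is not the parser from FMeasure.cpp: the names Sentence and readSentences are hypothetical, tokens are assumed to be separated by whitespace, a sentence is assumed to be allowed to span several lines, and a final sentence without a closing delimiter is assumed to be kept. The two flags mimic the behaviour described for -c and -i.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Sentence = std::vector<std::string>;

// Reads sentences separated by lines that contain only "*".
std::vector<Sentence> readSentences(std::istream& in,
                                    bool caseSensitive = false,
                                    bool ignoreFirstToken = false) {
    std::vector<Sentence> sentences;
    Sentence current;

    // Closes the sentence being built; drops its first token if requested.
    auto flush = [&]() {
        if (ignoreFirstToken && !current.empty()) {
            current.erase(current.begin());
        }
        sentences.push_back(current);
        current.clear();
    };

    std::string line;
    while (std::getline(in, line)) {
        // Split the line into whitespace-separated tokens.
        std::istringstream words(line);
        Sentence lineTokens;
        std::string token;
        while (words >> token) {
            lineTokens.push_back(token);
        }

        // A lone "*" ends the current sentence.
        if (lineTokens.size() == 1 && lineTokens[0] == "*") {
            flush();
            continue;
        }

        for (std::string& t : lineTokens) {
            if (!caseSensitive) {
                // Fold to lower case so that "The" and "the" compare equal.
                std::transform(t.begin(), t.end(), t.begin(), [](unsigned char c) {
                    return static_cast<char>(std::tolower(c));
                });
            }
            current.push_back(t);
        }
    }
    if (!current.empty()) {
        flush();  // assumed: keep a trailing sentence with no delimiter
    }
    return sentences;
}
```

Passing a std::ifstream opened on the candidate or a reference file returns one token vector per sentence, which is the shape a matching step would need.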
Tip
Another well-known format is the .sgml.ref format, an XML-like markup.
To convert from that format to the format accepted by this program, use grep and sed:

cat file.sgml.ref | grep -v "<refset" | grep -v "<doc" | grep -v "^</" | sed -e "s/<seg>//" | sed -e "s/<\/seg>/\n*/"