ITC SIG of STC Article
Evaluation of machine translation

A workshop was held in conjunction with Machine Translation Summit VIII on Saturday, September 22, 2001, in Santiago de Compostela, Spain, to further the work on evaluating machine translation.

Many methods and measures for evaluating machine translation (MT) systems have been developed over the years. The ISLE project, funded jointly by the European Union and the US National Science Foundation, is continuing the work started in the EU's EAGLES project on systematizing these methods and measures. This workshop will be the fourth in a series that reports on and extends that systematization.

The effort focuses on building schemes that classify various aspects important to MT, including user needs, suggested system characteristics, and associated metrics. The classification schemes relate to ISO work on software evaluation. The work is intended to be useful to those who are considering machine translation, comparing several MT systems, or developing MT systems. A fuller account of the ISLE evaluation work and an overview of the current classification schemes can be found at www.isi.edu/natural-language/mteval.

By the time of the MT Summit, the ISLE project will have organized or been involved in a number of workshops on MT evaluation: at LREC in Athens, at AMTA in Mexico, the Geneva Workshop, and at NAACL in Pittsburgh. Participants are encouraged, but not required, to take part in the Geneva Workshop or the NAACL Workshop, as these are hands-on exercises in MT, and the results of the previous workshops will be presented at the MT Summit workshop. This workshop is a preliminary attempt to synthesize those results and draw conclusions from them.

Information about the previous workshops can be found at:

LREC: www.icp.grenet.fr/ELRA/lrec2000.html
AMTA: www.isi.edu/natural-language/conferences/AMTA2000.html
NAACL 2001: www.cs.cmu.edu/~ref/naacl2001.html
MT Evaluation workshop: www.issco.unige.ch/projects/isle/mt-eval-workshop.html
MT Summit VIII: www.eamt.org/summitVIII

Papers have been invited along the themes discussed above. The questions and issues to be answered are diverse, but preference will be given to papers relating to the ISLE framework.

The following questions suggest possible evaluation threads within the framework.

What kinds of metrics are useful for which system characteristics?

Which system characteristics reflect which user needs?

Is there a radical difference between evaluation focusing on research or development needs, and evaluation focusing on end-user needs?

When should real-world data be used, and what is the impact of using it?

What constitutes a valid metric? How can you demonstrate that a metric is valid?

What are the advantages and disadvantages of specific metrics?

What kinds of tools automate the evaluation process? Can the process (or any part of it) be automated? What are the difficulties inherent in choosing particular metrics for automation?

What kinds of tasks are suited to which evaluation schemes?

How can we use the evaluation process to speed or improve the MT development process?

How can we evaluate MT when MT is a small part of the data flow? How independent is MT of the subsequent processing? Cleaning up the data improves performance, but does it improve it enough? How do we quantify that?

For more information please visit: www.eamt.org/summitVIII/guidelines.html. Information provided by Fred Klein; source: TCF-GEN.

 

Copyright © 2008 Society for Technical Communication. Site initially posted May 12, 2008.