ACCEPT reports pre-editing evaluation results
How far has the ACCEPT team got towards its objective of improving the automatic translation of user-generated content? Over the past year, an extensive evaluation campaign measured the impact of content pre-editing on translation quality, involving translators from the Faculty of Translation and Interpreting of the University of Geneva. Its results have just been published online, and we summarise some of the main findings below.
One of the main achievements of the ACCEPT project so far is its pre-editing technology for English and French, which provides automatically or manually applicable rule sets for correcting the spelling, grammar and style of text from the user-generated content domain. Consider a sentence like the following:
En tout cas, je vous laisse mieux analyser le probleme remonté, juste pour etre sur que le produit travaille comme il faut.
Automatic translation (provided by the ACCEPT baseline system): In any case, I will leave you to better analyse the problem reported, just to be on that the product is working as it should.
While some improvements in the quality of the source text leave the translation unchanged, as in the case of probleme → problème:
En tout cas, je vous laisse mieux analyser le problème remonté, juste pour etre sur que le produit travaille comme il faut.
In any case, I will leave you to better analyse the problem reported, just to be on that the product is working as it should,
other improvements are crucial for understanding the translation (sur → sûr):
En tout cas, je vous laisse mieux analyser le probleme remonté, juste pour etre sûr que le produit travaille comme il faut.
In any case, I will leave you to better analyse the problem reported, just to be sure that the product is working as it should.
But in still other cases, an improvement of the source may even have an adverse effect on the translation. Moreover, since several changes are applied to the same source, the positive impact of some changes may go hand in hand with the negative impact of others. So how can we tell whether the pre-editing technology fulfils its promise of improving the translation quality of user-generated content and, ultimately, helps reach the project’s main goal of making it easier for communities to share knowledge effectively across the language barrier?
To answer this question, we conducted a large-scale evaluation campaign eliciting human judgements on forum posts (in the Norton community scenario) and on individual sentences (in the Translators without Borders community scenario). The evaluators compared the translation of the pre-edited text against the translation of the raw text, with the two shown in random order. Additional feedback included the evaluator’s confidence that their choice was right and the importance of the difference between the two translations. The evaluation was performed in a variety of settings: for instance, with and without access to the source text; using finer- and coarser-grained ratings; and for automatic pre-editing only as well as for full (automatic and manual) pre-editing. The language pairs investigated were French-English, English-French and English-German.
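The core of such a pairwise protocol is simple to sketch. The snippet below is an illustration only, not the ACCEPT evaluation tool: the record fields and labels are hypothetical, showing how randomly ordered comparisons and their additional feedback (confidence, importance) might be collected and tallied into preference percentages.

```python
import random
from collections import Counter

# Hypothetical judgement records: for each item, the evaluator picks the
# better of the two translations (or "equal") and adds feedback on how
# confident they are and how important the difference is.
judgements = [
    {"preferred": "pre-edited", "confidence": "high", "importance": "major"},
    {"preferred": "raw", "confidence": "low", "importance": "minor"},
    {"preferred": "pre-edited", "confidence": "high", "importance": "major"},
    {"preferred": "equal", "confidence": "medium", "importance": "minor"},
]

def present_pair(raw_mt, preedited_mt, rng=random):
    """Return the two candidate translations in random order, so the
    evaluator cannot tell which one comes from the pre-edited source."""
    pair = [("raw", raw_mt), ("pre-edited", preedited_mt)]
    rng.shuffle(pair)
    return pair

def tally(judgements):
    """Aggregate the pairwise preferences into percentages."""
    counts = Counter(j["preferred"] for j in judgements)
    total = len(judgements)
    return {label: 100 * n / total for label, n in counts.items()}

print(tally(judgements))
```

Hiding the origin of each translation and randomising its position guards against evaluators systematically favouring whichever candidate appears first.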
Consistent results were obtained across language pairs and in all evaluation settings: according to human ratings, pre-editing yields better translation quality for a high proportion of the data (between 46% and 83% for full pre-editing, depending on the setting). As for automatic pre-editing alone, its impact, measured on a subset of the data, was positive in 55% to 64% of cases. About 15% of the data remains of comparable quality despite the changes, and for the remaining portion (about 25%), pre-editing of the source text has an adverse effect on the translation, with the automatic spelling correction of user names being the major source of errors.
The manual analysis was supplemented by an automatic evaluation using a selection of metrics, including BLEU and METEOR. These automatic scores showed only a weak correlation with the human evaluation results, underlining the importance of manual analysis in machine translation evaluation.
An extrinsic evaluation of the ACCEPT technology assessed the impact of pre-editing on the task of post-editing machine translation output. Pre-editing of the source text was found to lead to a substantial improvement in post-editing productivity, reducing post-editing time by half. A detailed account of the evaluation results can be found in the ACCEPT deliverable D 9.2 Survey of evaluation results.