Development of rules that support post-editing
During the past few months, ACCEPT partners Acrolinx and the University of Geneva (UNIGE) have started to develop Acrolinx rules that support users with "post-editing", i.e. correcting errors in machine-translated texts. We are following two different approaches to derive such rules, which we plan to combine later into a common post-editing solution.
In the work led by UNIGE, post-editing tasks were given to subjects who were asked to manually correct errors in English forum posts that had been translated from French with the SMT system used in ACCEPT. From this editing data, we automatically extracted and grouped the most frequently corrected words or phrases together with their surrounding sentence contexts. A first manual inspection already revealed frequent problematic output patterns that the subjects corrected in a consistent way. For example, English translations that started with "I have not the ..." were always corrected to "I do not have the ...". Another common source of corrections is French clitics (personal pronouns), whose usage and position differ considerably, and in consistent ways, from those of their English counterparts. These instances provide good candidates for monolingual post-editing rules, which help users to correct the problematic phrases more quickly.
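The extraction step described above can be sketched as follows. This is a minimal illustration, not the actual ACCEPT pipeline: it aligns each raw MT sentence with its post-edited version by stripping the common prefix and suffix, and counts the differing phrase pairs. The example sentences are hypothetical editing data.

```python
from collections import Counter

def changed_span(mt_tokens, pe_tokens):
    """Return the (original, corrected) phrase pair where a post-edited
    sentence differs from the raw MT output (common prefix/suffix stripped)."""
    p = 0
    while p < min(len(mt_tokens), len(pe_tokens)) and mt_tokens[p] == pe_tokens[p]:
        p += 1
    s = 0
    while (s < len(mt_tokens) - p and s < len(pe_tokens) - p
           and mt_tokens[len(mt_tokens) - 1 - s] == pe_tokens[len(pe_tokens) - 1 - s]):
        s += 1
    return (" ".join(mt_tokens[p:len(mt_tokens) - s]),
            " ".join(pe_tokens[p:len(pe_tokens) - s]))

# Hypothetical editing data: (raw MT output, post-edited version) pairs.
edits = [
    ("I have not the answer to your question .",
     "I do not have the answer to your question ."),
    ("I have not the same problem .",
     "I do not have the same problem ."),
]

counts = Counter(changed_span(mt.split(), pe.split()) for mt, pe in edits)
print(counts.most_common(1))  # → [(('have not', 'do not have'), 2)]
```

Frequently recurring pairs such as ("have not", "do not have") are then candidates for monolingual post-editing rules.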
In the approach followed by Acrolinx, we examined English forum posts that were machine-translated into German. We observed that verbs tend to be missing from translations of subordinate clauses and modal verb constructions, owing to differences in German syntax. Following these observations, we developed pairs of Acrolinx rules that detect parallel linguistic constructions in both languages. For example, we created rules to detect finite verbs in subordinate clauses in both English and German. For every pattern found on the source side, we expect the corresponding pattern to appear on the target side. Likewise, we check whether terms have been translated according to the existing bilingual terminology. Initial results show that the approach is indeed suitable for verifying the MT output (translation QA): a rule mismatch indicates a possibly erroneous phrase. More importantly, we are able to explain to the post-editor which linguistic construction or term was expected at that point. We plan to use the linguistic pattern match information as a novel non-shallow feature for automatically estimating the quality of MT output.
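The rule-pair idea can be illustrated with a small sketch. Real Acrolinx rules operate on full linguistic analyses; here, plain regular expressions stand in for them, and both the rule pair (English "because" expecting German "weil") and the term base entry are invented for illustration. When a source pattern fires but its target counterpart does not, the checker reports which construction or term was expected.

```python
import re

# Hypothetical rule pairs: (description, source pattern, expected target pattern).
RULE_PAIRS = [
    ("subordinate clause marker", re.compile(r"\bbecause\b", re.I),
     re.compile(r"\bweil\b", re.I)),
]

# Assumed bilingual term base entry (English term -> required German term).
TERMINOLOGY = {"forum post": "Forumsbeitrag"}

def check_translation(source, target):
    """Return human-readable mismatch messages for one sentence pair."""
    issues = []
    for name, src_pat, tgt_pat in RULE_PAIRS:
        if src_pat.search(source) and not tgt_pat.search(target):
            issues.append(f"expected {name} matching {tgt_pat.pattern!r} in target")
    for en_term, de_term in TERMINOLOGY.items():
        if en_term in source.lower() and de_term not in target:
            issues.append(f"term {en_term!r} not translated as {de_term!r}")
    return issues

print(check_translation("It crashes because the file is missing .",
                        "Es stürzt ab , die Datei fehlt ."))
```

A mismatch message points the post-editor directly at the missing construction or term, rather than merely flagging the sentence as suspect.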