- The first major goal of this project is therefore to develop a user-friendly (“minimally intrusive”) strategy for pre-editing User-Generated Content for Statistical Machine Translation (SMT). This involves identifying and evaluating the minimal set of linguistic phenomena that are critical for improving translatability, and implementing rules that address them.
- The second major goal is to address the post-editing bottleneck. The project will work to identify the key factors that make it possible for post-editing to be carried out by people with no knowledge of the source language, an approach known as monolingual post-editing. Furthermore, we will develop linguistic software to support the monolingual post-editing process.
- The third major goal is to improve learning and develop feedback loops that raise the quality of SMT output for community data. SMT systems learn well from large amounts of similar content in a single domain. Further work is needed to improve their performance where parallel data is relatively sparse and the content is broader in domain, or even completely out-of-domain. In addition to improving learning strategies for SMT in this kind of scenario, we will develop methods for learning from human post-editing.
- The fourth major goal is to improve the reliability of forum content translation using Text Analytics. We will develop strategies and implement linguistic software that deliver useful insights into content. This component will include adapting existing text classification algorithms for this purpose, as well as developing rules for identifying sentiment and other relevant phenomena in informal forum content.