Innovative Techniques Add Speed, Value to Language Technologies

Published Sept. 8, 2011
By Eric Hansen
711th Human Performance Wing

WRIGHT-PATTERSON AIR FORCE BASE, Ohio -- Department of Defense personnel operate all over the world, fighting in the Global War on Terror, bringing humanitarian relief, and working with coalition operations. Much of the information they need to operate effectively is in foreign speech and text; however, there is a critical shortage of linguists to translate these sources.

To address this problem, the DoD has funded the development of automatic translation systems for both speech and text. Development has been slow and costly, however, because the standard way to improve performance has been to collect, transcribe, and translate ever-larger amounts of training data. While this approach carries low technical risk, it cannot meet rapid-turnaround requirements in new languages and domains of interest to the DoD. What's needed are innovative techniques that perform at least on par with standard techniques but do not require substantial new training data when adapted to a new language and/or domain.

SDL Language Weaver, working under a Small Business Innovation Research (SBIR) project, produced data-efficient learning techniques that significantly reduce the cost of developing high-accuracy machine translation systems, particularly for translating from and into morphologically rich, data-poor languages and domains. Morphologically rich languages use a large number of prefixes and/or suffixes that attach to word stems to modify their meaning. Many such languages are important to the DoD.

The technologies developed by SDL Language Weaver comprise its syntax-based machine translation (MT) approach; the construction of better, MT-effective dictionaries; the use of effective morphological analysis; and the integration of large-scale language models.

These technologies led to accurate syntax-based machine translation products. Rather than the phrase-based approach, which relies heavily on lexicalized phrase pairs, this project uses the SDL Language Weaver syntax-based approach, which translates with syntactic translation rules. A translation rule can either capture lexical translation information or encode abstract (generalized) syntactic translation knowledge. This makes the syntax-based approach a better fit for data-poor languages because it accommodates domain-specific dictionaries, allows human linguistic knowledge to be integrated, and generalizes from a smaller number of training examples. The result is an MT system that surpasses the performance of traditional phrase-based systems, and does so on morphologically rich languages. By leveraging additional data sources and developing training techniques specific to the challenges of morphologically rich languages, SDL Language Weaver can apply this knowledge to additional languages and domains.
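
To illustrate the difference between the two kinds of translation rules described above, here is a minimal toy sketch in Python. It is not SDL Language Weaver's system: the tuple-based parse-tree format, the rule tables, and the English-to-French example are all hypothetical. Lexical rules translate individual words, while a single abstract rule moves an adjective after its noun without naming any particular word.

```python
from typing import Dict, List, Tuple

# Toy parse-tree format (assumed for illustration): a leaf is (POS tag, word);
# an internal node is (label, child, child, ...).
Tree = Tuple

# Hypothetical lexical rules: (POS tag, English word) -> French word.
LEXICAL_RULES: Dict[Tuple[str, str], str] = {
    ("DT", "the"): "la",
    ("JJ", "red"): "rouge",
    ("JJ", "green"): "verte",
    ("NN", "car"): "voiture",
}

# Hypothetical abstract rule: (node label, child labels) -> new child order.
# It encodes "the adjective follows the noun" without referring to any word.
ABSTRACT_RULES: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("NP", ("DT", "JJ", "NN")): (0, 2, 1),
}


def is_leaf(node: Tree) -> bool:
    """A leaf is a (tag, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def translate(node: Tree) -> List[str]:
    """Translate bottom-up: lexical rules at leaves, reordering rules at internal nodes."""
    if is_leaf(node):
        tag, word = node
        # Copy the source word unchanged if no lexical rule matches.
        return [LEXICAL_RULES.get((tag, word), word)]
    label, children = node[0], node[1:]
    child_labels = tuple(child[0] for child in children)
    # Keep the source order if no abstract rule matches this node.
    order = ABSTRACT_RULES.get((label, child_labels), tuple(range(len(children))))
    output: List[str] = []
    for index in order:
        output.extend(translate(children[index]))
    return output


tree = ("NP", ("DT", "the"), ("JJ", "red"), ("NN", "car"))
print(" ".join(translate(tree)))  # la voiture rouge
```

Because the reordering rule is stated over syntactic categories rather than specific words, it also handles "the green car" ("la voiture verte") without any training example of that exact phrase. This is the kind of generalization from limited data that the article attributes to syntax-based rules.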