To translate well, machine translation (MT) systems and general-purpose language models (LMs) need a deep understanding of both the source and target languages and their cultures. Idioms, being non-compositional, pose a particular challenge for Transformer-based systems, because literal translations often miss the intended meaning. Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context awareness. To address these challenges, we introduce IdiomKB, a multilingual idiom KB built with large LMs. Our approach prioritizes context awareness and scalability: idioms are stored offline in a KB of manageable size, which allows efficient serving with smaller models and provides a more comprehensive understanding of idiomatic expressions. By retrieving idioms' figurative meanings from the KB, smaller models such as BLOOMZ (7.1B), Alpaca (7B), and InstructGPT (6.7B) produce better translations. We also present a novel GPT-4-powered metric for human-aligned evaluation and use it to show that IdiomKB considerably boosts model performance. Human evaluations further validate the KB's quality.
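To make the retrieval step concrete, the following is a minimal sketch of how a figurative meaning might be looked up in an idiom KB and placed into a translation prompt for a smaller model. Everything in it is an assumption made for illustration: the `IdiomEntry` structure, the `TOY_KB` entries, the substring matching, and the prompt wording are hypothetical and are not taken from IdiomKB or its actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class IdiomEntry:
    """One hypothetical KB record (not the real IdiomKB schema)."""
    idiom: str                 # surface form in the source language
    lang: str                  # source-language code, e.g. "zh"
    meanings: dict = field(default_factory=dict)  # target-language code -> figurative meaning


# Toy entries written for this example only; they are not IdiomKB data.
TOY_KB = [
    IdiomEntry("画蛇添足", "zh", {"en": "to ruin something by adding what is unnecessary"}),
    IdiomEntry("break the ice", "en", {"zh": "打破僵局"}),
]


def retrieve(sentence, src_lang, tgt_lang, kb=TOY_KB):
    """Return (idiom, meaning) pairs for KB idioms that occur in the sentence.

    Matching is a case-insensitive substring test, which is a simplification:
    it ignores inflected or discontinuous idiom forms.
    """
    hits = []
    lowered = sentence.lower()
    for entry in kb:
        if entry.lang != src_lang or tgt_lang not in entry.meanings:
            continue
        if entry.idiom.lower() in lowered:
            hits.append((entry.idiom, entry.meanings[tgt_lang]))
    return hits


def build_prompt(sentence, src_lang, tgt_lang, kb=TOY_KB):
    """Assemble a translation prompt that states each retrieved figurative meaning."""
    lines = [f"Translate the following sentence from {src_lang} to {tgt_lang}."]
    for idiom, meaning in retrieve(sentence, src_lang, tgt_lang, kb):
        lines.append(f'Note: the idiom "{idiom}" means "{meaning}".')
    lines.append(f"Sentence: {sentence}")
    lines.append("Translation:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("他这样做是画蛇添足。", "zh", "en"))
```

The resulting prompt string would then be passed to whichever smaller model is doing the translation. Because the meanings are looked up rather than generated at inference time, the KB can be built once offline and reused across models.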