Previous work suggests that the performance of cross-lingual information retrieval correlates highly with the quality of machine translation (MT). However, there may be a threshold beyond which improving query translation quality yields little or no further gain in retrieval performance. This threshold may depend on multiple factors, including the source and target languages, the quality of the existing MT system, and the search pipeline. To identify the benefit of improving an MT system for a given search pipeline, we investigate the sensitivity of retrieval quality to different levels of MT quality, using experimental datasets collected from actual traffic. We systematically improve the quality of our MT systems on language pairs, as measured by MT evaluation metrics including BLEU and chrF, to determine the impact on search precision metrics and to extract signals that help guide improvement strategies. Using this information, we develop techniques to compare query translations across multiple language pairs and to identify the most promising language pairs to invest in and improve.