This paper explores a technique for improving recall in cross-language information retrieval (CLIR) systems through iterative query refinement grounded in the user's lexical-semantic space. The method combines multi-level translation, embedding-based semantic expansion, and user-profile-centered augmentation to address the mismatch between the wording of user queries and that of relevant documents. The pipeline starts with an initial BM25 retrieval, then applies translation into intermediate languages, embedding lookup of similar terms, and iterative re-ranking, with the aim of widening the set of potentially relevant results in a way that is personalized to the individual user. In comparative experiments on news and Twitter datasets, the proposed approach outperforms baseline BM25 ranking across ROUGE metrics. The multi-step translation process was also found to preserve semantic accuracy. This personalized CLIR framework paves the way for context-aware retrieval that is attentive to the nuances of user language.
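To make the retrieval loop described above concrete, the following is a minimal sketch, not the paper's implementation. It covers only two of the named stages: BM25 scoring and repeated term expansion followed by re-ranking. The intermediate-language translation step is omitted. The `neighbours` table is a hypothetical stand-in for an embedding lookup of similar terms (it could equally be read as a per-user profile), and the function names, the toy corpus, and the parameter values `k1=1.5`, `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query_terms):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand(terms, neighbours):
    """Append neighbour terms of every query term, keeping order and uniqueness."""
    expanded = list(terms)
    for t in terms:
        for n in neighbours.get(t, []):
            if n not in expanded:
                expanded.append(n)
    return expanded


def iterative_retrieve(query_terms, docs, neighbours, rounds=2):
    """Expand the query for a fixed number of rounds, re-ranking after each one."""
    terms = list(query_terms)
    scores = bm25_scores(terms, docs)
    for _ in range(rounds):
        terms = expand(terms, neighbours)   # hypothetical embedding lookup
        scores = bm25_scores(terms, docs)   # re-rank with the expanded query
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return terms, ranking, scores


# Toy corpus and neighbour table, invented purely for illustration.
docs = [
    ["election", "results", "announced"],
    ["vote", "count", "delayed"],
    ["football", "match", "tonight"],
]
neighbours = {"election": ["vote"], "vote": ["ballot"]}

baseline = bm25_scores(["election"], docs)
terms, ranking, scores = iterative_retrieve(["election"], docs, neighbours)
```

In this toy setup the unexpanded query scores only the first document, while the expanded query also gives the second document a non-zero score. That is the kind of recall gain the abstract attributes to expansion, though nothing here reproduces the reported experiments.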