Hallucination is a major concern in LLM-driven service systems, where responses must be explicitly grounded in vetted knowledge to guarantee compliance. In this paper, we introduce Retrieval-Augmented Learning-to-Match (RAL2M), a novel framework that eliminates generation hallucination by repositioning LLMs as query-response matching judges within a retrieval-based system, offering a robust alternative to purely generative approaches. To further mitigate judgment hallucination, we propose a query-adaptive latent ensemble strategy that explicitly models the heterogeneous competence of, and interdependencies among, the constituent LLMs, deriving a calibrated consensus decision. Extensive experiments on large-scale benchmarks demonstrate that the proposed method effectively leverages the "wisdom of the crowd" and significantly outperforms strong baselines. Finally, we discuss best practices and promising directions for further exploiting latent representations in future work.
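To make the ensemble idea concrete, the following is a minimal sketch of how a query-adaptive consensus over multiple LLM judges could be computed. It is not the paper's actual method: the function name `ensemble_match_score`, the competence weights, and the correlation-based down-weighting heuristic are all illustrative assumptions standing in for the latent competence and interdependence modeling described above.

```python
import numpy as np

def ensemble_match_score(judgments, competence, correlation=None):
    """Hypothetical sketch of a query-adaptive ensemble consensus.

    judgments  : array of shape (K,) with each LLM judge's match
                 probability for one (query, candidate) pair.
    competence : array of shape (K,) with query-dependent reliability
                 weights for each judge (assumed to come from a latent
                 competence model conditioned on the query).
    correlation: optional (K, K) matrix of judge interdependence;
                 correlated judges are down-weighted so that redundant
                 votes do not dominate the consensus.
    """
    w = np.asarray(competence, dtype=float)
    if correlation is not None:
        # Down-weight each judge in proportion to its total correlation
        # with the rest of the ensemble (a crude decorrelation step).
        redundancy = correlation.sum(axis=1) - np.diag(correlation)
        w = w / (1.0 + redundancy)
    w = w / w.sum()
    # Consensus: competence-weighted average of the judges' scores.
    return float(np.dot(w, judgments))

# Toy usage: three judges score one retrieved candidate for a query.
scores = np.array([0.9, 0.8, 0.2])    # per-judge match probabilities
weights = np.array([0.5, 0.3, 0.2])   # query-dependent competence
corr = np.array([[1.0, 0.7, 0.1],
                 [0.7, 1.0, 0.1],
                 [0.1, 0.1, 1.0]])    # judges 1 and 2 are redundant
print(ensemble_match_score(scores, weights, corr))
```

In this toy example the two correlated judges share their influence rather than double-counting it, so the dissenting third judge retains a meaningful vote; any calibration of the final score is left out of the sketch.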