Retrieval-augmented generation (RAG) is central to user-generated content (UGC) platforms, but its effectiveness depends on accurate assessment of query-document relevance. Despite recent advances in applying large language models (LLMs) to relevance modeling, UGC platforms pose two distinctive challenges: 1) ambiguous user intent, because user feedback is sparse in RAG scenarios, and 2) asymmetric relevance, where relevance is determined by localized answer-bearing content rather than by global query-document similarity. To address these issues, we propose the Reinforced Reasoning model for Relevance Assessment (R3A), which decomposes relevance assessment into intent inference and evidence grounding. R3A uses auxiliary high-clicked documents to infer the latent query intent, and it extracts verbatim evidence fragments to ground its relevance decisions, which reduces sensitivity to noise and improves the modeling of asymmetric relevance. Experimental results show that R3A substantially outperforms strong baselines on offline benchmarks, and that the distilled R3A-1.5B model achieves significant gains in large-scale online A/B testing, balancing performance with practical deployability.
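The evidence-grounding idea can be illustrated with a minimal sketch. It is not R3A's implementation: the `Assessment` container, the helper names, and the whitespace normalization are all assumptions made for illustration. It only checks that fragments cited for a relevance decision occur verbatim in the document, which is how a short answer-bearing span inside a long, mostly off-topic post can justify a positive decision.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assessment:
    """Hypothetical container for one relevance judgment (not the paper's schema)."""
    intent: str          # inferred latent intent, e.g. derived from high-clicked documents
    evidence: List[str]  # fragments claimed to be copied verbatim from the document
    relevant: bool       # the relevance decision


def normalize(text: str) -> str:
    """Collapse whitespace so line breaks do not defeat the verbatim check."""
    return " ".join(text.split())


def grounded_fragments(doc: str, fragments: List[str]) -> List[str]:
    """Return only the non-empty fragments that occur verbatim in the document."""
    haystack = normalize(doc)
    return [f for f in fragments if normalize(f) and normalize(f) in haystack]


def is_grounded(doc: str, a: Assessment) -> bool:
    """Every cited fragment must be verbatim; a positive decision needs at least one."""
    kept = grounded_fragments(doc, a.evidence)
    if len(kept) != len(a.evidence):
        return False  # at least one fragment was paraphrased, empty, or invented
    return bool(kept) or not a.relevant


# A long post in which only one sentence answers the query (asymmetric relevance).
doc = (
    "Long travel diary about Kyoto.\n"
    "The temple opens at 9 am   and closes at 5 pm. Many photos follow."
)
good = Assessment("opening hours", ["The temple opens at 9 am and closes at 5 pm."], True)
bad = Assessment("opening hours", ["The temple opens at 8 am"], True)
```

Under these assumptions, `is_grounded(doc, good)` holds and `is_grounded(doc, bad)` does not. A check of this kind could serve as a filter on model outputs, but the abstract does not describe how R3A uses its evidence fragments beyond grounding the decision.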