In financial Retrieval-Augmented Generation (RAG) systems, models frequently rely on retrieved documents to generate accurate responses due to the time-sensitive nature of the financial domain. While retrieved documents help address knowledge gaps, model-generated responses still suffer from hallucinations that contradict the retrieved information. To mitigate this inconsistency, we propose a Reinforcement Learning framework enhanced with Fine-grained Knowledge Verification (RLFKV). Our method decomposes financial responses into atomic knowledge units and assesses the correctness of each unit to compute a fine-grained faithfulness reward. This reward offers more precise optimization signals, thereby improving alignment with the retrieved documents. Additionally, to prevent reward hacking (e.g., overly concise replies), we incorporate an informativeness reward that encourages the policy model to retain at least as many knowledge units as the base model. Experiments conducted on the public Financial Data Description (FDD) task and our newly proposed FDD-ANT dataset demonstrate consistent improvements, confirming the effectiveness of our approach.
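The reward described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the fraction-of-verified-units form of the faithfulness term, the ratio-capped informativeness term, and the weighting `alpha` are all assumptions for illustration; the abstract only specifies that correctness is assessed per atomic unit and that responses with fewer units than the base model are discouraged.

```python
def combined_reward(unit_labels, n_base_units, alpha=0.5):
    """Hypothetical sketch of the RLFKV reward.

    unit_labels   -- per-atomic-unit correctness flags for the policy response
                     (True if the unit is supported by the retrieved documents)
    n_base_units  -- number of atomic knowledge units in the base model's response
    alpha         -- assumed weight balancing the two reward terms
    """
    if not unit_labels:
        return 0.0
    # Fine-grained faithfulness: fraction of atomic knowledge units verified
    # as consistent with the retrieved documents.
    faithfulness = sum(unit_labels) / len(unit_labels)
    # Informativeness: penalize overly concise replies (reward hacking) by
    # comparing the unit count against the base model's, capped at 1.0.
    informativeness = min(1.0, len(unit_labels) / max(1, n_base_units))
    return alpha * faithfulness + (1 - alpha) * informativeness
```

Under this sketch, a response whose units are all correct but that drops most of the base model's content scores lower than one that is both faithful and equally informative, which is the trade-off the two rewards are meant to balance.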