The rapid adoption of retrieval-augmented generation (RAG) systems has revolutionized large-scale content generation but has also highlighted the challenge of ensuring the trustworthiness of retrieved information. This paper introduces ClaimTrust, a propagation-based trust scoring framework that dynamically evaluates the reliability of documents in a RAG system. Using a modified PageRank-inspired algorithm, ClaimTrust propagates trust scores across documents based on relationships derived from extracted factual claims. We preprocess and analyze 814 political news articles from Kaggle's Fake News Detection Dataset, extract 2,173 unique claims, and classify 965 meaningful relationships (supporting or contradicting). By representing the dataset as a document graph, ClaimTrust iteratively updates trust scores until convergence, effectively differentiating trustworthy articles from unreliable ones. Our methodology, which leverages embedding-based filtering for efficient claim comparison and relationship classification, achieves an 11.2% rate of significant connections while maintaining computational scalability. Experimental results demonstrate that ClaimTrust assigns higher trust scores to verified documents while penalizing those containing false information. Future directions include fine-tuned claim extraction and comparison (Li et al., 2022), parameter optimization, enhanced language model utilization, and robust evaluation metrics to generalize the framework across diverse datasets and domains.
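To make the propagation idea concrete, the following is a minimal sketch of a damped, signed trust update over a document graph. It is not the authors' implementation: the function name `propagate_trust`, the edge format, the damping factor of 0.85, the clamping of scores to [0, 1], and the use of prior scores for verified documents are all assumptions introduced here for illustration.

```python
def propagate_trust(n_docs, edges, prior, damping=0.85, tol=1e-8, max_iter=200):
    """Iteratively propagate trust over a signed document graph.

    edges: list of (src, dst, weight); weight > 0 means src supports dst,
           weight < 0 means src contradicts dst (hypothetical encoding).
    prior: initial trust per document in [0, 1] (assumed, e.g. 1.0 for
           verified documents and 0.5 for unknown ones).
    """
    trust = list(prior)
    for _ in range(max_iter):
        incoming = [0.0] * n_docs  # signed, trust-weighted evidence per document
        norm = [0.0] * n_docs      # total absolute edge weight per document
        for src, dst, w in edges:
            incoming[dst] += w * trust[src]
            norm[dst] += abs(w)

        updated = []
        for i in range(n_docs):
            # Documents with no incoming edges fall back to their prior.
            neighbor = incoming[i] / norm[i] if norm[i] else prior[i]
            value = (1 - damping) * prior[i] + damping * neighbor
            updated.append(min(1.0, max(0.0, value)))  # clamp to [0, 1]

        delta = max(abs(a - b) for a, b in zip(updated, trust))
        trust = updated
        if delta < tol:  # stop once scores have converged
            break
    return trust


# Toy example: document 0 is verified, supports document 1,
# and contradicts document 2.
scores = propagate_trust(
    n_docs=3,
    edges=[(0, 1, 1.0), (0, 2, -1.0)],
    prior=[1.0, 0.5, 0.5],
)
print(scores)
```

In this toy graph, the document supported by the verified source ends with a higher score than the one it contradicts, which mirrors the qualitative behavior described in the abstract. Because the incoming weights are normalized and the damping factor is below 1, each update is a contraction, so the iteration settles to a fixed point.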