As AI-generated misinformation floods the web, existing tools struggle to give users nuanced, transparent assessments of content credibility. They often default to binary (true/false) classifications without contextual justification, leaving users vulnerable to disinformation. We address this gap by introducing TRACE (Transparent Reliability Assessment with Contextual Explanations), a unified framework that performs two tasks: (1) it assigns a fine-grained, continuous reliability score (from 0.1 to 1.0) to web content, and (2) it generates a contextual explanation for that assessment. The core of TRACE is the TrueGL-1B model, fine-tuned on a novel, large-scale dataset of over 140,000 articles. The dataset's primary contribution is its annotation with 35 distinct continuous reliability scores, created using a Human-LLM co-creation and data poisoning paradigm. This method overcomes the limitations of binary-labeled datasets by populating the mid-ranges of reliability. In our evaluation, TrueGL-1B consistently outperforms other small-scale LLM baselines and rule-based approaches on key regression metrics, including MAE, RMSE, and R². By pairing accurate scores with interpretable justifications, the model makes trustworthy information more accessible. To foster future research, our code and model are publicly available at github.com/zade90/TrueGL.
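For readers unfamiliar with the regression metrics named above, the following is a minimal sketch of how MAE, RMSE, and R² can be computed for predicted reliability scores. It uses the standard textbook definitions and is not taken from the TrueGL repository; the function name `regression_metrics` and the sample scores are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 from two equal-length lists of scores.

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the prediction errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of variance in the true scores explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up reliability scores on the 0.1 to 1.0 scale, for demonstration only.
example = regression_metrics([0.1, 0.5, 1.0], [0.2, 0.5, 0.8])
print(example)
```

Lower MAE and RMSE indicate predictions closer to the annotated scores, while an R² closer to 1 indicates that the predictions track the variation in the annotations.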