Large language models frequently generate plausible but unfaithful summaries that users cannot verify against the source text, a critical limitation in compliance-sensitive domains such as government and legal analysis. We present sui-1, a 24B-parameter model that produces abstractive summaries with inline citations, enabling users to trace each claim back to its source sentence. Our synthetic data pipeline combines chain-of-thought prompting with multi-stage verification, generating over 22,000 high-quality training examples in five languages from diverse sources, including parliamentary documents, web text, and Wikipedia. Evaluation shows that sui-1 significantly outperforms all tested open-weight baselines, including models with 3x more parameters. These results demonstrate that task-specific training substantially outperforms scale alone for citation-grounded summarization. Model weights and an interactive demo are publicly available.