Understanding the role of citations is essential for research assessment and citation-aware digital libraries. However, existing citation classification frameworks often conflate citation intent (why a work is cited) with cited content type (what part of the work is cited), limiting their effectiveness in automatic classification due to a trade-off between fine-grained type distinctions and practical classification reliability. We introduce SOFT, a Semantically Orthogonal Framework with Two dimensions that explicitly separates citation intent from cited content type, drawing inspiration from semantic role theory. We systematically re-annotate the ACL-ARC dataset using SOFT and release a cross-disciplinary test set sampled from ACT2. Evaluation with both zero-shot and fine-tuned Large Language Models demonstrates that SOFT yields higher agreement between human annotators and LLMs, and supports stronger classification performance and more robust cross-domain generalization than the ACL-ARC and SciCite annotation frameworks. These results confirm SOFT's value as a clear, reusable annotation standard that improves clarity, consistency, and generalizability for digital libraries and scholarly communication infrastructures. All code and data are publicly available on GitHub: https://github.com/zhiyintan/SOFT.