Text classification is vital for Web for Good applications such as hate speech and misinformation detection. However, traditional models (e.g., BERT) often fail in dynamic few-shot settings, where labeled data are scarce and target labels frequently evolve. While Large Language Models (LLMs) show promise in few-shot settings, their performance is often hindered by growing input sizes in dynamically evolving scenarios. To address these issues, we propose GORAG, a Graph-based Online Retrieval-Augmented Generation framework for dynamic few-shot text classification. GORAG constructs and maintains a weighted graph of keywords and text labels, representing their correlations as edges. To model these correlations, GORAG employs an edge weighting mechanism that prioritizes the importance and reliability of extracted information, and it dynamically retrieves relevant context using a minimum-cost spanning tree tailored to each input. Empirical evaluations show that GORAG outperforms existing approaches by providing more comprehensive and precise contextual information. Our code is released at: https://github.com/Wyb0627/GORAG.
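The retrieval step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the graph, node names, and weights below are hypothetical, and we assume lower edge weight means a stronger (more reliable) keyword-label correlation, so a minimum-cost spanning tree keeps the most trustworthy correlations as context.

```python
# Illustrative sketch only (not GORAG's actual code): Kruskal's algorithm
# over a hypothetical keyword-label graph, where edge weight encodes the
# (inverse) importance/reliability of an extracted correlation.

class UnionFind:
    """Disjoint-set structure used by Kruskal's algorithm."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # Edge would form a cycle; skip it.
        self.parent[ra] = rb
        return True

def min_spanning_edges(edges):
    """Return a minimum-cost spanning tree from [(weight, u, v), ...]."""
    uf = UnionFind()
    tree = []
    for w, u, v in sorted(edges):  # Greedily take cheapest edges first.
        if uf.union(u, v):
            tree.append((w, u, v))
    return tree

# Hypothetical edges: keywords extracted from an input text, linked to
# candidate labels and to each other.
edges = [
    (0.2, "kw:slur", "label:hate_speech"),
    (0.9, "kw:slur", "label:misinformation"),
    (0.3, "kw:fake_claim", "label:misinformation"),
    (0.5, "kw:fake_claim", "label:hate_speech"),
    (0.4, "kw:slur", "kw:fake_claim"),
]

tree = min_spanning_edges(edges)
# The retained low-cost edges form the context retrieved for this input.
print(tree)
```

In this toy graph the tree keeps the three cheapest cycle-free edges (total weight 0.9), discarding the weaker keyword-label links; the surviving correlations would then be serialized into the LLM prompt as retrieved context.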