Large language models (LLMs) and high-capacity encoders have advanced zero- and few-shot classification, but their inference cost and latency limit practical deployment. We propose training lightweight text classifiers using dynamically generated supervision from an LLM. Our method employs an iterative, agentic loop in which the LLM curates training data, analyzes the classifier's successes and failures, and synthesizes targeted examples to address observed errors. This closed-loop generation-and-evaluation process progressively improves data quality and adapts it to the downstream classifier and task. Across four widely used benchmarks, our approach consistently outperforms standard zero- and few-shot baselines. These results indicate that LLMs can serve effectively as data curators, enabling accurate and efficient classification without the operational cost of large-model deployment.
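The closed loop described above can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the `llm_*` functions are hypothetical stand-ins for LLM calls, and the toy word-overlap classifier stands in for the lightweight student model.

```python
import collections

def llm_curate_seed_data(labels, n_per_label=5):
    # Hypothetical stand-in for the LLM drafting an initial labeled set.
    return [(f"{label} sample text {i}", label)
            for label in labels for i in range(n_per_label)]

def llm_synthesize_targeted(error_cases):
    # Hypothetical stand-in for the LLM writing new examples that target
    # the classifier's observed errors (here: echo the failed cases).
    return [(text, gold) for text, gold in error_cases]

class TinyClassifier:
    """Toy student model: predicts the label whose training vocabulary
    overlaps most with the input's words."""
    def fit(self, data):
        self.vocab = collections.defaultdict(collections.Counter)
        for text, label in data:
            self.vocab[label].update(text.split())
        return self

    def predict(self, text):
        words = text.split()
        return max(self.vocab,
                   key=lambda lbl: sum(self.vocab[lbl][w] for w in words))

def agentic_loop(labels, dev_set, rounds=3):
    # Closed loop: curate -> train -> evaluate -> synthesize targeted data.
    data = llm_curate_seed_data(labels)
    clf = TinyClassifier()
    for _ in range(rounds):
        clf.fit(data)
        errors = [(x, y) for x, y in dev_set if clf.predict(x) != y]
        if not errors:
            break  # no remaining errors on the evaluation set
        data.extend(llm_synthesize_targeted(errors))
    return clf
```

In this mock, examples the student misclassifies are fed back into the training pool, so a second training round corrects them; in the real system the LLM would instead generate novel examples probing the same failure modes.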