In-context learning (ICL) enables large language models (LLMs) to perform novel tasks without parameter updates by conditioning on a few input-output examples. However, collecting high-quality examples for new or challenging tasks can be costly and labor-intensive. In this work, we propose a cost-efficient two-stage pipeline that reduces reliance on LLMs for data labeling. Our approach first leverages readily available cross-task examples to prompt an LLM and pseudo-label a small set of target-task instances. We then introduce a graph-based label propagation method that spreads label information to the remaining target examples without additional LLM queries. The resulting fully pseudo-labeled dataset is used to construct in-task demonstrations for ICL. This pipeline combines the flexibility of cross-task supervision with the scalability of LLM-free propagation. Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
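The sketch below illustrates what the LLM-free second stage could look like: a standard label-spreading update over a k-NN similarity graph built from example embeddings, seeded by the small LLM-labeled subset. It is a minimal illustration under stated assumptions, not the paper's exact method; the names `embeddings`, `seed_labels`, and the hyperparameters `k`, `alpha`, and `iters` are illustrative placeholders.

```python
# Minimal sketch of graph-based label propagation over target-task examples.
# Assumes precomputed embeddings and seed pseudo-labels from the LLM stage;
# all names and hyperparameters here are hypothetical.
import numpy as np

def propagate_labels(embeddings, seed_labels, num_classes, k=10, alpha=0.9, iters=50):
    """Spread seed pseudo-labels across a k-NN similarity graph (no LLM queries).

    embeddings:  (n, d) array of example embeddings.
    seed_labels: length-n int array; -1 for unlabeled examples, else a class id.
    """
    n = len(embeddings)

    # Cosine similarity graph, keeping each node's top-k neighbors.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, 0.0)
    mask = np.zeros_like(sim)
    topk = np.argsort(-sim, axis=1)[:, :k]
    np.put_along_axis(mask, topk, 1.0, axis=1)
    w = np.maximum(sim * mask, (sim * mask).T)           # symmetrized affinities
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1) + 1e-12)
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]    # normalized graph

    # One-hot seed matrix; unlabeled rows start at zero.
    y = np.zeros((n, num_classes))
    labeled = seed_labels >= 0
    y[labeled, seed_labels[labeled]] = 1.0

    # Iterative propagation: F <- alpha * S @ F + (1 - alpha) * Y
    f = y.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1 - alpha) * y
    return f.argmax(axis=1)                              # pseudo-labels for all examples
```

The propagated labels returned here would then serve as the fully pseudo-labeled pool from which in-task demonstrations are selected for ICL.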