Generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing augmentation works leverage few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet, the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more ``informed'') sample selection strategies is lacking. In this work, we compare sample selection strategies from the few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate them on in-distribution and out-of-distribution classifier performance. Results indicate that while some ``informed'' selection strategies increase model performance, especially on out-of-distribution data, this happens only seldom and with marginal gains. Unless further advances are made, the default of random sample selection remains a good option for augmentation practitioners.