When solving a task with limited labelled data, researchers can either use a general large language model without further updates, or use the few available examples to tune a specialised smaller model. When enough labels are available, specialised models outperform general ones on many NLP tasks. In this work, we investigate how many labelled samples specialised models require to achieve this superior performance, while taking the variance of results into account. We observe the behaviour of prompting, in-context learning, fine-tuning and instruction-tuning, and identify their break-even points as the number of labelled training samples increases across three tasks of varying complexity. We find that specialised models often need only a few samples ($100-1000$) to be on par with or better than general ones. At the same time, the amount of labelled data required strongly depends on task complexity and result variance.
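To make the notion of a variance-aware break-even point concrete, here is a minimal sketch. It is an illustrative assumption, not the procedure used in this work. The function `break_even_point`, its parameter `k`, and the criterion are hypothetical. Given scores from repeated runs of a specialised model at each training-set size and repeated runs of a general model, it returns the smallest size at which the specialised model's mean score, lowered by `k` standard deviations, reaches the general model's mean score, raised by `k` standard deviations. With `k=0` this reduces to a plain comparison of means.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def _std(scores: Sequence[float]) -> float:
    # Sample standard deviation; a single run has no measurable spread.
    return stdev(scores) if len(scores) > 1 else 0.0


def break_even_point(
    specialised: Dict[int, Sequence[float]],
    general: Sequence[float],
    k: float = 0.0,
) -> Optional[int]:
    """Hypothetical break-even criterion (an assumption, not the paper's method).

    specialised: maps number of labelled samples -> scores over repeated runs.
    general: scores of the general model over repeated runs.
    k: how many standard deviations the comparison must survive.
    Returns the smallest sample count meeting the criterion, or None.
    """
    gen_bound = mean(general) + k * _std(general)
    for n in sorted(specialised):
        runs = specialised[n]
        spec_bound = mean(runs) - k * _std(runs)
        if spec_bound >= gen_bound:
            return n
    return None


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not results from this work.
    general_runs = [0.70, 0.72, 0.71]
    specialised_runs = {
        10: [0.50, 0.55, 0.60],
        100: [0.70, 0.72, 0.74],
        1000: [0.80, 0.81, 0.82],
    }
    print(break_even_point(specialised_runs, general_runs))         # means only
    print(break_even_point(specialised_runs, general_runs, k=1.0))  # variance-aware
```

With these invented numbers, comparing means alone puts the break-even point at 100 samples. Requiring the gap to exceed one standard deviation on both sides pushes it to 1000 samples. This illustrates, under the stated assumptions, how accounting for result variance can raise the amount of labelled data needed.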