Large "instruction-tuned" language models (i.e., models fine-tuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, which limits the generality of the tuned model. We conducted a quantitative study to evaluate the efficacy of machine-generated annotations, comparing a BERT model fine-tuned on human-written annotations with one fine-tuned on machine-generated annotations. Applying our method to the vanilla GPT-3 model, we found that the machine-generated annotations were 78.54% correct, and the model fine-tuned on them achieved 96.01% of the performance of the model fine-tuned on human-labelled annotations. These results show that machine-generated annotations are a resource- and cost-effective way to fine-tune downstream models.
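The abstract reports two numbers that measure different things: how often the machine-generated labels are correct, and how the downstream model trained on them compares with one trained on human labels. The sketch below assumes that "correct" means agreement with the human labels and that relative performance is the ratio of the two fine-tuned models' scores on the same test set. The function names and the toy data are hypothetical and are not taken from the study.

```python
def annotation_accuracy(machine_labels, human_labels):
    """Fraction of machine-generated labels that agree with the human (reference) labels."""
    if len(machine_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one label")
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(human_labels)


def relative_performance(score_machine, score_human):
    """Score of the model fine-tuned on machine labels as a fraction of the human-label score."""
    if score_human <= 0:
        raise ValueError("reference score must be positive")
    return score_machine / score_human


# Hypothetical toy data, not results from the study.
human = ["pos", "neg", "pos", "neg"]
machine = ["pos", "neg", "neg", "neg"]
print(annotation_accuracy(machine, human))  # 0.75
print(round(relative_performance(0.84, 0.90), 4))  # 0.9333
```

Under these assumptions, the reported 96.01% corresponds to a ratio of about 0.9601 between the two fine-tuned models' scores, regardless of which task metric is used.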