Recent research has highlighted the potential of LLM applications, such as ChatGPT, for label annotation on social computing text. However, it is well known that annotation performance hinges on the quality of the input prompts. This has led to a flurry of research into prompt tuning: techniques and guidelines that attempt to improve the quality of prompts. Yet these approaches largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it on twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine of the twelve datasets, with an average improvement of 7.01%. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.