Recent research has highlighted the potential of LLM applications such as ChatGPT for annotating labels on social computing text. However, it is well known that annotation performance hinges on the quality of the input prompts. This has spurred a flurry of research into prompt tuning: techniques and guidelines that aim to improve prompt quality. Yet these approaches largely rely on manual effort and on prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline that automatically tunes prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and evaluate it on twelve distinct text classification datasets. Prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine of the twelve datasets, with an average improvement of 7.01%. We further demonstrate APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.
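The headline result is reported as weighted F1-score. For readers unfamiliar with the metric, the following is a minimal sketch assuming the standard definition (the same one scikit-learn's `f1_score(..., average="weighted")` uses): compute F1 per class, then average the per-class scores weighted by each class's support in the ground-truth labels. The helper name `weighted_f1` and the toy labels are hypothetical illustrations, not APT-Pipe's evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true.

    Hypothetical helper illustrating the standard weighted-F1 definition.
    Classes that appear only in y_pred have zero support, so they get zero weight.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight by class support
    return score


# Toy example: per-class F1 is 0.5 (pos), 2/3 (neg), 1.0 (neu),
# weighted by supports 2, 3, 1 out of 6 labels.
y_true = ["pos", "pos", "neg", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "neu"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6667
```

Weighting by support means that improvements on frequent classes move the score more than improvements on rare ones, which is worth keeping in mind when comparing gains across datasets with different label distributions.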