Large language models have recently been applied to text annotation tasks in the social sciences, equalling or surpassing the performance of human workers at a fraction of the cost. However, the impact of prompt selection on labelling accuracy has not yet been investigated. In this study, we show that performance varies greatly between prompts, and we apply automatic prompt optimization to systematically craft high-quality prompts. We also provide the community with a simple, browser-based implementation of the method at https://prompt-ultra.github.io/.