Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing. However, accessing these capabilities relies on prompt engineering. Although several prompting techniques exist, the field is still evolving rapidly and many prompting ideas remain to be investigated. In this work, we introduce a method to evaluate the sensitivity of foundation-model performance to different prompts and generation parameters. We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection. First, we carry out a sensitivity analysis on pivotal parameters in auto-regressive text generation, specifically the temperature parameter $T$ and the top-$p$ parameter in nucleus sampling, which dictate how conservative or creative the model is during generation. Furthermore, we explore the efficacy of several prompting ideas, examining how different incentives or structures affect performance. Our evaluation takes into account both performance measures on the affective computing tasks and the model's ability to follow the stated instructions, and hence to generate easy-to-parse responses that can be used smoothly in downstream applications.
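To make the roles of the temperature $T$ and the top-$p$ parameter concrete, the following is a minimal sketch of how the two are commonly defined in auto-regressive sampling. It is an illustration under stated assumptions, not ChatGPT's actual implementation, which is not exposed: the function names (`softmax_with_temperature`, `top_p_filter`, `sample_token`) and the toy logits are hypothetical and chosen only for this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after temperature scaling and nucleus (top-p) filtering."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for a three-token vocabulary (illustrative values only).
toy_logits = [2.0, 1.0, 0.1]
token = sample_token(toy_logits, temperature=0.7, top_p=0.9)
```

In this sketch, a small $T$ or a small top-$p$ concentrates the probability mass on the most likely tokens (conservative generation), while larger values leave more of the tail available to the sampler (more varied generation).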