Laughter serves as a multifaceted communicative signal in human interaction, yet identifying laughable contexts within dialogue remains a significant challenge for conversational AI systems. This study addresses this challenge by annotating laughable contexts in Japanese spontaneous text conversation data and developing a taxonomy that classifies the reasons these contexts elicit laughter. First, multiple annotators manually labeled each context with a binary decision (laughable or non-laughable). Next, an LLM generated explanations for the binary laughable annotations, and these explanations were categorized into a taxonomy of ten categories, including "Empathy and Affinity" and "Humor and Surprise," which highlights the diverse range of laughter-inducing scenarios. The study also evaluated GPT-4o's ability to recognize the majority labels of laughable contexts, achieving an F1 score of 43.14%. These findings lay a foundation for more nuanced recognition and generation of laughter in conversational AI, ultimately fostering more natural and engaging human-AI interactions.
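The evaluation combines two steps: aggregating several annotators' binary labels into a majority label, then scoring a model's predictions against those labels with F1. The following is a minimal sketch of that setup, not the study's actual code. It assumes labels are encoded as 1 (laughable) and 0 (non-laughable), that a context counts as laughable only when strictly more than half of the annotators say so (tie-breaking toward non-laughable is an assumption), and that F1 is computed with "laughable" as the positive class. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def majority_label(votes: Sequence[int]) -> int:
    """Aggregate binary votes (1 = laughable, 0 = non-laughable).

    Assumption: strictly more than half of the votes must be 1;
    ties resolve to 0 (non-laughable).
    """
    return int(sum(votes) * 2 > len(votes))


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 with label 1 (laughable) treated as the positive class (assumption)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four contexts, three annotators each.
annotations = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
gold = [majority_label(v) for v in annotations]  # [1, 0, 1, 0]
pred = [1, 1, 0, 0]  # hypothetical model predictions, not GPT-4o output

print(f"F1 = {f1_score(gold, pred):.2f}")  # prints "F1 = 0.50"
```

In this toy example the model gets one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and F1 is 0.5. Different tie-breaking or averaging choices (for example, macro-averaging over both classes) would change the score.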