In the field of emotion analysis, much NLP research focuses on identifying a limited number of discrete emotion categories, often applied across languages. These basic sets, however, are rarely designed with textual data in mind, and culture, language, and dialect can influence how particular emotions are interpreted. In this work, we broaden our scope to a practically unbounded set of \textit{affective states}, which includes any terms that humans use to describe their experiences of feeling. We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each. We then define the new task of \textit{affective state identification}, framed as masked span prediction for language generation models. On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific Spanish affective states. Additionally, we show that pretraining on MASIVE improves model performance on existing emotion benchmarks. Finally, through machine translation experiments, we find that native speaker-written data is vital to good performance on this task.