Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underscoring the importance of auditory cues in conveying emotional subtleties beyond the reach of text alone. We introduce a novel multimodal OEI (MOEI) task that integrates text and speech to mirror real-world scenarios. Drawing on the CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. In addition, we apply Text-to-Speech (TTS) technology to the MPQA dataset to obtain the CIM-OEI dataset. We design a prompt template for the OEI task that takes full advantage of the generative power of large language models (LLMs). Building on this, we propose an LLM-driven method, STOEI, which combines the speech and text modalities to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves performance, and our method outperforms existing approaches by 9.20\% and achieves state-of-the-art (SOTA) results.