Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on the recent GenSEC task. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that conversation context has diminishing returns and that the metric used to select the transcript for prediction is crucial. Finally, our best submission surpasses the provided baseline by 20% in absolute accuracy.