Conversational engagement estimation is posed as a regression problem that entails identifying how attentive and involved the participants in a conversation are. The task is crucial for gaining insight into human interaction dynamics and behavior patterns within a conversation. In this research, we introduce a dilated convolutional Transformer for modeling and estimating human engagement in the MULTIMEDIATE 2023 competition. Our proposed system surpasses the baseline models, with a notable $7$\% improvement on the test set and $4$\% on the validation set. Moreover, we compare different modality fusion mechanisms and show that, for this type of data, simple concatenation followed by self-attention fusion achieves the best performance.
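The abstract names two building blocks, dilated temporal convolution and concatenation followed by self-attention fusion, but gives no implementation details. The NumPy sketch below illustrates both under stated assumptions. The function names `dilated_conv1d`, `self_attention`, and `concat_self_attention_fusion`, the feature dimensions, the single-head attention, the order in which the blocks are composed, and the per-frame linear regression head are all hypothetical and are not the authors' architecture.

```python
import numpy as np


def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D dilated convolution over time (hypothetical helper).

    x: (T, C_in) per-frame features; w: (K, C_in, C_out) kernel taps.
    Returns (T - (K - 1) * dilation, C_out).
    """
    K = w.shape[0]
    t_out = x.shape[0] - (K - 1) * dilation
    out = np.zeros((t_out, w.shape[2]))
    for k in range(K):
        # Tap k reads frames shifted by k * dilation, widening the receptive field.
        out += x[k * dilation : k * dilation + t_out] @ w[k]
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time steps."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def concat_self_attention_fusion(modalities, wq, wk, wv):
    """Concatenate per-frame features of all modalities, then apply self-attention."""
    fused = np.concatenate(modalities, axis=-1)  # (T, sum of modality feature dims)
    return self_attention(fused, wq, wk, wv)


# Usage with random stand-in features (hypothetical modalities and sizes).
rng = np.random.default_rng(0)
T, d, h = 32, 24, 12
audio = rng.standard_normal((T, 8))
video = rng.standard_normal((T, 16))
wq, wk, wv = (rng.standard_normal((d, h)) for _ in range(3))
fused = concat_self_attention_fusion([audio, video], wq, wk, wv)  # (32, 12)
temporal = dilated_conv1d(fused, rng.standard_normal((3, h, h)), dilation=2)  # (28, 12)
engagement = (temporal @ rng.standard_normal((h, 1))).squeeze(-1)  # one score per frame
```

Concatenation keeps every modality's features intact and leaves all cross-modal mixing to the attention weights. This is one plausible reading of why such a simple scheme can compete with more elaborate fusion, offered here as an interpretation rather than a claim from the paper.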