Addressing the question of who says what to whom in multi-party conversations (MPCs) has recently attracted considerable research attention. However, existing methods for MPC understanding typically embed interlocutors and utterances into sequential information flows, or exploit only superficial aspects of the graph structures inherent in MPCs. To this end, we present a plug-and-play, lightweight method named graph-induced fine-tuning (GIFT), which can adapt various Transformer-based pre-trained language models (PLMs) for universal MPC understanding. In detail, the full and equivalent connections among utterances in a regular Transformer ignore the sparse but distinctive dependency of one utterance on another in MPCs. To distinguish different relationships between utterances, four types of edges are designed to integrate graph-induced signals into the attention mechanism, refining PLMs that were originally designed for processing sequential text. We evaluate GIFT by implementing it in three PLMs and testing performance on three downstream tasks: addressee recognition, speaker identification, and response selection. Experimental results show that GIFT significantly improves the performance of the three PLMs on the three downstream tasks and two benchmarks with only 4 additional parameters per encoding layer, achieving new state-of-the-art performance on MPC understanding.
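To make the idea of edge-aware attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each of the four edge types owns one learnable scalar per layer and that this scalar rescales the pre-softmax attention score between two utterances; the abstract states only that there are four edge types and 4 additional parameters per encoding layer. The edge-type names, the `reply_to` input format, the function names, and the utterance-level (rather than token-level) granularity are all hypothetical choices made for illustration.

```python
import math

# Hypothetical edge-type labels; the abstract only says there are four types.
EDGE_TYPES = ("reply_to", "replied_by", "reply_self", "indirect")


def edge_type_matrix(reply_to):
    """Return an n x n matrix of edge-type indices into EDGE_TYPES.

    reply_to[i] is the index of the utterance that utterance i replies to,
    or None if utterance i starts the conversation (assumed input format).
    """
    n = len(reply_to)
    # Every pair defaults to the "indirect" type.
    edges = [[3] * n for _ in range(n)]
    for i in range(n):
        edges[i][i] = 2  # an utterance attending to itself
        parent = reply_to[i]
        if parent is not None:
            edges[i][parent] = 0  # i replies to parent
            edges[parent][i] = 1  # parent is replied to by i
    return edges


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_induced_attention(queries, keys, values, edges, edge_weights):
    """Scaled dot-product attention with edge-dependent score scaling.

    edge_weights holds one scalar per edge type, i.e. 4 extra parameters
    for this layer. Multiplying the score by the scalar is an assumption.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            scores.append(dot * edge_weights[edges[i][j]])
        probs = softmax(scores)
        outputs.append(
            [sum(p * v[c] for p, v in zip(probs, values))
             for c in range(len(values[0]))]
        )
    return outputs


# Toy conversation: utterances 1 and 2 both reply to utterance 0.
edges = edge_type_matrix([None, 0, 0])
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# With all four scalars at 1.0 this reduces to ordinary attention.
plain = graph_induced_attention(vectors, vectors, vectors, edges, [1.0] * 4)
```

In this sketch, setting all four scalars to 1.0 recovers the fully connected attention of a regular Transformer, so the graph signal acts purely as a learned adjustment on top of the pre-trained weights. A real integration would apply the same scaling at the token level inside each attention head of the PLM.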