Conversations among online users sometimes derail, i.e., break down into personal attacks. Such derailment harms the healthy growth of online communities. The ability to predict whether an ongoing conversation is likely to derail could provide valuable real-time insight to interlocutors and moderators. Prior approaches predict conversation derailment retrospectively and therefore cannot forestall it proactively. Some works attempt dynamic prediction as the conversation develops, but fail to incorporate multisource information such as conversation structure and distance to derailment. We propose a hierarchical transformer-based framework that combines utterance-level and conversation-level information to capture fine-grained contextual semantics. We further propose a domain-adaptive pretraining objective to integrate conversational structure information and a multitask learning scheme to leverage the distance from each utterance to derailment. An evaluation of our framework on two conversation derailment datasets yields an improvement in F1 score for derailment prediction. These results demonstrate the effectiveness of incorporating multisource information.
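The multitask idea of pairing derailment prediction with the distance from each utterance to derailment can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the framework itself: the function names `distance_targets` and `multitask_loss`, the choice of binary cross-entropy plus squared error, the `dist_weight` value, and the convention that non-derailing conversations carry no distance target.

```python
import math
from typing import List, Optional


def distance_targets(n_context: int, derail_index: Optional[int]) -> List[Optional[int]]:
    """Distance, in utterances, from each context utterance to the derailing one.

    derail_index is the position of the derailing utterance (hypothetical
    convention: it lies at or after the end of the observed context).
    Conversations that never derail get no distance target (None).
    """
    if derail_index is None:
        return [None] * n_context
    if derail_index < n_context:
        raise ValueError("derailing utterance must come after the context")
    return [derail_index - i for i in range(n_context)]


def multitask_loss(
    p_derail: float,
    will_derail: bool,
    pred_dist: float,
    true_dist: Optional[int],
    dist_weight: float = 0.5,  # assumed weighting, not from the source
) -> float:
    """Binary cross-entropy on derailment plus a weighted squared distance error."""
    eps = 1e-12
    p = min(max(p_derail, eps), 1.0 - eps)  # clamp to avoid log(0)
    bce = -math.log(p) if will_derail else -math.log(1.0 - p)
    if true_dist is None:
        # No distance supervision for conversations that stay civil.
        return bce
    return bce + dist_weight * (pred_dist - true_dist) ** 2


# Three observed utterances, with the attack arriving as the fourth (index 3).
targets = distance_targets(3, 3)
loss = multitask_loss(p_derail=0.5, will_derail=True, pred_dist=2.0, true_dist=targets[0])
```

In this sketch the auxiliary distance term only contributes when a conversation actually derails, so the classification term remains the sole signal for civil conversations.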