Forecasting conversational derailment is the task of predicting, as the conversation unfolds, whether it will eventually derail into personal attacks. Since forecasting models operate in an online fashion, they must decide whether to "trigger" an alert after each utterance, for example to notify participants or a moderator that the conversation is at risk of derailing. Existing approaches make this decision solely based on the estimated likelihood of derailment given the preceding utterances, implicitly assuming that the conversation's future trajectory is fixed. As a result, they ignore the possibility of future recovery and incur an unnecessarily high rate of false positives. In this work we propose a method for decoupling the decision to trigger from derailment likelihood estimation. Our approach is inspired by the first human baseline on this task, which shows that humans achieve dramatically lower false positive rates by selectively deferring their decision to trigger when they anticipate that tension is likely to subside. We operationalize this insight with a deferral mechanism that uses forward-looking simulations to assess whether a tense moment admits plausible paths to recovery. Incorporating this mechanism into a state-of-the-art forecasting model substantially reduces false positives without sacrificing forecasting accuracy. More broadly, this work highlights the value of treating decision-making as a first-class component of forecasting systems.
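The separation between likelihood estimation and the trigger decision can be illustrated with a minimal sketch. The abstract does not specify how simulations are generated or scored, so everything below is an assumption made for illustration: the names `should_trigger`, `estimate_risk`, and `simulate`, the threshold values, and the rule of re-scoring each simulated continuation with the same risk estimator are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence

Utterance = str
# Hypothetical interfaces: a risk estimator maps a conversation prefix to a
# derailment probability; a simulator proposes n possible continuations.
RiskModel = Callable[[Sequence[Utterance]], float]
Simulator = Callable[[Sequence[Utterance], int], List[List[Utterance]]]


def should_trigger(
    context: Sequence[Utterance],
    estimate_risk: RiskModel,
    simulate: Simulator,
    threshold: float = 0.5,          # assumed risk threshold
    n_simulations: int = 5,          # assumed number of simulated futures
    max_recovery_rate: float = 0.5,  # assumed deferral cut-off
) -> bool:
    """Decide whether to raise an alert after the latest utterance."""
    # Step 1: likelihood estimation on the observed prefix only.
    if estimate_risk(context) < threshold:
        return False  # not tense enough to consider an alert

    # Step 2: forward-looking check. Simulate continuations and count how
    # many of them bring the estimated risk back under the threshold.
    futures = simulate(context, n_simulations)
    if not futures:
        return True  # no simulations available: fall back to the risk alone
    recovered = sum(
        estimate_risk(list(context) + list(future)) < threshold
        for future in futures
    )

    # Step 3: defer (do not trigger) when recovery looks plausible.
    return recovered / len(futures) < max_recovery_rate


# Toy stand-ins so the sketch runs end to end; a real system would plug in
# a trained forecaster and a generative model here.
def toy_risk(utterances: Sequence[Utterance]) -> float:
    return 0.9 if utterances and utterances[-1].endswith("!") else 0.1


def calm_simulator(utterances: Sequence[Utterance], n: int) -> List[List[Utterance]]:
    return [["Fair point, let's check the source."] for _ in range(n)]


def hostile_simulator(utterances: Sequence[Utterance], n: int) -> List[List[Utterance]]:
    return [["You are clueless!"] for _ in range(n)]


tense = ["I think the edit is wrong.", "That is ridiculous!"]
deferred = should_trigger(tense, toy_risk, calm_simulator)
alerted = should_trigger(tense, toy_risk, hostile_simulator)
```

With the toy components, the same tense prefix is deferred when every simulated continuation calms down and triggers an alert when none does, which is the behavior the deferral idea aims for: the risk estimate is unchanged, and only the decision differs.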