Tailoring persuasive conversations to individual users makes persuasion more effective. However, existing dialogue systems often struggle to adapt to dynamically evolving user states. This paper presents a method that combines causal discovery and counterfactual reasoning to improve a dialogue system's persuasion capability and outcomes. We use the Greedy Relaxation of the Sparsest Permutation (GRaSP) algorithm to identify causal relationships between user and system utterance strategies, treating user strategies as states and system strategies as actions. GRaSP identifies user strategies as causal factors influencing system responses, and these relationships inform a Bidirectional Conditional Generative Adversarial Network (BiCoGAN) that generates counterfactual utterances for the system. A Dueling Double Deep Q-Network (D3QN) then uses the counterfactual data to determine the best policy for selecting system utterances. Experiments on the PersuasionForGood dataset show measurable improvements in persuasion outcomes over baseline methods. The observed increases in cumulative rewards and Q-values highlight the effectiveness of causal discovery in enhancing counterfactual reasoning and optimizing reinforcement learning policies for online dialogue systems.
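To make the policy-learning step more concrete, the following is a minimal sketch of the two standard ingredients behind a D3QN: the dueling aggregation of a state value and per-action advantages, and the double-DQN bootstrap target. It is not the paper's implementation. The function names, the toy numbers, and the discount factor are illustrative assumptions, and the abstract does not state the network architecture, reward definition, or hyperparameters.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage keeps V and A identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward  # no bootstrapping past a terminal turn
    best_action = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best_action]


# Toy example (all numbers hypothetical): three candidate system strategies.
q_next_online = dueling_q(1.0, [0.0, 1.0, 2.0])   # online net's view of s'
q_next_target = [0.2, 0.4, 0.1]                   # target net's view of s'
target = double_dqn_target(1.0, False, q_next_online, q_next_target, gamma=0.9)
terminal_target = double_dqn_target(1.0, True, q_next_online, q_next_target)
```

In this framing, a state would encode the user's strategy and each action would correspond to a system strategy, as described above. Decoupling action selection from action evaluation is the usual motivation for the double-DQN target, since it reduces the overestimation of Q-values.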