Large Language Models (LLMs) are increasingly used in mental health-related settings, yet they struggle to sustain realistic, goal-directed dialogue over extended interactions. Although LLMs generate fluent responses, they optimize locally for the next turn rather than maintaining a coherent model of therapeutic progress, which leads to brittleness and long-horizon drift. We introduce CALM-IT, a framework for generating and evaluating long-form Motivational Interviewing (MI) dialogues that explicitly models dual-actor conversational dynamics. CALM-IT represents therapist-client interaction as a bidirectional state-space process in which both agents continuously update their inferred alignment, mental states, and short-term goals to guide strategy selection and utterance generation. Across large-scale evaluations, CALM-IT consistently outperforms strong baselines in Effectiveness and Goal Alignment and remains substantially more stable as conversation length increases. Although CALM-IT initiates fewer therapist redirections, it achieves the highest client acceptance rate (64.3%), indicating more precise and therapeutically aligned intervention timing. Overall, CALM-IT provides evidence that modeling evolving conversational state is essential for generating high-quality long-form synthetic conversations.