Psychiatric comorbidity is clinically significant, yet it is challenging to diagnose because of the complexity introduced by multiple co-occurring disorders. To address this, we develop a novel approach that integrates synthetic patient electronic medical record (EMR) construction with multi-agent diagnostic dialogue generation. We create 502 synthetic EMRs for common comorbid conditions using a pipeline designed to ensure clinical relevance and diversity. Our multi-agent framework translates the clinical interview protocol into a hierarchical state machine and a context tree, supporting more than 130 diagnostic states while adhering to clinical standards. Through this rigorous process, we construct PsyCoTalk, the first large-scale dialogue dataset for psychiatric comorbidity, containing 3,000 multi-turn diagnostic dialogues validated by psychiatrists. The dataset can help improve diagnostic accuracy and treatment planning, offering a valuable resource for psychiatric comorbidity research. Compared with real-world clinical transcripts, PsyCoTalk exhibits high structural and linguistic fidelity in terms of dialogue length, token distribution, and diagnostic reasoning strategies, and licensed psychiatrists confirm the realism and diagnostic validity of the dialogues. This dataset enables the development and evaluation of models capable of screening for multiple psychiatric disorders in a single conversational pass.
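The hierarchical state machine and context tree are described only at a high level above. The following Python sketch illustrates one way such a structure could work, under our own assumptions: each disorder is a composite "module" state containing leaf question states, each transition is a function of the patient's answer, and the context tree stores one node per module with one child per visited question. Running all modules in sequence mirrors multi-disorder screening in a single conversational pass. All names (`Module`, `State`, `ContextNode`, `run_interview`) and the example questions are hypothetical and are not taken from PsyCoTalk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ContextNode:
    """Hypothetical context-tree node: one per module or visited question."""
    name: str
    answer: Optional[str] = None
    children: dict[str, ContextNode] = field(default_factory=dict)

    def child(self, name: str) -> ContextNode:
        # Create the child on first visit so the tree mirrors the path taken.
        if name not in self.children:
            self.children[name] = ContextNode(name)
        return self.children[name]


@dataclass
class State:
    """Leaf state: asks one question and chooses the next state from the answer."""
    question: str
    # Returns the next state id inside the same module, or None to leave the module.
    next_state: Callable[[str], Optional[str]]


@dataclass
class Module:
    """Composite (parent) state grouping the leaf states for one disorder."""
    name: str
    start: str
    states: dict[str, State]


def run_interview(modules: list[Module],
                  answer_fn: Callable[[str, str], str]) -> ContextNode:
    root = ContextNode("interview")
    for module in modules:  # outer level: visit each disorder module in order
        node = root.child(module.name)
        state_id: Optional[str] = module.start
        while state_id is not None:  # inner level: walk the module's leaf states
            state = module.states[state_id]
            answer = answer_fn(module.name, state.question)
            node.child(state_id).answer = answer  # record the answer in the context tree
            state_id = state.next_state(answer)
    return root


# Usage with two illustrative modules and scripted (hypothetical) patient answers.
depression = Module("depression", "screen", {
    "screen": State("Have you felt down most days recently?",
                    lambda a: "duration" if a == "yes" else None),
    "duration": State("Has this lasted at least two weeks?", lambda a: None),
})
anxiety = Module("anxiety", "screen", {
    "screen": State("Do you worry excessively?",
                    lambda a: "control" if a == "yes" else None),
    "control": State("Do you find the worry hard to control?", lambda a: None),
})
scripted = {
    "Have you felt down most days recently?": "yes",
    "Has this lasted at least two weeks?": "yes",
    "Do you worry excessively?": "no",
}
tree = run_interview([depression, anxiety], lambda module, q: scripted[q])
```

Here the anxiety module exits after its screening question because the scripted answer is "no", so its context subtree contains only the `screen` node. Keeping the per-module subtrees separate lets later modules, or a final diagnostic step, look up earlier answers by path; whether PsyCoTalk's context tree works this way is not stated in the abstract.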