Medical decision-making increasingly requires rapid and reliable assignment of patients to disease subtypes, as many diseases are no longer treated as single entities. For example, cancer patients may be stratified into aggressive and non-aggressive subtypes, with a different treatment strategy for each group. We propose a Bayesian nonparametric approach based on a Dirichlet process mixture model for clustering individuals into disease subtypes. To support medical decision-making, we implement a coordinate ascent variational inference algorithm, which offers an effective and computationally efficient alternative to Markov chain Monte Carlo (MCMC). In synthetic experiments, we demonstrate that the proposed approach accurately assigns observations to their ground-truth clusters, achieving strong performance on evaluation metrics such as homogeneity and completeness. Additionally, we show that the proposed approach substantially reduces computational cost relative to MCMC, without a loss of accuracy that would increase the risk of misdiagnosis.
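To make the two evaluation metrics named above concrete, the following is a minimal sketch of homogeneity and completeness, assuming the standard entropy-based definitions (the same ones behind scikit-learn's `homogeneity_score` and `completeness_score`). The abstract does not state which definitions or implementation the experiments used, so the function names and the toy labels here are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from two paired label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    marginal = Counter(given)
    return -sum((c / n) * log(c / marginal[g]) for (g, _), c in joint.items())


def homogeneity(true_labels, pred_labels):
    """1 - H(true | pred) / H(true): each cluster holds a single true class."""
    h_true = entropy(true_labels)
    if h_true == 0.0:
        return 1.0  # only one true class, so every cluster is trivially pure
    return 1.0 - conditional_entropy(true_labels, pred_labels) / h_true


def completeness(true_labels, pred_labels):
    """1 - H(pred | true) / H(pred): homogeneity with the roles swapped."""
    return homogeneity(pred_labels, true_labels)


# Hypothetical toy labelling (not data from the paper): two true subtypes,
# with the first subtype split across two predicted clusters.
true_subtypes = [0, 0, 0, 1, 1, 1]
pred_clusters = [0, 0, 1, 2, 2, 2]

print("homogeneity: ", homogeneity(true_subtypes, pred_clusters))
print("completeness:", completeness(true_subtypes, pred_clusters))
```

In this toy case every predicted cluster contains patients from a single true subtype, so homogeneity is 1, while completeness falls below 1 because one subtype is spread over two clusters. A nonparametric mixture that opens more clusters than there are true subtypes would show this same pattern, which is why the two metrics are reported together.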