Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool for collecting information about deaths that occur outside of hospitals by surveying caregivers of the deceased. It is routinely implemented in many low- and middle-income countries. Statistical algorithms that assign cause of death from VAs are typically vulnerable to distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs because labeled data are usually unavailable in the target population. This article proposes a latent class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns causes of death for out-of-domain observations, and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop an efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary materials for this article and the R package to implement the model are available online.
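To make the ingredients named in the abstract concrete, the following is a minimal sketch of a plain (non-nested) latent class model for binary symptoms, together with cause assignment by Bayes' rule. It is not the LCVA model or the interface of its R package: the abstract does not state the model's equations, so the function names, the two-cause setup, and every number below are hypothetical and chosen only for illustration. The last lines show why distribution shift matters: the same symptom pattern receives a different cause posterior when the cause-specific mortality fractions (CSMFs) of the target population differ from those assumed in training.

```python
def class_conditional_likelihood(x, weights, probs):
    """P(x | cause) for a binary symptom vector x under a latent class model.

    weights[k]  : mixing weight of latent class k within this cause (sums to 1)
    probs[k][j] : P(symptom j = 1 | latent class k); symptoms are assumed
                  conditionally independent given the latent class
    """
    total = 0.0
    for w, p_k in zip(weights, probs):
        lik = 1.0
        for x_j, p in zip(x, p_k):
            lik *= p if x_j == 1 else (1.0 - p)
        total += w * lik
    return total


def cause_posterior(x, csmf, weights_by_cause, probs_by_cause):
    """P(cause | x) by Bayes' rule, using the CSMFs as the prior over causes."""
    joint = [
        pi * class_conditional_likelihood(x, w, p)
        for pi, w, p in zip(csmf, weights_by_cause, probs_by_cause)
    ]
    normalizer = sum(joint)
    return [v / normalizer for v in joint]


# Hypothetical toy setup: 2 causes, 2 latent classes per cause, 3 symptoms.
weights_by_cause = [[0.6, 0.4], [0.5, 0.5]]
probs_by_cause = [
    [[0.9, 0.2, 0.1], [0.7, 0.6, 0.1]],  # cause 0
    [[0.1, 0.8, 0.7], [0.2, 0.3, 0.9]],  # cause 1
]
x = [1, 0, 0]  # one observed symptom pattern

# Posterior under the CSMFs of a hypothetical training domain.
posterior = cause_posterior(x, [0.7, 0.3], weights_by_cause, probs_by_cause)

# Same symptoms, but a hypothetical target domain with very different CSMFs.
shifted = cause_posterior(x, [0.1, 0.9], weights_by_cause, probs_by_cause)

print(posterior, shifted)
```

In this sketch the CSMFs are fixed inputs and only the prior over causes changes between domains. The framework described in the abstract instead estimates the CSMFs for the new domain and models the symptom distribution jointly across heterogeneous domains.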