In most of the world, causes of death are not recorded. Verbal autopsies are structured interviews with people close to the deceased, which are used to estimate the likelihood of various causes of death. Such estimates typically make use of a table of marginal probabilities, called a 'probbase', describing the frequency of answers to each interview question conditional on each cause of death. Assembling probbase tables is challenging because data labelled with verified causes of death are not typically available, so the tables are generally assembled on the basis of expert opinion. We propose a method to verify or partially learn a probbase table given only a set of verbal autopsy questionnaires (i.e., unlabelled data). Essentially, we assess how well a probbase can be used to impute answers. Our method requires a mild conditional independence assumption on the joint distribution of questionnaire data and causes of death. More generally, our method serves as a means to assess verbal autopsy algorithms and parameters without the need for external cause-of-death labelling. We offer theoretical arguments to support our method, together with brief evaluations on data simulated to resemble realistic verbal autopsy questionnaires. We find moderate promise for the approach in this context: using 1500 verbal autopsy questionnaires, we can differentiate probbase values which are too high from those which are too low with around 75% correctness. This paper serves as an introduction to our approach and a statement of intent, in the spirit of preregistration. We identify a range of theoretical and practical open problems and describe a planned outline of work to evaluate the method. We invite comments and suggestions on our approach and open questions. We stress that our method has not yet been thoroughly tested and we do not endorse its use in a real-world setting at this stage.
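As an illustration only, the idea of assessing a probbase by how well it imputes answers might be sketched as follows. This is a minimal sketch under assumptions that go beyond the abstract and is not the authors' procedure: answers are taken to be binary, a prior over causes (`prior`) is assumed to be known, and answers are assumed fully conditionally independent given the cause (a naive-Bayes assumption, which is stronger than the mild condition mentioned above). All function names and the toy numbers are hypothetical.

```python
import math

def cause_posterior(answers, probbase, prior, skip=None):
    """Posterior over causes given binary answers, optionally ignoring one question.

    Assumes answers are conditionally independent given the cause, and that
    every probbase entry lies strictly between 0 and 1.
    """
    log_post = []
    for c, p_c in enumerate(prior):
        lp = math.log(p_c)
        for j, a in enumerate(answers):
            if j == skip or a is None:  # held-out or missing answer
                continue
            p = probbase[c][j]          # assumed P(answer j = 1 | cause c)
            lp += math.log(p if a == 1 else 1.0 - p)
        log_post.append(lp)
    m = max(log_post)                   # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def impute_prob(answers, probbase, prior, j):
    """Probability that answer j is 1, given all the other answers."""
    post = cause_posterior(answers, probbase, prior, skip=j)
    return sum(post[c] * probbase[c][j] for c in range(len(prior)))

def imputation_score(questionnaires, probbase, prior):
    """Mean log-likelihood of each observed answer when imputed from the rest."""
    total, n = 0.0, 0
    for answers in questionnaires:
        for j, a in enumerate(answers):
            if a is None:
                continue
            p = impute_prob(answers, probbase, prior, j)
            total += math.log(p if a == 1 else 1.0 - p)
            n += 1
    return total / n

# Toy example: 2 causes, 3 questions, 4 unlabelled questionnaires (all hypothetical).
prior = [0.5, 0.5]
probbase_good = [[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]]
probbase_bad = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]]  # entries for question 1 swapped
data = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]

score_good = imputation_score(data, probbase_good, prior)
score_bad = imputation_score(data, probbase_bad, prior)
print("good probbase:", score_good, "bad probbase:", score_bad)
```

In this toy setting the probbase that matches the answer patterns imputes held-out answers better than the one with swapped entries, and no cause-of-death labels are used at any point. Whether a comparison of this kind remains informative on realistic questionnaires is exactly the sort of question the abstract leaves open.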