In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting the likely COD from the VA interview and (ii) performing inference with the predicted CODs (e.g., modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case, COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground-truth estimates regardless of which NLP model produced the predictions, whether a more accurate predictor such as GPT-4-32k or a less accurate one such as KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that, if inference tasks are the end goal, having a small amount of contextually relevant, high-quality labeled data is essential regardless of the NLP algorithm.
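To make the idea of inference correction concrete, the following is a minimal sketch of the basic prediction-powered mean estimator applied to cause-of-death proportions. It is not multiPPI++ itself: it only illustrates how a small labeled sample can correct the bias of predicted labels. The function names (`one_hot`, `ppi_proportions`) and the toy integer-coded data are assumptions made for illustration.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Convert integer class labels into one-hot rows."""
    return np.eye(n_classes)[np.asarray(labels)]


def ppi_proportions(pred_unlabeled, pred_labeled, true_labeled, n_classes):
    """Basic prediction-powered estimate of class proportions.

    naive     : proportions of predicted causes in the large unlabeled sample
    rectifier : average (true - predicted) one-hot difference in the small
                labeled sample, which estimates the bias of the predictor
    """
    naive = one_hot(pred_unlabeled, n_classes).mean(axis=0)
    rectifier = (
        one_hot(true_labeled, n_classes) - one_hot(pred_labeled, n_classes)
    ).mean(axis=0)
    return naive + rectifier


# Hypothetical toy data with two causes coded 0 and 1.
pred_unlabeled = [0, 0, 0, 1]  # predicted COD, no ground truth available
pred_labeled = [0, 0, 1, 1]    # predicted COD on the labeled sample
true_labeled = [0, 1, 1, 1]    # ground-truth COD on the labeled sample

estimate = ppi_proportions(pred_unlabeled, pred_labeled, true_labeled, n_classes=2)
print(estimate)
```

In this toy case the predictor over-assigns cause 0 on the labeled sample, so the rectifier shifts mass toward cause 1. The corrected proportions always sum to one, but in small samples an individual entry can fall outside [0, 1]; this sketch does not address that, nor does it compute confidence intervals.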