In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate associations between variables that are exclusive to each file. Common record linkage algorithms rely only on similarities between linking variables that appear in all the files. Moreover, analyses of linked files often ignore errors that arise from incorrect or missed links. Bayesian record linkage methods allow linkage error to propagate naturally by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to incorporate associations between variables exclusive to each file being linked. We show, analytically and through simulations, that the proposed method improves the linking process and yields accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare Enrollment records.