The reuse of medico-administrative and synthetic spatial data may overcome some limitations of population-based registries, provided rigorous validation is performed. However, no tool currently exists to spatially validate a candidate-for-reuse database (CFRD) against a gold standard (GS). We propose a Bayesian framework for two-dimensional (global and local) map-to-map validation of spatial health-event databases. We consider a family of error models (random [REM] and structured [SEM]) in which the CFRD is modelled as a departure from the GS, and compare both with a shared component model (SCM). Global disagreement is assessed using the database-specific intercept difference ($RR_{\mathrm{global}}$), while local disagreement is measured by the exceedance probability of the database-specific error term. Disturbance scenarios included null, uniform, clustered, and random perturbations of the CFRD. Detection performance was assessed using sensitivity, specificity, false detection rate, and the Matthews correlation coefficient. $RR_{\mathrm{global}}$ accurately recovered map-wide shifts across all models and scenarios. REM and SEM were both sensitive and specific to local discrepancies, whereas SCM was more conservative. Applied to Crohn's disease data from the EPIMAD registry and a CFRD, all models reached the same conclusion: the CFRD reproduced the global and local spatial structures, with an overall signal about 7\% lower. Extensions to other outcome distributions, spatio-temporal models, and calibration are natural next steps. \textit{Keywords:} data reuse; spatial database validation; Bayesian hierarchical models; disease mapping; shared component model.
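A minimal Python sketch of the quantities named above, assuming posterior draws are already available from whichever model (REM, SEM, or SCM) was fitted. The summary $RR_{\mathrm{global}} = \exp(\alpha_{\mathrm{CFRD}} - \alpha_{\mathrm{GS}})$ assumes a log-link model with database-specific intercepts. The function names, the two-sided flagging rule with a 0.8 cutoff, and the definition of the false detection rate as FP/(TP+FP) are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def rr_global(alpha_cfrd_draws, alpha_gs_draws):
    """Posterior mean of exp(alpha_CFRD - alpha_GS) over paired MCMC draws.

    Assumes (hypothetically) a log-link model in which each database has its
    own intercept and draws are paired by iteration.
    """
    ratios = [math.exp(a_c - a_g) for a_c, a_g in zip(alpha_cfrd_draws, alpha_gs_draws)]
    return sum(ratios) / len(ratios)


def exceedance_probability(error_draws, threshold=0.0):
    """Share of posterior draws of one area's error term above `threshold` (log scale)."""
    return sum(1 for e in error_draws if e > threshold) / len(error_draws)


def flag_areas(error_draws_by_area, cutoff=0.8):
    """Hypothetical two-sided rule: flag an area when P(error > 0) > cutoff
    (CFRD above GS) or P(error > 0) < 1 - cutoff (CFRD below GS)."""
    flags = []
    for draws in error_draws_by_area:
        p = exceedance_probability(draws)
        flags.append(p > cutoff or p < 1 - cutoff)
    return flags


def detection_metrics(flagged, truly_perturbed):
    """Confusion-matrix summaries comparing flagged areas with known perturbations."""
    pairs = list(zip(flagged, truly_perturbed))
    tp = sum(1 for f, t in pairs if f and t)
    fp = sum(1 for f, t in pairs if f and not t)
    fn = sum(1 for f, t in pairs if not f and t)
    tn = sum(1 for f, t in pairs if not f and not t)
    nan = float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # Assumed definition: share of flagged areas that are not truly perturbed.
        "false_detection_rate": fp / (tp + fp) if tp + fp else nan,
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(2024)
    # Synthetic posterior draws: areas 0 and 1 are perturbed (error centred at 0.5).
    means = [0.5, 0.5, 0.0, 0.0, 0.0]
    draws = [[rng.gauss(m, 0.1) for _ in range(2000)] for m in means]
    flags = flag_areas(draws)
    truth = [m != 0.0 for m in means]
    print(flags)
    print(detection_metrics(flags, truth))
    print(rr_global([math.log(0.93)] * 100, [0.0] * 100))
```

On this synthetic map the rule should flag only the two perturbed areas. Under the log-link reading, an $RR_{\mathrm{global}}$ near 0.93 corresponds to a CFRD signal about 7\% lower than the GS; reporting the posterior median instead of the mean would be an equally reasonable choice.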