This study presents an Initial Data Analysis (IDA) of the German Transplantation Registry (TxReg) data, intended to improve understanding of the data and to inform future analyses. The IDA focuses on first-time kidney-only transplantations in adult recipients from deceased donors between 2006 and 2016 and covers data from 14,954 recipients and 9,964 donors across 25 tables. Investigated aspects include missing data patterns and structure, data consistency, and availability of event time data. Results show that missing data proportions vary widely, with some tables nearly complete while others have over 50% missing values. Missing data patterns are identified using a decision tree approach. An influx and outflux analysis demonstrates that some variables have high potential for imputing missing data, while others are less suitable for imputation. We identified 168 multi-sourced variables that are reported by multiple data providers in parallel, which leads to discrepancies for some variables but also provides opportunities for missing data imputation. Our findings on event time data demonstrate the importance of carefully selecting the variables used for event time analyses, as results will strongly depend on this selection. In summary, our findings highlight the challenges of utilizing the TxReg data for research and provide recommendations for data preprocessing and analysis in future studies.
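To make the influx and outflux analysis mentioned above more concrete, the following is a minimal sketch of how the two coefficients can be computed. It assumes the standard definitions from the multiple-imputation literature (van Buuren), which may differ in detail from what the study used: the influx of a variable counts pairs where that variable is missing and another variable is observed, divided by the total number of observed cells, and the outflux counts pairs where the variable is observed and another is missing, divided by the total number of missing cells. The function name `influx_outflux` and the toy records are hypothetical and are not taken from the TxReg data.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def influx_outflux(rows: List[Row], columns: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return {column: (influx, outflux)} for a list of records.

    A value of None marks a missing cell (an assumption of this sketch).
    """
    # Response indicator matrix: 1 = observed, 0 = missing.
    r = [[0 if row.get(c) is None else 1 for c in columns] for row in rows]
    total_observed = sum(sum(ri) for ri in r)
    total_missing = len(rows) * len(columns) - total_observed

    result: Dict[str, Tuple[float, float]] = {}
    for j, col in enumerate(columns):
        in_pairs = 0   # column j missing, other column observed
        out_pairs = 0  # column j observed, other column missing
        for ri in r:
            n_obs = sum(ri)
            n_mis = len(columns) - n_obs
            if ri[j] == 0:
                in_pairs += n_obs
            else:
                out_pairs += n_mis
        influx = in_pairs / total_observed if total_observed else 0.0
        outflux = out_pairs / total_missing if total_missing else 0.0
        result[col] = (influx, outflux)
    return result


# Hypothetical toy records, for illustration only.
toy_rows: List[Row] = [
    {"a": 1.0, "b": 2.0, "c": None},
    {"a": 1.0, "b": None, "c": None},
    {"a": 1.0, "b": 2.0, "c": 3.0},
]
flux = influx_outflux(toy_rows, ["a", "b", "c"])
for name, (influx, outflux) in flux.items():
    print(f"{name}: influx={influx:.3f}, outflux={outflux:.3f}")
```

Under these definitions, a variable with high outflux is observed in many records where other variables are missing, so it is a promising predictor for imputing them. A variable with high influx is missing in many records where other variables are observed, so its own missing values are more likely to be imputable from the available data.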