Enhancing reproducibility and data accessibility is essential to scientific research. However, ensuring data privacy while achieving these goals is challenging, especially in the medical field, where sensitive data are commonplace. One possible solution is to use synthetic data that mimic real-world datasets. This approach may help to streamline therapy evaluation and enable quicker access to innovative treatments. We propose a method based on sequential conditional regressions, as in a fully conditional specification (FCS) approach, combined with flexible parametric survival models to accurately replicate covariate patterns and survival times. To make our approach available to a wide audience, we have developed user-friendly functions in R and Python that implement it. We also provide an example application to registry data on patients affected by Creutzfeldt-Jakob disease. The results show the potential of the proposed method to mirror observed multivariate distributions and survival outcomes.
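To make the idea of sequential conditional synthesis concrete, the following is a minimal sketch and not the authors' R or Python functions. It assumes three hypothetical variables (`sex`, `age`, and a censored survival `time` with an `event` indicator) and synthesizes them one at a time, each conditional on those already generated. The survival step uses a simple exponential model with closed-form rates per covariate cell as a stand-in for the flexible parametric survival models named in the abstract. All function names and the toy data generator are illustrative assumptions.

```python
import math
import random
import statistics


def fit_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(r * r for r in resid) / max(len(x) - 2, 1))
    return a, b, sd


def fit_exponential_rates(rows, age_cut):
    """Exponential MLE (events / person-time) per (sex, age group) cell.

    Stand-in for a flexible parametric survival model (assumption).
    Assumes the data contain at least one event overall.
    """
    events, exposure = {}, {}
    for r in rows:
        cell = (r["sex"], r["age"] >= age_cut)
        events[cell] = events.get(cell, 0) + r["event"]
        exposure[cell] = exposure.get(cell, 0.0) + r["time"]
    overall = sum(events.values()) / sum(exposure.values())
    # Cells without events fall back to the overall rate.
    rates = {c: (events[c] / exposure[c] if events[c] > 0 else overall) for c in events}
    return rates, overall


def synthesize(rows, n, seed=0):
    """Generate n synthetic records: sex, then age | sex, then time | sex, age."""
    rng = random.Random(seed)
    sex = [r["sex"] for r in rows]
    age = [r["age"] for r in rows]
    p_sex = statistics.fmean(sex)                 # step 1: marginal model for sex
    a, b, sd = fit_linear(sex, age)               # step 2: regression of age on sex
    age_cut = statistics.median(age)
    rates, overall = fit_exponential_rates(rows, age_cut)  # step 3: survival model
    t_max = max(r["time"] for r in rows)          # administrative censoring time
    out = []
    for _ in range(n):
        s = 1 if rng.random() < p_sex else 0
        g = rng.gauss(a + b * s, sd)
        rate = rates.get((s, g >= age_cut), overall)
        t = rng.expovariate(rate)
        out.append({"sex": s, "age": g, "time": min(t, t_max), "event": int(t <= t_max)})
    return out


def make_toy_data(n, seed=1):
    """Toy 'observed' data, invented for illustration only (not registry data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0
        g = rng.gauss(65 + 3 * s, 8)
        rate = 0.05 * (2.0 if g >= 67 else 1.0)
        t = rng.expovariate(rate)
        c = 24.0  # censor everyone at 24 time units
        rows.append({"sex": s, "age": g, "time": min(t, c), "event": int(t <= c)})
    return rows


real = make_toy_data(500)
synthetic = synthesize(real, 500)
```

Comparing summary statistics of `real` and `synthetic` (for example, mean age by sex or the proportion of events) gives a rough check of how well the conditional models reproduce the observed structure. In a full FCS-style approach, each step would use a regression model suited to the variable type and would condition on all previously synthesized variables.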