In statistics, realistic data sets for a given context are needed for an appropriate and objective comparison of methods. For many use cases, benchmark data sets for method comparison are already available online. However, in most medical applications, and especially for clinical trials in oncology, adequate benchmark data sets are lacking, because patient data can be sensitive and therefore cannot be published. Simulation studies are a potential solution. However, it is not always clear which simulation models are suitable for generating realistic data, and a particular challenge is that potentially unrealistic assumptions have to be made about the distributions. Our approach is to use reconstructed benchmark data sets as a basis for the simulations, which has the following advantages: the actual properties are known and more realistic data can be simulated. There are several ways to simulate realistic data from benchmark data sets. We investigate simulation models based on kernel density estimation, fitted distributions, case resampling and conditional bootstrapping. To make recommendations on which models are best suited for a specific survival setting, we conducted a comparative simulation study. Since it is not possible to provide recommendations for all possible survival settings in a single paper, we focus on providing realistic simulation models for two-armed phase III lung cancer studies. To this end, we reconstructed benchmark data sets from recent studies. We used the runtime and different accuracy measures (effect sizes and p-values) as criteria for comparison.
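To make one of the four approaches concrete, the following is a minimal sketch of case resampling for a two-armed survival data set. It assumes the reconstructed benchmark data are available as three arrays (follow-up time, event indicator, arm label). The function name `case_resample`, the toy values, and the choice to resample within each arm (which keeps the arm sizes fixed) are assumptions of this sketch and not necessarily the procedure used in the study.

```python
import numpy as np


def case_resample(time, event, arm, rng):
    """Draw one simulated data set by resampling patients with replacement.

    Each patient is kept as a whole case (time, event, arm), and the
    resampling is done separately within each arm (an assumption of this
    sketch) so that the original arm sizes are preserved.
    """
    time, event, arm = map(np.asarray, (time, event, arm))
    parts = []
    for a in np.unique(arm):
        members = np.flatnonzero(arm == a)  # row indices of patients in arm a
        parts.append(rng.choice(members, size=members.size, replace=True))
    idx = np.concatenate(parts)
    return time[idx], event[idx], arm[idx]


# Toy benchmark data (hypothetical values, not taken from any study)
rng = np.random.default_rng(0)
time = np.array([2.1, 5.3, 7.8, 1.4, 9.9, 3.2, 6.5, 4.0])   # follow-up in months
event = np.array([1, 0, 1, 1, 0, 1, 1, 0])                  # 1 = event, 0 = censored
arm = np.array([0, 0, 0, 0, 1, 1, 1, 1])                    # 0 = control, 1 = treatment

sim_time, sim_event, sim_arm = case_resample(time, event, arm, rng)
```

Because every simulated patient is a copy of an observed one, the joint distribution of event time and censoring status is inherited from the benchmark data without any parametric assumption. The price is that no new event times are generated, which is where density-based or fitted-distribution models differ.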