How to Simulate Realistic Survival Data? A Simulation Study to Compare Realistic Simulation Models

In statistics, it is important to have realistic data sets available for a particular context to allow an appropriate and objective method comparison. For many use cases, benchmark data sets for method comparison are already available online. However, in most medical applications, and especially for clinical trials in oncology, there is a lack of adequate benchmark data sets, as patient data can be sensitive and therefore cannot be published. Simulation studies are a potential solution to this problem. However, it is sometimes unclear which simulation models are suitable for generating realistic data. One challenge is that potentially unrealistic assumptions have to be made about the distributions. Our approach is to use reconstructed benchmark data sets as a basis for the simulations, which has the following advantages: the actual properties are known and more realistic data can be simulated. There are several ways to simulate realistic data from benchmark data sets. We investigate simulation models based on kernel density estimation, fitted distributions, case resampling and conditional bootstrapping. In order to recommend which models are best suited for a specific survival setting, we conducted a comparative simulation study. Since it is not possible to provide recommendations for all possible survival settings in a single paper, we focus on providing realistic simulation models for two-armed phase III lung cancer studies. To this end, we reconstructed benchmark data sets from recent studies. We used the runtime and different accuracy measures (effect sizes and p-values) as criteria for comparison.
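
The abstract names the simulation models without showing how they generate data. As a rough illustration only, the following minimal Python sketch shows two of them, case resampling and a fitted-distribution model, applied to a two-armed data set of (time, event, arm) records, and compares a simple effect size across replicates. Everything specific here is an assumption for illustration and not the authors' implementation: the exponential fit, the independent exponential censoring, the exponential hazard ratio as effect size, the helper names and the toy data. The kernel density and conditional bootstrap models are not shown.

```python
import random
import statistics
from typing import Callable, List, Tuple

# One patient record: (time, event, arm). event = 1 if the event was observed,
# 0 if the patient was censored; arm = 0 (control) or 1 (experimental).
Record = Tuple[float, int, int]


def case_resample(data: List[Record], rng: random.Random) -> List[Record]:
    """Case resampling: draw whole records with replacement within each arm,
    so that both arm sizes of the benchmark data set are preserved."""
    out: List[Record] = []
    for arm in (0, 1):
        rows = [r for r in data if r[2] == arm]
        out.extend(rng.choice(rows) for _ in range(len(rows)))
    return out


def exp_rate(data: List[Record], arm: int, censoring: bool = False) -> float:
    """Exponential MLE for one arm: number of 'events' / total follow-up time.
    With censoring=True the censored records are counted instead, which
    estimates the rate of an (assumed) exponential censoring distribution."""
    rows = [r for r in data if r[2] == arm]
    target = 0 if censoring else 1
    n_hits = sum(1 for r in rows if r[1] == target)
    return n_hits / sum(r[0] for r in rows)


def fitted_exponential(data: List[Record], rng: random.Random) -> List[Record]:
    """Fitted-distribution model (assumed exponential): fit event and censoring
    rates per arm, draw T and C independently, record (min(T, C), T <= C)."""
    out: List[Record] = []
    for arm in (0, 1):
        n = sum(1 for r in data if r[2] == arm)
        lam_t = exp_rate(data, arm)
        lam_c = exp_rate(data, arm, censoring=True)
        for _ in range(n):
            # A zero rate means "never happens": use an infinite time.
            t = rng.expovariate(lam_t) if lam_t > 0 else float("inf")
            c = rng.expovariate(lam_c) if lam_c > 0 else float("inf")
            out.append((min(t, c), int(t <= c), arm))
    return out


def exp_hazard_ratio(data: List[Record]) -> float:
    """Effect size under the exponential model: rate(arm 1) / rate(arm 0).
    Raises ZeroDivisionError if the control arm has no observed events."""
    return exp_rate(data, 1) / exp_rate(data, 0)


def simulate_hrs(model: Callable[[List[Record], random.Random], List[Record]],
                 data: List[Record], n_sim: int, seed: int) -> List[float]:
    """Run n_sim replicates of a simulation model and collect the effect size,
    skipping replicates in which the hazard ratio is undefined."""
    rng = random.Random(seed)
    hrs: List[float] = []
    for _ in range(n_sim):
        try:
            hrs.append(exp_hazard_ratio(model(data, rng)))
        except ZeroDivisionError:
            continue
    return hrs


if __name__ == "__main__":
    # Hypothetical toy benchmark (times in months); NOT data from the study.
    benchmark: List[Record] = [
        (2.1, 1, 0), (4.5, 1, 0), (6.0, 0, 0), (7.3, 1, 0), (9.8, 1, 0), (12.0, 0, 0),
        (3.4, 1, 1), (6.7, 1, 1), (8.9, 0, 1), (11.2, 1, 1), (14.5, 0, 1), (16.1, 1, 1),
    ]
    print("benchmark HR:", round(exp_hazard_ratio(benchmark), 3))  # 0.686
    for name, model in [("case resampling", case_resample),
                        ("fitted exponential", fitted_exponential)]:
        hrs = simulate_hrs(model, benchmark, n_sim=1000, seed=1)
        print(f"{name}: median HR {statistics.median(hrs):.3f} over {len(hrs)} runs")
```

Case resampling reuses the observed records directly and makes no distributional assumption, whereas the fitted model can produce event times never seen in the benchmark but depends on the chosen distribution being adequate. Comparing the effect sizes of the simulated replicates with the benchmark value is one simple way to express accuracy; a p-value from a test such as the log-rank test could be collected in the same loop.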
