Statistical data simulation is essential in the development of statistical models and methods as well as in the evaluation of their performance. To capture complex data structures, particularly in high-dimensional settings, a variety of simulation approaches have been introduced, including parametric simulation and so-called plasmode simulation. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality while some aspects of the "truth" remain known. However, there are no explicit guidelines or established state of the art on how to perform plasmode data simulations. In the present paper, we first review the existing literature and introduce the concept of statistical plasmode simulation. We then discuss the advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps for their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes, as well as the proposed plasmode generation procedure, using a publicly available real RNA dataset from breast carcinoma patients.
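To make the idea of "real data with some aspects of the truth known" concrete, the sketch below shows one common style of plasmode for high-dimensional expression data. It is an illustration under stated assumptions, not the step-wise procedure proposed in this paper. It assumes a real expression matrix (samples in rows, genes in columns) drawn from a single homogeneous group, so that no true group difference exists to begin with. Whole samples are randomly split into two pseudo-groups, which keeps the real gene-gene correlation structure intact. A known mean shift is then added to a randomly chosen set of genes in one pseudo-group, so the set of truly "differential" genes is known by construction. The function name `make_plasmode` and its parameters (`n_de`, `effect`, `rng`) are hypothetical.

```python
import numpy as np


def make_plasmode(x_real, n_de=50, effect=1.0, rng=None):
    """Hypothetical helper: build one plasmode dataset from real data.

    x_real: array of shape (n_samples, n_genes), e.g. log-expression values
    from ONE biological group (assumption: no true group effect present).
    Returns the modified data, pseudo-group labels, and the known "true"
    differential genes.
    """
    rng = np.random.default_rng(rng)
    # Work on a float copy so the real data are never modified in place.
    x = np.array(x_real, dtype=float, copy=True)
    n_samples, n_genes = x.shape

    # Randomly split whole samples (rows) into two pseudo-groups; keeping
    # rows intact preserves the real correlation structure between genes.
    order = rng.permutation(n_samples)
    group = np.zeros(n_samples, dtype=int)
    group[order[n_samples // 2:]] = 1

    # Choose the genes that will carry a known effect (the "truth").
    de_genes = rng.choice(n_genes, size=n_de, replace=False)

    # Add a known mean shift to those genes in pseudo-group 1 only.
    x[np.ix_(group == 1, de_genes)] += effect
    return x, group, np.sort(de_genes)


# Usage: a random matrix stands in for a real RNA dataset here;
# in practice x_real would be loaded from the real data.
x_real = np.random.default_rng(0).normal(size=(40, 1000))
x, group, truth = make_plasmode(x_real, n_de=50, effect=1.0, rng=1)
```

Because every gene outside `truth` is left exactly as observed, any method applied to `x` can be scored against a known answer while the data otherwise retain their real distributional features; repeating the call with different seeds yields a collection of plasmode datasets.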