The analysis of multivariate discrete data is crucial in many areas of scientific research, including epidemiology, the social sciences, genomics, and environmental studies. As such data become more widely available, robust tools for analysis and data generation are needed to understand the relationships among variables. This paper builds on previous work on frameworks for generating multivariate ordinal data with a prespecified correlation matrix. The proposed algorithm generates multivariate discrete data whose marginal distributions follow generalized Poisson, negative binomial, and binomial distributions. A step-by-step algorithm is provided, and its performance is illustrated in four simulated-data scenarios and three real-data scenarios. The technique can potentially be applied in a wide range of settings that require the generation of correlated discrete data.
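To make the task concrete, the following is a minimal sketch of one common way to draw correlated discrete data with chosen margins: a Gaussian copula (latent normal) construction. It is not the algorithm proposed in this paper, whose steps are not given in the abstract. It is a generic stand-in, restricted to two variables and to binomial and negative binomial margins for brevity. All function names and parameter values are illustrative assumptions. Note that `rho` is the correlation of the latent normal variables; the correlation of the resulting counts generally differs from it and is typically somewhat smaller in magnitude, which is why methods that target a prespecified correlation matrix need an additional adjustment step.

```python
import math
import random
from bisect import bisect_left
from statistics import NormalDist


def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p), valid for 0 <= k <= n.
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def negative_binomial_pmf(k, r, p):
    # P(X = k) where X counts failures before the r-th success
    # (integer r, success probability p).
    return math.comb(k + r - 1, k) * p**r * (1.0 - p) ** k


def cdf_table(pmf, k_max):
    # Cumulative probabilities for k = 0..k_max (support truncated at k_max).
    table, total = [], 0.0
    for k in range(k_max + 1):
        total += pmf(k)
        table.append(total)
    return table


def quantile(table, u):
    # Smallest k with CDF(k) >= u, clamped to the tabulated support.
    return min(bisect_left(table, u), len(table) - 1)


def sample_pairs(n_samples, rho, table_x, table_y, seed=0):
    # Assumption: Gaussian copula with latent correlation rho.
    rng = random.Random(seed)
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # Normal -> uniform -> discrete margin via the inverse CDF.
        pairs.append((quantile(table_x, phi(z1)), quantile(table_y, phi(z2))))
    return pairs


def pearson(xs, ys):
    # Sample Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative margins: Binomial(10, 0.3) and NegBin(r=3, p=0.4).
    tx = cdf_table(lambda k: binomial_pmf(k, 10, 0.3), 10)
    ty = cdf_table(lambda k: negative_binomial_pmf(k, 3, 0.4), 200)
    data = sample_pairs(5000, 0.6, tx, ty, seed=1)
    xs, ys = zip(*data)
    print("achieved correlation:", pearson(xs, ys))
```

Comparing the printed correlation with the latent value of 0.6 shows the gap that a correlation-matching procedure has to close.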