Surrogate method for partial association between mixed data with application to well-being survey analysis

This paper is motivated by the analysis of a survey study of college student well-being before and after the outbreak of the COVID-19 pandemic. A statistical challenge in well-being survey studies is that outcome variables are often recorded on different scales, whether continuous, binary, or ordinal. The presence of mixed data complicates the assessment of the associations between them while adjusting for covariates. In our study, of particular interest are the associations between college students' well-being and other mental health measures, and how other risk factors moderate these associations during the pandemic. To this end, we propose a unifying framework for studying partial association between mixed data. This is achieved by defining a unified residual using the surrogate method. The idea is to map the residual randomness to the same continuous scale, regardless of the original scales of the outcome variables. The approach applies to virtually all commonly used models for covariate adjustment. We demonstrate the validity of using the residuals so defined to assess partial association. In particular, we develop a measure that generalizes the classical Kendall's tau in the sense that it can quantify both partial and marginal associations. More importantly, our development advances the theory of the surrogate method developed in recent years by showing that it can be used without requiring the outcome variables to have a latent variable structure. The use of our method in the well-being survey analysis reveals (i) significant moderation effects (i.e., the difference between partial and marginal associations) of some key risk factors; and (ii) an elevated moderation effect of physical health, loneliness, and accommodation after the onset of COVID-19.
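
To make the idea of mapping residual randomness to a common continuous scale concrete, the following is a minimal sketch and not the paper's implementation. It assumes the simplest special case, a cumulative probit model for the ordinal outcome, in which a surrogate residual can be drawn from a truncated standard normal; the abstract states that the method does not require such a latent structure, so this choice is for illustration only. The data are simulated, the true model parameters are used in place of fitted ones, and the function name `surrogate_residual_probit` is hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def surrogate_residual_probit(y, mu, cutpoints, rng):
    """Draw surrogate residuals for an ordinal outcome (hypothetical helper).

    Assumes a cumulative probit model: y = j when
    bounds[j] < mu + eps <= bounds[j + 1], with eps ~ N(0, 1).
    Given y, the residual eps is standard normal truncated to
    (bounds[y] - mu, bounds[y + 1] - mu); we sample it by inverse CDF.
    """
    bounds = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    lower = norm.cdf(bounds[y] - mu)
    upper = norm.cdf(bounds[y + 1] - mu)
    u = rng.uniform(lower, upper)
    u = np.clip(u, 1e-12, 1 - 1e-12)  # guard against infinite quantiles
    return norm.ppf(u)


# Simulated data (assumption): both outcomes depend on a shared covariate x,
# but are independent of each other once x is accounted for.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y_cont = 1.0 * x + rng.normal(size=n)        # continuous outcome
latent = 1.0 * x + rng.normal(size=n)
cutpoints = np.array([-1.0, 0.0, 1.0])
y_ord = np.searchsorted(cutpoints, latent)   # ordinal outcome, categories 0..3

# Residuals on a common continuous scale (true coefficients used for brevity).
r_cont = y_cont - 1.0 * x
r_ord = surrogate_residual_probit(y_ord, 1.0 * x, cutpoints, rng)

# Marginal association: Kendall's tau on the raw outcomes.
tau_marginal, _ = kendalltau(y_cont, y_ord)
# Partial association: Kendall's tau on the residuals.
tau_partial, _ = kendalltau(r_cont, r_ord)

print("marginal tau:", round(tau_marginal, 3))
print("partial tau:", round(tau_partial, 3))
print("difference (marginal - partial):", round(tau_marginal - tau_partial, 3))
```

In this simulation the raw outcomes are associated only through the shared covariate, so the marginal tau should be clearly positive while the partial tau should be close to zero. Their difference corresponds to what the abstract calls the moderation effect of the covariate. Because the surrogate residual is random, repeated draws give slightly different values of the partial tau.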
