Distribution-as-response regression problems are gaining wider attention, especially in biomedical settings where observation-rich, patient-specific data sets are available, such as feature densities in CT scans (Petersen et al., 2021), actigraphy (Ghosal et al., 2023), and continuous glucose monitoring (Coulter et al., 2024; Matabuena et al., 2021). To accommodate the complex structure of such problems, Petersen and Müller (2019) proposed a regression framework called Fréchet regression, which allows non-Euclidean responses, including distributional responses. Tucker et al. (2023) extended this framework to variable selection, and Coulter et al. (2024) (arXiv:2403.00922 [stat.AP]) developed a fast variable selection algorithm for the specific setting of univariate distributional responses equipped with the 2-Wasserstein metric (2-Wasserstein space). We present "fastfrechet", an R package providing fast implementations of these Fréchet regression and variable selection methods in 2-Wasserstein space, along with resampling tools for automatic variable selection. "fastfrechet" makes distribution-based Fréchet regression with resampling-supplemented variable selection readily available and highly scalable to large data sets, such as the UK Biobank (Doherty et al., 2017).