Bayesian Fractional Polynomial Approach to Quantile Regression and Variable Selection with Application in the Analysis of Blood Pressure among US Adults

22 July 2023

Sanna Soomro, Keming Yu

Hypertension is a highly prevalent chronic medical condition and a strong risk factor for cardiovascular disease (CVD), accounting for more than 45% of CVD. Standard linear models cannot clearly capture the relationship between blood pressure (BP) and its risk factors. Fractional polynomials (FPs) offer a concise and accurate way to describe smooth relationships between a response and its predictors. However, modelling only the conditional mean gives a partial view of the response distribution, and the distributions of many responses, including BP measures, are typically skewed. While 'average' BP may be linked to CVD, extremely high BP could provide deeper and more precise insight into CVD. Existing mean-based FP approaches to modelling the relationship between risk factors and BP therefore cannot answer these key questions. Conditional quantile functions with FPs describe the relationship between the response and its predictors across the whole distribution, including the median and extremely high BP values that are often of interest in practical data analysis. To the best of our knowledge, this approach is new in the literature. In this paper, we therefore combine Bayesian variable selection using a quantile-dependent prior with the FP model, yielding a parametric nonlinear quantile regression model with Bayesian variable selection. The objective is to examine the nonlinear relationships between BP measures and their risk factors at the median and upper quantile levels, using data extracted from the 2007-2008 National Health and Nutrition Examination Survey (NHANES). Variable selection identified the nonlinear terms of the continuous variables (body mass index and age) and the categorical variables (ethnicity, gender and marital status) as important predictors at all quantile levels.
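The abstract combines two standard ingredients: fractional polynomial transforms of a continuous predictor, and quantile regression, whose check loss corresponds to the asymmetric Laplace working likelihood commonly used in Bayesian quantile regression. The minimal sketch below illustrates only these building blocks in plain Python. It is not the authors' implementation. The function names (`fp_term`, `fp2_basis`, `check_loss`, `ald_loglik`, `best_constant_quantile`) and the toy data are hypothetical. The conventional FP power set of Royston and Altman is assumed. The quantile-dependent prior and the variable-selection step are not shown.

```python
import math

# Conventional FP power set (assumed here); power 0 denotes log(x).
FP_POWERS = (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0)


def fp_term(x, p):
    """Single FP transform x^(p), with x^(0) defined as log(x). Requires x > 0."""
    if x <= 0:
        raise ValueError("FP transforms require a positive predictor")
    return math.log(x) if p == 0 else x ** p


def fp2_basis(x, p1, p2):
    """Degree-2 FP basis; a repeated power p1 == p2 yields (x^p, x^p * log x)."""
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return (first, second)


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_loglik(residuals, tau, sigma=1.0):
    """Asymmetric Laplace log-likelihood with density
    tau * (1 - tau) / sigma * exp(-rho_tau(u) / sigma)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    n = len(residuals)
    total = sum(check_loss(u, tau) for u in residuals)
    return n * math.log(tau * (1.0 - tau) / sigma) - total / sigma


def best_constant_quantile(ys, tau):
    """Intercept-only quantile fit: the data point minimising total check loss."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 100]  # hypothetical data with one extreme value
    print(best_constant_quantile(toy, 0.5), best_constant_quantile(toy, 0.75))
    print(fp2_basis(4.0, 0.5, 2.0))
```

On the toy data, the intercept-only fit returns 3 at the median and 4 at the 0.75 quantile, whereas the sample mean is 22. This shows how quantile fits describe different parts of a skewed distribution rather than a single average. In a full FP quantile model, each continuous predictor (for example age or body mass index) would enter through `fp2_basis` columns, and the regression coefficients would be estimated under the check loss or, in a Bayesian treatment, through `ald_loglik` combined with a prior.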
