Parametric ROC Analysis and Optimal Cutoff Selection under Scale Mixtures of Skew-Normal Distributions: A Decision-Theoretic Framework with Asymptotic Inference

Renato de Paula, Helena Mouriño, Tiago Dias Domingues

From arXiv, 42 pages, 3 figures

We study an optimal threshold functional arising in binary classification for continuous biomarkers. While the ROC curve summarizes discriminatory performance across all thresholds, practical threshold selection must also account for disease prevalence and asymmetric misclassification costs. The classical Youden index corresponds to a symmetric special case and may therefore be suboptimal in realistic decision settings. In addition, biomarker distributions in serological and immunological studies often display skewness and heavy tails, making Gaussian ROC models inadequate. We develop a parametric framework for ROC analysis and optimal cutoff selection under the family of scale mixtures of skew-normal (SMSN) distributions, including the skew-normal and skew-t models. The ROC curve and AUC are estimated by plug-in maximum likelihood from separate group fits. The optimal cutoff is defined as the minimiser of a weighted misclassification risk, which yields a likelihood ratio equation extending the Youden criterion. Under a monotone likelihood ratio condition, we establish existence, uniqueness, and global optimality of the cutoff. We further study its local regularity as an implicitly defined functional of the model parameter and derive consistency, asymptotic normality, and a closed-form plug-in variance estimator. A central term in this variance is the local slope of the estimating equation at the optimal threshold, which acts as a local identifiability diagnostic. Monte Carlo experiments across six scenarios show that the asymptotic approximation is accurate and that Wald confidence intervals attain near-nominal coverage. An application to SARS-CoV-2 serological data illustrates that the proposed cutoff can differ substantially from the Youden threshold and may reduce estimated misclassification risk by up to 63% under asymmetric decision settings.
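
To make the cutoff definition concrete, the following is a minimal sketch and not the authors' implementation. It assumes a risk of the form prevalence × cost_FN × FNR(c) + (1 − prevalence) × cost_FP × FPR(c), with subjects classified as diseased when the marker exceeds c; this notation is an assumption, since the abstract does not state the formula. The skew-normal parameters, prevalence, and costs are hypothetical values chosen for illustration, and the helper names (`weighted_risk`, `cutoff_by_minimisation`, `cutoff_by_lr_equation`) are invented here. Under these assumptions, setting the derivative of the risk to zero gives a likelihood ratio equation f1(c)/f0(c) = (1 − prevalence) × cost_FP / (prevalence × cost_FN), and equal costs with prevalence 0.5 recover the Youden threshold.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import skewnorm

# Hypothetical skew-normal group models (illustrative values, not fitted to data).
healthy = skewnorm(a=2.0, loc=0.0, scale=1.0)
diseased = skewnorm(a=2.0, loc=2.0, scale=1.0)


def weighted_risk(c, prevalence, cost_fn, cost_fp):
    """Assumed risk of the rule 'classify as diseased when marker > c'."""
    false_negative_rate = diseased.cdf(c)  # diseased subjects at or below c
    false_positive_rate = healthy.sf(c)    # healthy subjects above c
    return (prevalence * cost_fn * false_negative_rate
            + (1.0 - prevalence) * cost_fp * false_positive_rate)


def cutoff_by_minimisation(prevalence, cost_fn, cost_fp, bounds=(-5.0, 10.0)):
    """Minimise the weighted risk directly over a bounded interval."""
    result = minimize_scalar(weighted_risk, bounds=bounds, method="bounded",
                             args=(prevalence, cost_fn, cost_fp))
    return result.x


def cutoff_by_lr_equation(prevalence, cost_fn, cost_fp, bracket=(-1.0, 6.0)):
    """Solve log f1(c) - log f0(c) = log of the cost-weighted odds."""
    target = np.log((1.0 - prevalence) * cost_fp / (prevalence * cost_fn))

    def equation(c):
        return diseased.logpdf(c) - healthy.logpdf(c) - target

    return brentq(equation, *bracket)


# Symmetric special case: equal costs and prevalence 0.5 (Youden threshold).
youden_cutoff = cutoff_by_minimisation(0.5, 1.0, 1.0)

# Asymmetric case: low prevalence, false negatives five times as costly.
weighted_cutoff = cutoff_by_minimisation(0.1, 5.0, 1.0)
weighted_cutoff_lr = cutoff_by_lr_equation(0.1, 5.0, 1.0)

print(f"Youden cutoff:          {youden_cutoff:.4f}")
print(f"Weighted cutoff (min):  {weighted_cutoff:.4f}")
print(f"Weighted cutoff (root): {weighted_cutoff_lr:.4f}")
```

The two routes to the weighted cutoff should agree up to solver tolerance, because the two location-shifted skew-normal densities used here have a monotone likelihood ratio, so the risk has a single interior minimum. With other SMSN members such as the skew-t, that condition would need to be checked before relying on a single root.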
