How to measure intra-physician variability in clinical decision-making?

Intra-physician prescribing variability, the probability that one physician issues discordant decisions for two patients deemed comparable on observed covariates, has major implications for quality of care, safety, and cost. However, no validated method for measuring it is known. Here, we benchmark eight methods (Euclidean, Mahalanobis, Learned-Weights, Genetic Mahalanobis, Random Forest proximity, Mutual-Information-weighted, Latent Profile Analysis, and a Bayesian binomial generalized linear mixed model, GLMM) against a synthetic ground truth across 94 experimental conditions. Learned-Weights matching achieves the lowest mean absolute error (0.027), followed by Mutual-Information-weighted matching (0.028) and Random Forest proximity (0.034). All eight discordance-analysis methods preserve the physician rank ordering with high fidelity (Spearman correlation > 0.89 against the ground truth in the SCORE2 experiment), provided the physician variability groups are well separated. Under a continuous-heterogeneity physician model, rank preservation degrades substantially for unsupervised methods (Spearman correlation in the range 0.28 to 0.35) but is retained by the supervised feature-weighted methods and the GLMM (Spearman correlation in the range 0.62 to 0.68). This controlled methodological evaluation lays the foundation for validation on observational prescribing data. Once so validated, these open-source estimators could turn prescribing inconsistency into a routinely measurable clinician-level quality metric, complementing the existing literature on between-physician variation.
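To make the quantity being estimated concrete, the sketch below computes a discordance rate for a single physician by pairing each patient with the nearest neighbour on standardized covariates. It is only an illustration of the simplest (Euclidean) matching idea, not the evaluated estimators themselves: the function names `discordance_rate` and `mean_absolute_error`, the optional `caliper` threshold, and the choice to pair every patient with one nearest neighbour are all assumptions made for this example.

```python
import numpy as np


def discordance_rate(X, y, caliper=None):
    """Illustrative discordance estimate for ONE physician (hypothetical helper).

    X       : (n, p) array of observed patient covariates
    y       : (n,) array of binary decisions (e.g. 1 = prescribed)
    caliper : optional maximum standardized distance for a pair to be
              treated as "comparable" (an assumption of this sketch)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Standardize each covariate so no single scale dominates the distance.
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd

    # Pairwise Euclidean distances; a patient cannot be matched to itself.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)

    # Nearest neighbour of each patient and the distance to it.
    nn = D.argmin(axis=1)
    nn_dist = D[np.arange(len(y)), nn]

    # Keep only pairs that fall within the caliper, if one is given.
    keep = np.ones(len(y), dtype=bool) if caliper is None else nn_dist <= caliper
    if not keep.any():
        return float("nan")

    # Share of matched pairs whose two decisions differ.
    return float((y[keep] != y[nn[keep]]).mean())


def mean_absolute_error(estimated, truth):
    """Mean absolute error between per-physician estimates and ground truth."""
    estimated = np.asarray(estimated, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.abs(estimated - truth).mean())


# Toy usage: two tight clusters of patients, one discordant decision.
X_toy = [[0.0], [0.1], [5.0], [5.1]]
y_toy = [1, 0, 0, 0]
rate = discordance_rate(X_toy, y_toy)
```

In this toy case the first two patients are each other's nearest neighbours and receive different decisions, while the last two agree, so half of the matched pairs are discordant. Note that this simple pairing counts a mutual pair twice (once from each side); a one-to-one matching scheme would avoid that at the cost of more code.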
