When developing a clinical prediction model, the sample size of the development dataset is a key consideration. Small sample sizes raise concerns about overfitting, instability, poor performance and lack of fairness. Previous research has outlined minimum sample size calculations to minimise overfitting and precisely estimate the overall risk. However, even when these criteria are met, the uncertainty (instability) in individual-level risk estimates may be considerable. In this article we propose how to examine and calculate the sample size required to develop a model with acceptably precise individual-level risk estimates, to inform decisions and improve fairness. We outline a five-step process to be used before data collection or when an existing dataset is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model, and an assumed 'core model', either specified directly (i.e., a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors. We produce closed-form solutions that decompose the variance of an individual's risk estimate into Fisher's unit information matrix, predictor values and total sample size; this allows researchers to quickly calculate and examine individual-level uncertainty interval widths and classification instability for specified sample sizes. Such information can be presented to key stakeholders (e.g., health professionals, patients, funders) using prediction and classification instability plots to help identify the (target) sample size required to improve trust, reliability and fairness in individual predictions. Our proposal is implemented in the software module pmstabilityss. We provide real examples and emphasise the importance of clinical context, including any risk thresholds for decision making.
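To illustrate the kind of calculation the abstract describes, the sketch below shows the variance decomposition for a logistic regression core model: an individual's linear-predictor variance is approximated by x' I₁⁻¹ x / n, where I₁ is Fisher's unit information matrix (approximated here by Monte Carlo over an assumed predictor distribution) and n is the development sample size. The coefficient values, predictor distribution and individual profile are all hypothetical placeholders, and this is a simplified illustration of the general approach rather than the pmstabilityss implementation itself.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical 'core model': logistic regression with one standardised predictor.
beta = np.array([-2.0, 1.0])  # intercept and predictor effect (assumed values)

# Approximate Fisher's unit information matrix I_1 = E[p(1-p) x x']
# by Monte Carlo over an assumed standard-normal predictor distribution.
X = np.column_stack([np.ones(100_000), rng.standard_normal(100_000)])
p = 1.0 / (1.0 + np.exp(-X @ beta))
I1 = (X * (p * (1 - p))[:, None]).T @ X / len(X)
I1_inv = np.linalg.inv(I1)

def risk_interval(x, n, level=0.95):
    """Uncertainty interval for an individual's estimated risk at sample size n."""
    lp = x @ beta
    var_lp = x @ I1_inv @ x / n  # var(linear predictor) = x' I1^{-1} x / n
    z = norm.ppf(0.5 + level / 2)
    lo, hi = lp - z * np.sqrt(var_lp), lp + z * np.sqrt(var_lp)
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))  # back-transform to risk scale
    return expit(lo), expit(hi)

# Interval width for one individual's profile at candidate sample sizes:
for n in (500, 2000, 10_000):
    lo, hi = risk_interval(np.array([1.0, 1.5]), n)
    print(f"n={n}: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

Repeating this over many individuals drawn from the anticipated predictor distribution gives the interval widths that would feed a prediction instability plot, and comparing each interval against a clinical risk threshold gives a simple measure of classification instability.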