Advances in precision medicine increasingly drive methodological innovation in health research. A key development is the personalized prediction model (PPM), which is fit on a subpopulation of patients similar to a specific index patient; PPMs have been shown to outperform one-size-fits-all models, particularly in discrimination. We propose a generalized loss function for tuning the size of the subpopulation used to fit a PPM. The loss function jointly optimizes discrimination and calibration, and lets the user specify both the performance measures and their relative weights. To reduce the computational burden of this tuning, we conducted extensive simulation studies to identify practical bounds for the grid of candidate subpopulation sizes. Based on these results, we recommend a lower bound of 20\% and an upper bound of 70\% of the full training dataset. Applying the proposed method to both simulated and real-world datasets, we show that previously observed relationships between subpopulation size and model performance are robust. Furthermore, the choice of performance measures in the loss function influences which subpopulation size is selected as optimal. These findings support the flexible and computationally efficient implementation of PPMs in precision health research.
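The tuning idea can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not specify. Similarity is measured as Euclidean distance on standardized predictors. Each PPM is a ridge-penalized logistic regression. Discrimination is measured by the AUC and calibration by absolute calibration-in-the-large. The loss is evaluated on a held-out validation set. All function names and the weights `w_disc` and `w_cal` are hypothetical. The candidate grid spans the recommended 20\% to 70\% of the training data.

```python
import numpy as np


def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, y, ridge=1e-2, n_iter=25):
    """Ridge-penalized logistic regression fit by Newton-Raphson.

    The small ridge term keeps the Hessian invertible when a small
    subpopulation is (nearly) separable. Returns [intercept, coefficients...].
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    penalty = ridge * np.eye(Xd.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xd @ beta)
        grad = Xd.T @ (y - p) - penalty @ beta
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + penalty
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def auc(y, p):
    # Mann-Whitney estimate of the AUC; tied scores count as one half.
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def weighted_loss(y, p, w_disc=0.5, w_cal=0.5):
    # Hypothetical loss: weighted sum of (1 - AUC) and absolute
    # calibration-in-the-large |mean(p) - mean(y)|; lower is better.
    return w_disc * (1.0 - auc(y, p)) + w_cal * abs(float(np.mean(p) - np.mean(y)))


def ppm_predict(X_train, y_train, X_test, frac):
    # Standardize with training statistics so distances weigh predictors equally.
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Z_train, Z_test = (X_train - mu) / sd, (X_test - mu) / sd
    k = max(int(round(frac * len(X_train))), 2)
    preds = np.empty(len(X_test))
    for i, z in enumerate(Z_test):
        # Fit a separate model on the k training patients closest to this index patient.
        idx = np.argsort(np.linalg.norm(Z_train - z, axis=1))[:k]
        beta = fit_logistic(Z_train[idx], y_train[idx])
        preds[i] = sigmoid(beta[0] + z @ beta[1:])
    return preds


def tune_subpopulation_size(X_tr, y_tr, X_val, y_val,
                            grid=np.linspace(0.2, 0.7, 6), w_disc=0.5, w_cal=0.5):
    # Evaluate the loss for each candidate fraction and return the minimizer.
    losses = {}
    for frac in grid:
        p = ppm_predict(X_tr, y_tr, X_val, frac)
        losses[round(float(frac), 2)] = weighted_loss(y_val, p, w_disc, w_cal)
    best = min(losses, key=losses.get)
    return best, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))
    # Simulated outcome with an interaction term that a global
    # main-effects logistic model cannot represent.
    logit = -0.5 + X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]
    y = (rng.random(300) < sigmoid(logit)).astype(float)
    best, losses = tune_subpopulation_size(X[:200], y[:200], X[200:], y[200:])
    print("selected fraction:", best)
    print({f: round(v, 3) for f, v in losses.items()})
```

Changing `w_disc` and `w_cal`, or swapping other measures into `weighted_loss`, can change which fraction minimizes the loss. This is the kind of sensitivity to the choice of performance measures that the abstract describes. A cross-validated loss over the training set would be a natural alternative to the single validation split assumed here.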