Advances in precision medicine increasingly drive methodological innovation in health research. A key development is the use of personalized prediction models (PPMs), which are fit on a subpopulation of patients similar to a specific index patient and have been shown to outperform one-size-fits-all models, particularly in terms of model discrimination. We propose a generalized loss function that enables tuning of the subpopulation size used to fit a PPM. The loss function supports joint optimization of discrimination and calibration, with both the performance measures and their relative weights specified by the user. To reduce the computational burden of this tuning, we conducted extensive simulation studies to identify practical bounds for the grid of subpopulation sizes. Based on these results, we recommend a lower bound of 20\% and an upper bound of 70\% of the entire training dataset. We apply the proposed method to both simulated and real-world datasets and demonstrate that previously observed relationships between subpopulation size and model performance are robust. Furthermore, we show that the choice of performance measures in the loss function influences the optimal subpopulation size selected. These findings support the flexible and computationally efficient implementation of PPMs in precision health research.
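As an illustration of the kind of loss described above, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that discrimination is measured by AUC, calibration by the Brier score, and that the two are combined as a convex combination with a user-chosen weight. The function names (`auc`, `brier`, `combined_loss`, `size_grid`), the default weight, and the number of grid points are all hypothetical; only the 20% and 70% bounds come from the text.

```python
import numpy as np


def auc(y_true, y_prob):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def combined_loss(y_true, y_prob, weight=0.5):
    """Assumed form: weight * (1 - AUC) + (1 - weight) * Brier (lower is better)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * (1.0 - auc(y_true, y_prob)) + (1.0 - weight) * brier(y_true, y_prob)


def size_grid(n_train, lower=0.20, upper=0.70, steps=6):
    """Candidate subpopulation sizes between 20% and 70% of the training set."""
    proportions = np.linspace(lower, upper, steps)
    return [int(round(p * n_train)) for p in proportions]


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(combined_loss(y, p, weight=0.5))
    print(size_grid(1000))
```

In a tuning loop under these assumptions, one would fit a PPM at each size in `size_grid(n_train)`, evaluate `combined_loss` on held-out predictions, and keep the size with the smallest loss. Changing `weight`, or swapping in other discrimination or calibration measures, may shift which size is selected.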