Time-to-event estimation (i.e., survival analysis) is common in health research, most often using methods that assume proportional hazards and no competing risks. Because both assumptions are frequently invalid, estimators more aligned with real-world settings have been proposed. An effect can be estimated as the difference in areas below the cumulative incidence functions of two groups up to a pre-specified time point. This approach, restricted mean time lost (RMTL), can also be used in settings with competing risks. We extend RMTL estimation for use in an understudied health policy application in Medicare. Medicare currently supports healthcare payment for over 69 million beneficiaries, most of whom are enrolled in Medicare Advantage plans and receive insurance from private insurers. These insurers are prospectively paid by the federal government for each of their beneficiaries' anticipated health needs using an ordinary least squares linear regression algorithm. Because all coefficients are positive and the predictor variables are largely insurer-submitted health conditions, insurers are incentivized to upcode, that is, to report more diagnoses than may be accurate. Such gaming is projected to cost the federal government $40 billion in 2025 alone without clear benefit to beneficiaries. We propose several novel estimators of coding intensity and possible upcoding in Medicare Advantage, including estimators that account for unreliable reporting. We demonstrate estimator performance in simulated data leveraging the National Institutes of Health's All of Us study and also develop an open-source R package to simulate realistic labeled upcoding data, which were not previously available.
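To make the RMTL definition concrete, the following is a minimal sketch of the generic quantity described above: the area under a cause-specific cumulative incidence function up to a horizon, and the difference of that area between two groups. It is not the estimators proposed in this work. It assumes a standard Aalen-Johansen estimate of cumulative incidence with right censoring coded as cause 0; the function names `cumulative_incidence` and `rmtl`, the horizon name `tau`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cumulative_incidence(time, cause, k):
    """Aalen-Johansen estimate of the cumulative incidence of cause k.

    Assumption for this sketch: cause == 0 marks a right-censored subject.
    Returns the distinct event times and the estimate just after each one.
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    event_times = np.unique(time[cause != 0])
    surv = 1.0  # probability of no event of any cause before the current time
    cif = 0.0
    values = []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_k = np.sum((time == t) & (cause == k))
        d_all = np.sum((time == t) & (cause != 0))
        cif += surv * d_k / at_risk      # increment for cause k at time t
        surv *= 1.0 - d_all / at_risk    # update all-cause event-free survival
        values.append(cif)
    return event_times, np.array(values)


def rmtl(time, cause, k, tau):
    """Area under the cause-k cumulative incidence step function on [0, tau]."""
    times, cif = cumulative_incidence(time, cause, k)
    keep = times < tau
    # Each step holds its value until the next event time, or until tau.
    grid = np.concatenate([times[keep], [tau]])
    return float(np.sum(cif[keep] * np.diff(grid)))


# Toy, made-up data for two groups (cause 2 is a competing event).
time_a, cause_a = [1, 2, 3, 4], [1, 2, 1, 1]
time_b, cause_b = [2, 3, 4, 5], [1, 2, 1, 1]
tau = 5.0

effect = rmtl(time_a, cause_a, 1, tau) - rmtl(time_b, cause_b, 1, tau)
print("RMTL difference for cause 1 up to tau:", effect)
```

A positive difference here means the first group loses more time, on average, to cause 1 before `tau` than the second group. Integrating the step function exactly, rather than using a generic numerical rule, keeps the area consistent with the piecewise-constant estimate.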