Because developing new drugs for target diseases is time-consuming and expensive, drug repurposing has become a popular topic in the drug development field. As more health claims data become available, many studies have been conducted on these data. However, real-world data are noisy and sparse and contain many confounding factors. In addition, many studies have shown that drug effects are heterogeneous across the population. Many advanced machine learning models for estimating heterogeneous treatment effects (HTE) have emerged in recent years and have been applied in the econometrics and machine learning communities. These studies acknowledge medicine and drug development as the main application area, but there has been limited translational research from the HTE methodology to drug development. We aim to introduce the HTE methodology to the healthcare area and to provide feasibility considerations for translating the methodology, using benchmark experiments on healthcare administrative claims data. We also use the benchmark experiments to show how to interpret and evaluate the model when it is applied to healthcare research. By introducing recent HTE techniques to a broad readership in the biomedical informatics community, we expect to promote the wide adoption of causal inference using machine learning. We also expect to demonstrate the feasibility of HTE for personalized drug effectiveness.