A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design

Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate intracellular protein levels by harnessing the cell's innate degradation pathways to selectively degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs), which leverage the ubiquitin-proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins, are at the heart of TPD strategies. As the field evolves, it is increasingly apparent that traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we explore the impact of ML on de novo PROTAC design, an aspect of molecular design that has not been comprehensively reviewed despite its significance. We first examine the distinct characteristics of PROTAC linker design, underscoring the complexities involved in creating effective bifunctional molecules capable of TPD. We then examine how ML for fragment-based drug design (FBDD), honed in small-molecule drug discovery, is paving the way for PROTAC linker design, and we critically evaluate the limitations inherent in applying this approach to the more complex problem of PROTAC development. Moreover, we review existing ML work applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for researchers pursuing better design strategies for this new modality.
