Recent advances in machine learning (ML) are transforming the field of structural biology. For example, AlphaFold, a groundbreaking neural network for protein structure prediction, has been widely adopted by researchers. The availability of easy-to-use interfaces and of interpretable outputs from the neural network architecture, such as the confidence scores used to color the predicted structures, has made AlphaFold accessible even to non-ML experts. In this paper, we present various methods for representing protein 3D structures, from low to high resolution, and show how interpretable ML methods can support tasks such as predicting protein structure, protein function, and protein-protein interactions. This survey also emphasizes the importance of interpreting and visualizing ML-based inference for structure-based protein representations that enhance interpretability and knowledge discovery. Developing such interpretable approaches promises to further accelerate fields including drug development and protein design.