Generative deep learning techniques have demonstrated an impressive capacity for tackling biomolecular design problems in recent years. Despite their high performance, however, they still lack interpretability and a rigorous quantification of the search spaces they explore, both of which are necessary to unlock their full potential for scientific inquiry beyond efficient design. They are of particular interest for the design of antimicrobial peptides, a promising class of therapeutics for treating bacterial infections. Discovering and designing such peptides is difficult because the number of possible sequences is vast while the amount of experimental data is comparatively small. In this work, we perform a theoretical investigation of latent Bayesian optimization for searching peptide sequence spaces, with a focus on antimicrobial peptides. We investigate (1) whether searching a dimensionally reduced variant of the latent design space may facilitate optimization, (2) how organizing the latent space by differing amounts of more- and less-relevant information may improve the efficiency of arriving at an optimal peptide design, and (3) the interpretability of the resulting spaces. We find that a dimensionally reduced latent space is more interpretable and can be advantageous for optimization. For organizing the latent space, less relevant but more easily computed physicochemical properties are advantageous in certain contexts, whereas more relevant but sparser properties associated with the objective function of the latent Bayesian optimization are advantageous in others. This work lays crucial groundwork for biophysically motivated peptide design procedures, particularly for antimicrobial peptides.
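To make the idea of "latent Bayesian optimization in a dimensionally reduced latent space" concrete, the sketch below shows one generic iteration under stated assumptions; it is not the method used in this work. It assumes that peptides have already been encoded into continuous latent vectors by some generative model (random stand-in vectors are used here), that PCA is the dimensionality reduction, that the surrogate is a zero-mean Gaussian process with a fixed-length-scale RBF kernel, and that expected improvement, maximized over random candidates, is the acquisition function. All function names (`pca_fit`, `gp_posterior`, `propose_next`, etc.) are hypothetical, and decoding the proposed latent vector back into a sequence is not shown.

```python
import math

import numpy as np


def pca_fit(Z, k):
    """Return the mean and the top-k principal axes (as rows) of latent matrix Z."""
    mu = Z.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(Z - mu, full_matrices=False)
    return mu, vt[:k]


def rbf(A, B, length=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_posterior(X, y, Xs, noise=1e-4, length=1.0):
    """Zero-mean GP posterior mean and standard deviation at test points Xs."""
    K = rbf(X, X, length) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs, length)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    # Prior variance of the RBF kernel is 1; clip to avoid negative round-off.
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over the incumbent `best` (maximization)."""
    z = (mean - best) / std
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def propose_next(Z_obs, y_obs, k=2, n_candidates=500, rng=None):
    """One BO step: reduce latents with PCA, fit a GP, maximize EI, lift back."""
    rng = np.random.default_rng(0) if rng is None else rng
    mu, axes = pca_fit(Z_obs, k)
    X = (Z_obs - mu) @ axes.T  # reduced coordinates of evaluated designs
    y = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)  # standardize objective
    # Random candidates inside the bounding box of the observed reduced points.
    cand = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_candidates, k))
    mean, std = gp_posterior(X, y, cand)
    x_next = cand[np.argmax(expected_improvement(mean, std, y.max()))]
    return mu + x_next @ axes  # map back to the full latent space


# Usage with stand-in data: 20 "evaluated peptides" with 16-dim latent codes
# and a toy objective (both hypothetical).
rng = np.random.default_rng(1)
Z = rng.normal(size=(20, 16))
y = -np.linalg.norm(Z[:, :2], axis=1)
z_next = propose_next(Z, y)
print(z_next.shape)  # (16,)
```

Searching the k-dimensional PCA coordinates instead of the full latent space keeps the surrogate model and the acquisition search low-dimensional, and the principal axes give a direct, inspectable map between the search coordinates and the original latent space, which is one plausible reading of why a reduced space can be both more interpretable and easier to optimize in.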