Predicting protein secondary structure is essential for understanding protein function and advancing drug discovery. However, the intricate sequence-structure relationship makes accurate modeling difficult. To address this challenge, we propose MOGP-MMF, a multi-objective genetic programming (GP) framework that reformulates protein secondary structure prediction (PSSP) as an automated optimization task focused on feature selection and fusion. Specifically, MOGP-MMF introduces a multi-view, multi-level representation strategy that integrates evolutionary, semantic, and newly introduced structural views to capture protein folding logic comprehensively. Leveraging an enriched operator set, the framework evolves both linear and nonlinear fusion functions, capturing high-order feature interactions while reducing fusion complexity. To resolve the accuracy-complexity trade-off, we develop an improved multi-objective GP algorithm that incorporates a knowledge transfer mechanism, which uses prior evolutionary experience to guide the population toward global optima. Extensive experiments on seven benchmark datasets demonstrate that MOGP-MMF surpasses state-of-the-art methods, particularly in eight-state (Q8) accuracy and structural integrity. Furthermore, MOGP-MMF generates a diverse set of non-dominated solutions, offering flexible model selection for a range of practical application scenarios. The source code is available on GitHub: https://github.com/qian-ann/MOGP-MMF/tree/main.
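To make the notion of a non-dominated solution set concrete, the following is a minimal sketch of Pareto filtering over two objectives, prediction error and fusion complexity, both minimized. It is not the MOGP-MMF algorithm itself. The candidate labels, error values, and complexity counts are hypothetical and chosen only for illustration, and the helper names `dominates` and `non_dominated` are assumptions rather than identifiers from the released code.

```python
from typing import List, Tuple

# Hypothetical candidate: (label, error rate, complexity such as tree node count).
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one (both are minimized)."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative values only; these are not results from the paper.
candidates: List[Candidate] = [
    ("A", 0.30, 5),
    ("B", 0.25, 12),
    ("C", 0.27, 20),  # dominated by B (higher error and higher complexity)
    ("D", 0.22, 30),
    ("E", 0.30, 9),   # dominated by A (same error, higher complexity)
]

front = non_dominated(candidates)
for label, error, complexity in front:
    print(f"{label}: error={error:.2f}, complexity={complexity}")
```

Under these assumed values, the front keeps A, B, and D. A practitioner who needs a compact fusion function might pick A, while one who prioritizes accuracy might pick D, which is the kind of flexible model selection the abstract describes.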