Motivation: Developing high-performing bioinformatics models typically requires repeated cycles of hypothesis formulation, architectural redesign, and empirical validation, which makes progress slow, labor-intensive, and difficult to reproduce. Although recent LLM-based assistants can automate isolated steps, they lack the performance-grounded reasoning and stability-aware mechanisms required for reliable, iterative model improvement in bioinformatics workflows.

Results: We introduce MARBLE, an execution-stable framework for the autonomous refinement of bioinformatics models. MARBLE couples literature-aware reference selection with structured, debate-driven architectural reasoning among role-specialized agents. This is followed by autonomous execution, evaluation, and memory updates that are explicitly grounded in empirical performance. We evaluate MARBLE on spatial transcriptomics domain segmentation, drug-target interaction prediction, and drug response prediction. Across these tasks, it achieves consistent, sustained performance improvements over strong baselines across multiple refinement cycles while maintaining high execution robustness and low regression rates. Framework-level analyses show that structured debate, balanced evidence selection, and performance-grounded memory are critical for stable, repeatable model evolution, as opposed to single-run or brittle gains.

Availability: Source code, data and Supplementary Information are available at https://github.com/PRISM-DGU/MARBLE.
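The Results describe a refinement loop in which each proposed change is executed and evaluated. Its outcome is recorded in a performance-grounded memory. The minimal sketch below illustrates one way such a loop could track execution robustness and regressions. It is not MARBLE's implementation. The names `refine`, `Memory`, `Attempt`, and the `propose`/`evaluate` callables are hypothetical. The literature-aware, debate-driven proposal step is reduced to an opaque `propose` callback. The sketch rests on several assumptions:

* Higher scores are better.
* Any exception raised during evaluation counts as an execution failure.
* A candidate is accepted only if it strictly beats the current best score.
* The regression rate is the fraction of successfully executed candidates that scored below the best score at that point.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One refinement cycle as recorded in memory (hypothetical schema)."""
    cycle: int
    candidate: str
    score: Optional[float]  # None when the candidate failed to execute
    accepted: bool          # True if it became the new best model
    regressed: bool         # True if it executed but scored below the best so far


@dataclass
class Memory:
    """Performance-grounded memory: every attempt is stored with its measured outcome."""
    attempts: List[Attempt] = field(default_factory=list)

    def record(self, attempt: Attempt) -> None:
        self.attempts.append(attempt)

    def execution_success_rate(self) -> float:
        # Fraction of attempts that ran to completion and produced a score.
        if not self.attempts:
            return 0.0
        return sum(a.score is not None for a in self.attempts) / len(self.attempts)

    def regression_rate(self) -> float:
        # Fraction of executed attempts whose score fell below the best at that time.
        executed = [a for a in self.attempts if a.score is not None]
        if not executed:
            return 0.0
        return sum(a.regressed for a in executed) / len(executed)


def refine(
    baseline_score: float,
    propose: Callable[[Memory], str],
    evaluate: Callable[[str], float],
    cycles: int,
) -> Tuple[float, Memory]:
    """Run `cycles` propose -> execute/evaluate -> memory-update iterations."""
    memory = Memory()
    best = baseline_score
    for cycle in range(1, cycles + 1):
        candidate = propose(memory)  # proposal may condition on past outcomes
        try:
            score = evaluate(candidate)
        except Exception:
            # Execution failure: record it, keep the current best model unchanged.
            memory.record(Attempt(cycle, candidate, None, False, False))
            continue
        regressed = score < best
        accepted = score > best
        if accepted:
            best = score
        memory.record(Attempt(cycle, candidate, score, accepted, regressed))
    return best, memory


if __name__ == "__main__":
    # Toy run: scripted scores stand in for real training/evaluation.
    scripted = iter([0.71, None, 0.69, 0.74])  # None simulates a crashed run

    def evaluate(candidate: str) -> float:
        s = next(scripted)
        if s is None:
            raise RuntimeError(f"{candidate} failed to execute")
        return s

    best, memory = refine(0.70, lambda m: f"variant-{len(m.attempts) + 1}", evaluate, cycles=4)
    print(best, memory.execution_success_rate(), memory.regression_rate())
```

Keeping a failed or regressing candidate in memory, rather than discarding it, lets a later proposal step condition on what did not work. Accepting only strict improvements keeps the current best model from drifting downward across cycles.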