Intrinsically disordered regions (IDRs) play central roles in cellular function, yet remain poorly evaluated by existing protein structure prediction benchmarks. Current evaluations largely focus on well-folded domains, overlooking three fundamental challenges in realistic biological settings: the structural complexity of proteins, the resulting scarcity of reliable ground truth, and prediction uncertainty that can propagate into high-risk downstream failures in applications such as drug discovery, protein-protein interaction modeling, and functional annotation. We present DisProtBench, an IDR-centric benchmark that explicitly incorporates prediction uncertainty into the evaluation of protein structure prediction models (PSPMs). To address structural complexity and ground-truth scarcity, we curate and unify a large-scale, multi-modal dataset spanning disease-relevant IDRs, GPCR-ligand interactions, and multimeric protein complexes. To assess predictive uncertainty, we introduce Functional Uncertainty Sensitivity (FUS), a novel metric, stratified by prediction uncertainty, that quantifies downstream task performance. Using this benchmark, we conduct a systematic evaluation of state-of-the-art PSPMs and reveal clear, task-dependent failure modes: protein-protein interaction prediction degrades sharply in IDRs, while structure-based drug discovery remains comparatively robust. These effects are largely invisible to standard global accuracy metrics, which overestimate functional reliability under prediction uncertainty. The benchmark and codebase are open-sourced at https://github.com/Susan571/DisProtBench.