Developing foundation models in medical imaging requires continuous monitoring of downstream performance. Researchers are burdened with tracking numerous experiments, design choices, and their effects on performance, often relying on ad-hoc manual workflows that are inherently slow and error-prone. We introduce EvalBlocks, a modular, plug-and-play framework for efficient evaluation of foundation models during development. Built on Snakemake, EvalBlocks supports seamless integration of new datasets, foundation models, aggregation methods, and evaluation strategies. All experiments and results are tracked centrally and are reproducible with a single command, while efficient caching and parallel execution enable scalable use on shared compute infrastructure. Demonstrated on five state-of-the-art foundation models and three medical imaging classification tasks, EvalBlocks streamlines model evaluation, enabling researchers to iterate faster and focus on model innovation rather than evaluation logistics. The framework is released as open-source software at https://github.com/DIAGNijmegen/eval-blocks.