StereoFactory: A Unified Merging Framework for Robust Stereo Matching

Stereo matching has advanced through foundation models trained on large-scale datasets, yet this paradigm suffers from a scalability bottleneck: incorporating new data requires costly joint retraining. Model merging offers a scalable post-hoc alternative by integrating knowledge from specialized models after source checkpoints are available. However, existing merging methods typically retain all available models or rely on greedy inclusion, which can preserve harmful task-vector interference. We propose StereoFactory, a coarse-to-fine evolutionary framework for adaptive model merging. Stage 1 employs a genetic algorithm to search the combinatorial space of model subsets, determining which models should participate. Stage 2 addresses module-level knowledge specialization (different functional modules exhibit distinct preferences for knowledge sources) through CMA-ES optimization of architecture-adaptive routing over the selected task vectors, with optional module-level scaling. Experiments across two architectures and four benchmarks demonstrate that StereoFactory consistently achieves the best four-benchmark average under the same checkpoint pool, reducing the average error from 3.80 to 3.30 on NMRF and from 2.88 to 2.19 on FoundationStereo relative to the strongest controlled baseline. The post-hoc search requires only 2.7-3.7% of the corresponding joint-retraining wall-clock time. Analysis reveals that knowledge contributions are inherently module-specific, and selected subsets can transfer across architectures with minimal degradation. Code will be publicly released upon acceptance at: https://github.com/XiandaGuo/StereoFactory.
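
To make the two decisions concrete, the following is a minimal sketch of how a subset mask (the Stage 1 output) and per-module routing coefficients (the Stage 2 output) could jointly parameterize a task-vector merge. It is an illustration under stated assumptions, not the released implementation: the function names `task_vectors` and `merge`, the module names `encoder` and `decoder`, and the toy weights are all hypothetical, and the genetic algorithm and CMA-ES searches that would choose `mask` and `routing` against a validation error are not shown.

```python
import numpy as np

def task_vectors(base, checkpoints):
    """Per-module task vector of each checkpoint: fine-tuned weights minus base weights."""
    return [{m: ckpt[m] - base[m] for m in base} for ckpt in checkpoints]

def merge(base, vectors, mask, routing):
    """Add the selected, module-weighted task vectors to the base weights.

    mask    : 0/1 per model; which models participate (assumed Stage 1 output).
    routing : routing[module][k] is the coefficient of model k for that module
              (assumed Stage 2 output; covers both routing and scaling).
    """
    merged = {}
    for m, w in base.items():
        delta = sum(mask[k] * routing[m][k] * vectors[k][m]
                    for k in range(len(vectors)))
        merged[m] = w + delta
    return merged

# Toy pool: two module groups and three specialized checkpoints (all hypothetical).
base = {"encoder": np.zeros(2), "decoder": np.zeros(2)}
ckpts = [
    {"encoder": np.array([1.0, 0.0]), "decoder": np.array([0.0, 0.0])},
    {"encoder": np.array([0.0, 0.0]), "decoder": np.array([0.0, 2.0])},
    {"encoder": np.array([5.0, 5.0]), "decoder": np.array([5.0, 5.0])},  # interfering model
]
vecs = task_vectors(base, ckpts)

mask = [1, 1, 0]  # the third model is excluded from the merge
routing = {
    "encoder": [1.0, 0.0, 0.0],  # encoder draws on model 0 only
    "decoder": [0.0, 0.5, 0.0],  # decoder draws on model 1, scaled by 0.5
}
merged = merge(base, vecs, mask, routing)
```

In this parameterization the mask removes a model from every module at once, while the routing coefficients let each module weight the surviving models differently, which is one way to read the coarse-to-fine split described above.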
