Deploying Neural Architecture Search (NAS) in industrial production systems faces a fundamental validation bottleneck: verifying a single candidate architecture π requires evaluating the deployed ensemble of M models, which incurs a prohibitive O(M) computational cost per candidate. This cost severely limits how often architectures can be iterated in real-world applications, where ensembles of M = 50 to 200 models are standard for robustness. This work introduces Ensemble-Decoupled Architecture Search, a framework that leverages ensemble theory to predict system-level performance from single-learner evaluation. We establish the Ensemble-Decoupled Theory, which gives a sufficient condition for monotonic ensemble improvement under homogeneity assumptions: a candidate architecture π yields lower ensemble error than the current baseline π_old if ρ(π) < ρ(π_old) - (M / (M - 1)) * (ΔE(π) / σ²(π)), where ΔE, ρ, and σ² can be estimated from lightweight dual-learner training. This decouples architecture search from full ensemble training, reducing the per-candidate search cost from O(M) to O(1) while paying the O(M) deployment cost only for validated winners. We unify solution strategies according to the continuity and tractability of π: (1) closed-form optimization for tractable continuous π (exemplified by feature bagging in CTR prediction), (2) constrained differentiable optimization for intractable continuous π, and (3) LLM-driven search with iterative monotonic acceptance for discrete π. The framework reveals two orthogonal improvement mechanisms, base diversity gain and accuracy gain, which provide actionable design principles for industrial-scale NAS. Detailed proofs of all theoretical results are deferred to the appendix. Comprehensive empirical validation will be included in the journal extension of this work.
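The acceptance condition stated in the abstract can be checked with a few lines of arithmetic once the three quantities have been estimated. The following is a minimal sketch, not the authors' implementation: the function names `acceptance_threshold` and `accepts_candidate` and their parameters are hypothetical, ΔE is assumed to be a scalar error change for the candidate relative to the baseline, and the numbers in the usage lines are purely illustrative.

```python
def acceptance_threshold(rho_old: float, delta_e: float, sigma2: float, m: int) -> float:
    """Right-hand side of the sufficient condition:
    rho(pi_old) - (M / (M - 1)) * (Delta E(pi) / sigma^2(pi)).

    All arguments are assumed to be estimates supplied by the caller;
    how they are estimated is outside the scope of this sketch.
    """
    if m < 2:
        raise ValueError("ensemble size M must be at least 2")
    if sigma2 <= 0:
        raise ValueError("sigma^2 must be positive")
    return rho_old - (m / (m - 1)) * (delta_e / sigma2)


def accepts_candidate(rho_new: float, rho_old: float, delta_e: float,
                      sigma2: float, m: int) -> bool:
    """Return True when rho(pi) is strictly below the threshold,
    i.e. when the sufficient condition from the abstract holds."""
    return rho_new < acceptance_threshold(rho_old, delta_e, sigma2, m)


# Illustrative values only (not taken from the paper).
# Candidate is slightly less accurate (delta_e > 0) but much less correlated.
print(accepts_candidate(rho_new=0.30, rho_old=0.50, delta_e=0.01, sigma2=0.1, m=100))
# Candidate is slightly less accurate and only marginally less correlated.
print(accepts_candidate(rho_new=0.45, rho_old=0.50, delta_e=0.01, sigma2=0.1, m=100))
```

Because the condition is only sufficient, a candidate that fails this check is not necessarily worse; under this sketch it would simply not be promoted to full ensemble training. The check also shows the two mechanisms named in the abstract: a negative `delta_e` raises the threshold (accuracy gain), while a lower `rho_new` clears it more easily (diversity gain).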