Inference in large-scale AI models is typically performed on dense parameter matrices, leading to inference cost and system complexity that scale unsustainably with model size. This limitation does not arise from insufficient model capacity, but from treating post-training inference systems as monolithic operators while ignoring internal structures formed during learning. We show that gradient update events in large models are highly localized and selective, leaving many parameter dependencies statistically indistinguishable from their initialization distribution after training. As a result, post-training inference systems are structurally non-uniform and inherently decomposable. Based on this observation, we introduce a post-training statistical criterion and a structural annealing procedure that removes unsupported dependencies and reveals stable, independent substructures. This work establishes a post-training, model-agnostic structural view of inference systems and enables structured, parallel inference without modifying model functionality or interfaces.
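The abstract does not spell out the statistical criterion; as a rough illustration of the idea, the sketch below applies a per-parameter two-sided z-test against a known Gaussian initialization distribution and zeroes out entries that cannot be distinguished from it. The function name, initialization scale, and significance threshold are illustrative assumptions, not the paper's actual procedure.

```python
# Minimal sketch (assumed, not the paper's method): treat each trained weight as
# a sample and test it against the known init distribution N(0, sigma_init^2).
# Entries statistically indistinguishable from init are treated as unsupported
# dependencies and removed (zeroed out).
import numpy as np
from scipy.stats import norm

def prune_unsupported(W_trained: np.ndarray, sigma_init: float, alpha: float = 0.01) -> np.ndarray:
    """Zero out entries whose magnitude is plausible under the init distribution."""
    # Two-sided p-value of each trained weight under N(0, sigma_init^2).
    z = np.abs(W_trained) / sigma_init
    p = 2.0 * norm.sf(z)
    # Keep only entries that reject the "never meaningfully updated" null.
    mask = p < alpha
    return W_trained * mask

# Toy usage: a layer initialized with std 0.02 (a common transformer init scale),
# with training updates concentrated in one block, mimicking localized updates.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(512, 512))
W[:64, :64] += 0.2
W_pruned = prune_unsupported(W, sigma_init=0.02)
print(f"kept {np.count_nonzero(W_pruned) / W.size:.1%} of entries")
```

Under these assumptions, the surviving mask exposes the localized block of updated parameters, which is the kind of non-uniform structure the abstract argues can be exploited for decomposed, parallel inference.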