We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos. Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked, ReLU-like activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields saturated dropout profiles under a fixed budget; a rank-flow tie-breaker then selects front-loaded schedules, substantially reducing held-out test loss at no extra computational cost, with accuracy gains as a consistent secondary effect. We test the predictions in MLPs and Vision Transformers and discuss CNN/ResNet extensions.
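The shift of the perfect-alignment fixed point can be illustrated with a minimal numerical sketch. It is not the paper's derivation. It assumes the standard mean-field setting for a wide ReLU MLP with zero bias variance and inverted dropout with keep probability `keep`, sampled independently for the two inputs being compared. Under these assumptions the one-layer correlation map is `c -> keep * f(c)`, where `f` is the ReLU arc-cosine kernel. The names `relu_corr_map`, `fixed_point`, and `depth_scale` are hypothetical helpers introduced here for illustration.

```python
import math

def relu_kernel(c):
    """Normalized ReLU arc-cosine kernel f(c), with f(1) = 1."""
    c = max(-1.0, min(1.0, c))          # guard against round-off outside [-1, 1]
    theta = math.acos(c)
    return (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi

def relu_corr_map(c, keep=1.0):
    """Assumed one-layer correlation map with dropout: c -> keep * f(c)."""
    return keep * relu_kernel(c)

def fixed_point(keep, c0=0.5, iters=20000):
    """Iterate the map from c0; returns the (approximate) fixed point c*."""
    c = c0
    for _ in range(iters):
        c = relu_corr_map(c, keep)
    return c

def depth_scale(keep):
    """Depth scale xi = -1 / ln(chi), with chi the map's slope at c*.

    For this kernel f'(c) = 1 - arccos(c) / pi, so chi = keep * f'(c*).
    Intended for keep < 1; at keep = 1 the slope at c* = 1 is 1 and xi diverges.
    """
    c_star = fixed_point(keep)
    chi = keep * (1.0 - math.acos(min(1.0, c_star)) / math.pi)
    return -1.0 / math.log(chi)

if __name__ == "__main__":
    for keep in (1.0, 0.95, 0.9, 0.8):
        c_star = fixed_point(keep)
        xi = float("inf") if keep == 1.0 else depth_scale(keep)
        print(f"keep={keep:.2f}  c*={c_star:.6f}  xi={xi:.3f}")
```

In this sketch, `keep = 1.0` approaches `c* = 1` only algebraically in depth, while any `keep < 1` gives `c* < 1` and a finite `depth_scale`, which is the qualitative effect described above. The non-analytic `(1 - c)^(3/2)` term of `f` near `c = 1` is the kind of branch-point behavior that distinguishes kinked from smooth activations.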