This work proposes neural training as a \emph{process tensor}: a multi-time map that takes a sequence of controllable instruments (batch choices, augmentations, optimizer micro-steps) and returns an observable of the trained model. Building on this operational lens, we introduce a simple, model-agnostic witness of training memory based on \emph{back-flow of distinguishability}. In a controlled two-step protocol, we compare outcome distributions after one intervention versus two; the increase $\Delta_{\mathrm{BF}} = D_2 - D_1 > 0$ (with $D\in\{\mathrm{TV}, \mathrm{JS}, \mathrm{H}\}$ measured on softmax predictions over a fixed probe set) certifies non-Markovianity. We observe consistent positive back-flow with tight bootstrap confidence intervals, amplification under higher momentum, larger batch overlap, and more micro-steps, and collapse under a \emph{causal break} (resetting optimizer state), directly attributing the effect to optimizer/data-state memory. The witness is robust across TV/JS/Hellinger, inexpensive to compute, and requires no architectural changes. We position this as a \emph{measurement} contribution: a principled diagnostic and empirical evidence that practical SGD deviates from the Markov idealization. An exploratory case study illustrates how the micro-level signal can inform curriculum orderings. ``Data order matters'' thus becomes a testable operator with confidence bounds, and our framework offers a common stage to compare optimizers, curricula, and schedules through their induced training memory.
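As a minimal sketch of how such a witness could be computed, the snippet below evaluates the three distances (TV, JS, Hellinger) on softmax prediction vectors over a probe set and forms the back-flow difference $\Delta_{\mathrm{BF}} = D_2 - D_1$. The function names and the shape of the API are illustrative assumptions, not the paper's actual code; `p_ref`, `p_one`, and `p_two` stand for averaged probe-set softmax outputs after zero, one, and two interventions, respectively.

```python
import numpy as np

def tv(p, q):
    # Total variation distance between two discrete distributions.
    return 0.5 * np.abs(p - q).sum()

def js(p, q):
    # Jensen-Shannon divergence (base 2), bounded in [0, 1].
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    # Hellinger distance between two discrete distributions.
    return np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q))

def backflow_witness(p_ref, p_one, p_two, dist=tv):
    # D1: distinguishability from the reference after one intervention;
    # D2: after two. A positive difference witnesses non-Markovianity.
    d1 = dist(p_ref, p_one)
    d2 = dist(p_ref, p_two)
    return d2 - d1
```

In practice one would bootstrap over probe examples to attach confidence intervals to the returned difference, as the abstract describes.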