Adaptive Patching Is Harder Than It Looks for Time-Series Forecasting

Adaptive patching is a recent and compelling proposal for time-series Transformers: allocate finer patches where the sequence looks locally informative. This paper asks under what conditions a content-adaptive patching operator should outperform a tuned uniform one. Local heterogeneity alone is not enough: under pointwise forecasting losses, a region that looks complex is not automatically one where finer patching reduces the loss. We model patching as budgeted bitrate allocation and derive an explicit threshold that a dynamic patching rule must satisfy to beat a well-tuned uniform baseline. We then bound the achievable improvement both locally, via a quadratic surrogate, and globally, via strong convexity under the model's assumptions. Two structural results follow. First, without a coupling constraint, scalar local complexity cannot produce a non-uniform optimum under a common loss landscape. Second, once the backbone is trained to its representation-aware optimum, the alignment gain collapses around a well-tuned uniform patch size. To test these predictions, we run a controlled isolation study on three representative architectures, replacing each adaptive mechanism with a uniform patch-size sweep while holding the backbone, data, and training protocol fixed. On standard long-horizon forecasting benchmarks, the validation-selected uniform baseline is competitive with its dynamic counterpart: per-setting effects concentrate near zero, and no consistent directional advantage remains once results are aggregated by dataset. The larger gains we do observe are method- and dataset-specific. Adaptive patching should therefore be evaluated against a tuned uniform baseline; its value depends on whether a cheap and reliable routing signal can identify where finer patches actually reduce forecasting loss.
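
The structural claim about scalar complexity can be illustrated with a toy version of budgeted allocation. This is a minimal sketch, not the paper's model: it assumes a sequence split into segments, with segment i receiving n_i patches out of a total budget B and each segment's loss a convex, decreasing function of n_i. It reads "common loss landscape" as every segment sharing the same loss curve. The helper `allocate_patches`, the curve `1/n`, and the complexity scores are all hypothetical. The solver is greedy marginal allocation, a standard exact method for separable budgeted problems with non-increasing marginal gains, swapped in here purely for illustration.

```python
import heapq
from typing import Callable, List, Sequence

# Hypothetical per-segment loss: maps a patch count n >= 1 to a loss value.
LossCurve = Callable[[int], float]


def allocate_patches(curves: Sequence[LossCurve], budget: int) -> List[int]:
    """Split `budget` patches across segments to minimize the summed loss.

    Greedy marginal allocation: every segment starts with one patch, then each
    remaining patch goes to the segment whose loss would drop the most. This is
    optimal when every curve has non-increasing marginal gains (discrete convexity).
    """
    k = len(curves)
    if budget < k:
        raise ValueError("budget must give every segment at least one patch")
    alloc = [1] * k
    # Max-heap of marginal gains, stored negated; ties go to the lower index.
    heap = [(-(f(1) - f(2)), i) for i, f in enumerate(curves)]
    heapq.heapify(heap)
    for _ in range(budget - k):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        f, n = curves[i], alloc[i]
        heapq.heappush(heap, (-(f(n) - f(n + 1)), i))
    return alloc


def common_curve(n: int) -> float:
    # Assumed shared loss landscape: the same curve for every segment.
    return 1.0 / n


if __name__ == "__main__":
    complexity = [1.0, 4.0, 9.0]  # hypothetical scalar complexity scores
    budget = 12

    # Uncoupled: complexity never enters the loss, so it cannot shift the optimum.
    shared = allocate_patches([common_curve] * len(complexity), budget)
    print("shared landscape:", shared)  # [4, 4, 4]

    # Coupled: complexity scales each segment's loss curve as c_i / n.
    coupled_curves = [lambda n, c=c: c / n for c in complexity]
    coupled = allocate_patches(coupled_curves, budget)
    print("coupled landscape:", coupled)  # [2, 4, 6]
```

With a shared curve the allocation stays uniform whatever the complexity scores say; only once complexity is coupled to the loss does the optimum become non-uniform, here tracking sqrt(c_i), which matches the continuous Lagrangian solution n_i ∝ sqrt(c_i) for minimizing the sum of c_i / n_i under a fixed budget.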
