Efficiently aggregating spatial or temporal horizons into compact representations has become a unifying principle in modern deep learning models, yet learning data-adaptive representations for long-horizon sequence data, especially continuous sequences such as time series, remains an open challenge. While fixed-size patching has improved scalability and performance, discovering variable-sized, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbones, or heuristic rules. In this work, we propose Reinforcement Patching (ReinPatch), the first framework to jointly optimize a sequence patching policy and its downstream sequence backbone model using reinforcement learning. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch avoids continuous relaxations and optimizes the dynamic patching policy directly over discrete boundary choices. Moreover, our method allows strict enforcement of a desired compression rate, freeing the downstream backbone to scale efficiently, and naturally supports multi-level hierarchical modeling. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies. Furthermore, our detached design allows the patching module to be extracted as a standalone foundation patcher, providing the community with visual and empirical insights into the segmentation behaviors preferred by a purely performance-driven neural patching strategy.
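As a rough illustration of the discrete formulation described in the abstract, the sketch below samples a fixed number of patch boundaries (so the compression rate holds exactly) and scores a group of sampled segmentations with group-normalized advantages. It is not the ReinPatch implementation. The Gumbel top-k sampler, the piecewise-constant reconstruction reward (a stand-in for a downstream forecasting loss), and every function name here are assumptions introduced only for illustration.

```python
import math
import random
import statistics


def sample_boundaries(logits, num_patches, rng):
    """Sample exactly num_patches - 1 distinct cut positions.

    Assumption: Gumbel top-k sampling without replacement. A cut at
    index i means the series is split between x[i] and x[i + 1].
    """
    k = num_patches - 1
    perturbed = []
    for i, logit in enumerate(logits):
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, i))
    top = sorted(perturbed, reverse=True)[:k]
    return sorted(i for _, i in top)


def split_patches(x, boundaries):
    """Split x into contiguous, non-empty, variable-sized patches."""
    edges = [0] + [b + 1 for b in boundaries] + [len(x)]
    return [x[s:e] for s, e in zip(edges[:-1], edges[1:])]


def toy_reward(x, boundaries):
    """Hypothetical reward: negative MSE of a piecewise-constant fit."""
    err = 0.0
    for patch in split_patches(x, boundaries):
        mean = sum(patch) / len(patch)
        err += sum((v - mean) ** 2 for v in patch)
    return -err / len(x)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same input."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    series = [math.sin(0.3 * t) for t in range(32)]
    logits = [0.0] * (len(series) - 1)  # uniform policy over cut positions
    group = [sample_boundaries(logits, 8, rng) for _ in range(4)]
    rewards = [toy_reward(series, b) for b in group]
    for b, a in zip(group, group_relative_advantages(rewards)):
        print(b, round(a, 3))
```

In a policy-gradient setup of this kind, the advantages would weight the log-probabilities of the sampled boundary sets when updating the policy logits; that update step is omitted here. Because every sample contains the same number of cuts, the number of patches handed to the backbone is fixed regardless of where the policy places them.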