Catastrophic forgetting, in which a neural network loses previously acquired knowledge while learning new tasks, poses a significant challenge in continual learning. The Hard-Attention-to-the-Task (HAT) mechanism has shown potential in mitigating this problem, but its practical implementation has been hindered by usability and compatibility issues and by a lack of support for reusing existing networks. In this paper, we introduce HAT-CL, a user-friendly, PyTorch-compatible redesign of the HAT mechanism. HAT-CL automates gradient manipulation and streamlines the transformation of PyTorch modules into HAT modules by providing a comprehensive suite of modules that can be integrated into existing architectures. Additionally, HAT-CL offers ready-to-use HAT networks that integrate with the TIMM library. Beyond the redesign and reimplementation of HAT, we also introduce novel mask manipulation techniques for HAT, which consistently improved results across our experiments. Our work paves the way for broader application of the HAT mechanism, opening up new possibilities in continual learning across diverse models and applications.