While it is established that neural networks suffer from catastrophic forgetting ``at the output level'', it is debated whether this is also the case at the level of representations. Some studies ascribe a degree of innate robustness to representations, suggesting that they forget only minimally and lose no critical information, while others argue that representations are also severely affected by forgetting. To settle this debate, we first discuss how this apparent disagreement might stem from the coexistence of two phenomena that affect the quality of continually learned representations: knowledge accumulation and feature forgetting. We then show that, although feature forgetting can be small in absolute terms, newly learned information is forgotten just as catastrophically at the level of representations as at the output level. Next, we show that this feature forgetting is problematic, as it substantially slows down knowledge accumulation. We further show that representations continually learned through both supervised and self-supervised learning suffer from feature forgetting. Finally, we study how different types of continual learning methods affect feature forgetting and knowledge accumulation.