A Study of Forward-Forward Algorithm for Self-Supervised Learning

Self-supervised representation learning has seen remarkable progress in the last few years, and some recent methods learn useful image representations without labels. These methods are trained using backpropagation, the de facto standard. Recently, Geoffrey Hinton proposed the forward-forward algorithm as an alternative training method. It uses two forward passes and a separate loss function for each layer to train the network without backpropagation. In this work, we compare, for the first time, the performance of forward-forward and backpropagation for self-supervised representation learning and provide insights into the learned representation spaces. Our benchmark employs four standard datasets, namely MNIST, F-MNIST, SVHN and CIFAR-10, and three commonly used self-supervised representation learning techniques, namely rotation, flip and jigsaw. Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-)supervised training, its transfer performance lags significantly behind in all the studied settings. This may be caused by a combination of factors, including having a loss function for each layer and the way supervised training is realized in the forward-forward paradigm. In comparison to backpropagation, the forward-forward algorithm focuses more on the decision boundaries and drops part of the information that is unnecessary for making decisions, which harms the representation learning goal. Further investigation and research are necessary to stabilize the forward-forward strategy for self-supervised learning so that it works beyond the datasets and configurations demonstrated by Geoffrey Hinton.
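To make the "separate loss function for each layer" concrete, the following is a minimal NumPy sketch of one common formulation of the per-layer forward-forward objective. It is not the code used in this study. It assumes that a layer's goodness is the sum of its squared activations and that the layer is trained to push goodness above a threshold for positive data and below it for negative data. The names `goodness`, `layer_loss` and `theta`, and all values in the usage lines, are illustrative assumptions.

```python
import numpy as np


def goodness(h):
    """Per-sample goodness: sum of squared activations (assumed definition)."""
    return np.sum(h ** 2, axis=1)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def layer_loss(h_pos, h_neg, theta):
    """Local loss for a single layer.

    Small when positive activations have goodness above theta and
    negative activations have goodness below theta. No gradient from
    any other layer is involved.
    """
    g_pos = goodness(h_pos)
    g_neg = goodness(h_neg)
    return float(np.mean(softplus(theta - g_pos)) + np.mean(softplus(g_neg - theta)))


# Usage: two forward passes through one ReLU layer (hypothetical sizes).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(8, 16))   # weights of a single layer
x_pos = rng.normal(size=(4, 8))           # stand-in for positive samples
x_neg = rng.normal(size=(4, 8))           # stand-in for negative samples
h_pos = np.maximum(x_pos @ W, 0.0)        # first forward pass
h_neg = np.maximum(x_neg @ W, 0.0)        # second forward pass
loss = layer_loss(h_pos, h_neg, theta=2.0)
```

In a full network, each layer would minimize its own `layer_loss` on its own activations, which is the main structural difference from backpropagation, where a single loss at the output drives all layers.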
