Predictive coding-inspired deep networks for visual computing integrate classification and reconstruction processes in shared intermediate layers. Although synergy between these processes is commonly assumed, it has yet to be convincingly demonstrated. In this study, we critically examine how classification and reconstruction interact in deep learning architectures. We use a purpose-built family of autoencoder-like model architectures, each comprising an encoder, a decoder, and a classification head, with varying modules and complexity. We analyze the extent to which classification-driven and reconstruction-driven information can coexist in the shared latent layer of these architectures. Our findings reveal a significant challenge: classification-driven information diminishes reconstruction-driven information in the shared representations of intermediate layers, and vice versa. Although enlarging the shared representation or increasing network complexity can alleviate this trade-off, our results challenge prevailing assumptions in predictive coding and offer guidance for future iterations of predictive coding concepts in deep networks.
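As a rough illustration of the kind of architecture described above, the following minimal sketch wires an encoder, a decoder, and a classification head to one shared latent vector and combines the two objectives in a single loss. It is not the architecture used in the study: the single affine layers, the tanh nonlinearity, the class name `SharedLatentModel`, the function `joint_loss`, and the weighting parameter `alpha` are all assumptions made for illustration, and the abstract does not state how the two objectives are balanced.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Return (weights, bias) with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Affine map: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedLatentModel:
    """Hypothetical toy model: decoder and classifier both read the same latent z."""

    def __init__(self, n_inputs, n_latent, n_classes, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_latent, n_inputs, rng)
        self.decoder = init_layer(n_inputs, n_latent, rng)
        self.classifier = init_layer(n_classes, n_latent, rng)

    def forward(self, x):
        # Shared latent representation used by both downstream heads.
        z = [math.tanh(v) for v in linear(*self.encoder, x)]
        x_hat = linear(*self.decoder, z)             # reconstruction branch
        probs = softmax(linear(*self.classifier, z))  # classification branch
        return z, x_hat, probs


def joint_loss(x, label, x_hat, probs, alpha=0.5):
    """Weighted sum of cross-entropy and mean squared reconstruction error.

    alpha is an assumed trade-off weight: 1.0 is classification only,
    0.0 is reconstruction only.
    """
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    ce = -math.log(probs[label])
    return alpha * ce + (1.0 - alpha) * mse, mse, ce


model = SharedLatentModel(n_inputs=8, n_latent=3, n_classes=4)
x = [0.1 * i for i in range(8)]
z, x_hat, probs = model.forward(x)
total, mse, ce = joint_loss(x, label=2, x_hat=x_hat, probs=probs, alpha=0.5)
```

In a sketch like this, the size of `z` (here `n_latent=3`) plays the role of the shared representation's dimension, and sweeping `alpha` during training would be one simple way to probe how strongly each objective shapes the latent layer.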