We introduce a predictive-coding-based model that aims to generate accurate and sharp future frames. Inspired by the predictive coding hypothesis and related work, the model is updated through a combination of bottom-up and top-down information flows, which strengthens the interaction between different levels of the network. Most importantly, we propose and refine several techniques that suppress artifacts, so that the network generates clear and natural frames. Different inputs are no longer simply concatenated or added; instead, they are combined in a modulated manner to avoid crude fusion. The downsampling and upsampling modules have been redesigned so that the network can more easily construct images from the Fourier features of low-frequency inputs. We also explore and improve the training strategy to produce believable results and to alleviate the inconsistency between the predicted frames used as inputs and the ground truth. Our proposals achieve a better balance between pixel accuracy and visual quality.
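The abstract does not say which modulation scheme replaces concatenation or addition. As a purely illustrative sketch, the code below uses FiLM-style feature-wise linear modulation, a swapped-in, well-known technique and not necessarily the authors' design. A conditioning input predicts a per-channel scale (gamma) and shift (beta) that are applied to the content features. The function names, the one-value-per-channel simplification, and the toy weights are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, v: Vector) -> Vector:
    """Dense layer: returns weights @ v + bias."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def modulate(features: Vector, cond: Vector,
             w_gamma: Matrix, b_gamma: Vector,
             w_beta: Matrix, b_beta: Vector) -> Vector:
    """Hypothetical FiLM-style fusion: the conditioning input predicts a
    per-channel scale (gamma) and shift (beta) that act on the content
    features, instead of being stacked onto or summed with them."""
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


def concat(features: Vector, cond: Vector) -> Vector:
    """Naive fusion for comparison: channels are simply stacked."""
    return features + cond


# Toy example: two content channels conditioned on a one-dimensional signal.
features = [1.0, 2.0]
cond = [0.5]
fused = modulate(features, cond,
                 w_gamma=[[2.0], [0.0]], b_gamma=[1.0, 1.0],
                 w_beta=[[0.0], [4.0]], b_beta=[0.0, 0.0])
print(fused)                     # [2.0, 4.0]
print(concat(features, cond))    # [1.0, 2.0, 0.5]
```

With modulation, the conditioning signal rescales and shifts existing channels rather than adding new ones. This is one common way to avoid the crude fusion that the abstract attributes to plain concatenation or addition.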