The importance of inference in Machine Learning (ML) has led to an explosion of proposals in ML, particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks (CNNs), we propose a network architecture inspired by Volterra filters. This architecture introduces controlled non-linearities in the form of interactions between delayed samples of the input data. We propose a cascaded implementation of Volterra filtering that significantly reduces the number of parameters required to carry out the same classification task as a conventional neural network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN) and show that it performs strongly while retaining a relatively simple and potentially more tractable structure. Furthermore, we present an adaptation of this network that nonlinearly fuses the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated for action recognition on the UCF-101 and HMDB-51 datasets and is shown to outperform state-of-the-art CNN approaches.
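To make "interactions between delayed input samples" and the parameter savings of cascading concrete, here is a minimal one-dimensional sketch. It is not the authors' VNN implementation, which operates on video volumes. It assumes a causal, zero-padded second-order Volterra filter with memory L, and the helper names `volterra2`, `direct_params`, and `cascade_params` are hypothetical. The counts use full (non-symmetric) kernels, give every cascaded stage the same memory L, and ignore that the cascade's effective memory grows with each stage, so they illustrate the growth trend rather than the paper's exact figures.

```python
from typing import List, Sequence


def volterra2(x: Sequence[float], w1: Sequence[float],
              w2: Sequence[Sequence[float]]) -> List[float]:
    """Second-order Volterra filter with memory L = len(w1) (illustrative sketch).

    y[n] = sum_i w1[i] * x[n-i] + sum_{i,j} w2[i][j] * x[n-i] * x[n-j]
    Samples before the start of x are treated as zero (causal, zero-padded).
    """
    L = len(w1)
    y = []
    for n in range(len(x)):
        # Delayed input samples x[n], x[n-1], ..., x[n-L+1], zero before the start.
        taps = [x[n - i] if n - i >= 0 else 0.0 for i in range(L)]
        # First-order (linear) term: an ordinary FIR filter.
        linear = sum(w1[i] * taps[i] for i in range(L))
        # Second-order term: pairwise products of delayed samples (the "interactions").
        quad = sum(w2[i][j] * taps[i] * taps[j]
                   for i in range(L) for j in range(L))
        y.append(linear + quad)
    return y


def direct_params(order: int, memory: int) -> int:
    """Coefficients of one full K-th order Volterra filter: L + L^2 + ... + L^K."""
    return sum(memory ** k for k in range(1, order + 1))


def cascade_params(stages: int, memory: int) -> int:
    """Coefficients of `stages` cascaded 2nd-order filters, each with memory L.

    Composing Z quadratic stages yields an overall polynomial order of 2**Z.
    """
    return stages * direct_params(2, memory)


if __name__ == "__main__":
    print(volterra2([1.0, 2.0, 3.0], [1.0, 0.5], [[0.1, 0.0], [0.0, 0.0]]))
    # Order 4 with L = 3: one direct filter vs. two cascaded quadratic stages.
    print(direct_params(4, 3), cascade_params(2, 3))  # 120 24
```

With L = 3, a direct 8th-order kernel needs 9840 coefficients under this counting, while three cascaded quadratic stages (overall order 2^3 = 8) need 36. The cost of a direct filter grows exponentially with the order, whereas the cascade grows only linearly in the number of stages.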