We present the Predictive Sparse Manifold Transform (PSMT), a minimalistic, interpretable, and biologically plausible framework for learning and predicting natural dynamics. PSMT comprises two layers: a sparse coding layer that represents the input sequence as sparse coefficients over an overcomplete dictionary, and a manifold learning layer that learns a geometric embedding space capturing topological similarity and dynamic temporal linearity in the sparse coefficients. We apply PSMT to a natural video dataset and evaluate its reconstruction performance with respect to contextual variability, the number of sparse coding basis functions, and the number of training samples. We then interpret the dynamic topological organization in the embedding space. Next, we use PSMT to predict future frames and compare it against two baseline methods that use a static embedding space. We demonstrate that PSMT, with its dynamic embedding space, achieves better prediction performance than the static baselines. Our work establishes that PSMT is an efficient unsupervised generative framework for predicting future visual stimuli.
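To make the two-layer structure concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not name the inference or learning algorithms, so several pieces here are assumptions: sparse coefficients are inferred with ISTA (iterative soft-thresholding), the dictionary `PHI` and the embedding matrix `P` are small hand-written toy matrices rather than learned ones, and "temporal linearity" is read as a straight-line extrapolation in the embedding space, i.e. the next embedded point is taken to be 2·β_t − β_{t−1}. Mapping the predicted embedding back to a frame is omitted.

```python
import math


def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def soft_threshold(v, thresh):
    """Shrink every entry toward zero by thresh (proximal map of the L1 norm)."""
    return [math.copysign(max(abs(x) - thresh, 0.0), x) for x in v]


def ista(x, Phi, lam=0.1, step=0.3, n_iter=2000):
    """Approximately minimize 0.5*||x - Phi a||^2 + lam*||a||_1 over a.

    Assumption: ISTA stands in for whatever sparse inference PSMT uses.
    step must not exceed 1 / (largest eigenvalue of Phi^T Phi).
    """
    Phi_T = [list(col) for col in zip(*Phi)]
    a = [0.0] * len(Phi_T)
    for _ in range(n_iter):
        residual = [xi - ri for xi, ri in zip(x, matvec(Phi, a))]
        descent = matvec(Phi_T, residual)  # negative gradient of the quadratic term
        a = soft_threshold([ai + step * di for ai, di in zip(a, descent)],
                           step * lam)
    return a


def predict_next(beta_prev, beta_curr):
    """Extrapolate along a straight line: beta_next = 2*beta_curr - beta_prev."""
    return [2 * c - p for p, c in zip(beta_prev, beta_curr)]


# Hypothetical toy dictionary: three unit-norm atoms (columns) in 2-D, so it is
# overcomplete. The largest eigenvalue of PHI^T PHI is 2, so step=0.3 is safe.
PHI = [[1.0, 0.0, 0.6],
       [0.0, 1.0, 0.8]]

# Hypothetical fixed embedding matrix; in PSMT this layer would be learned.
P = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]

# Two consecutive toy "frames" (2-D signals standing in for image patches).
x_prev, x_curr = [1.0, 0.0], [0.6, 0.8]

# Layer 1: sparse coefficients for each frame.
alpha_prev, alpha_curr = ista(x_prev, PHI), ista(x_curr, PHI)

# Layer 2: embed the coefficients, then extrapolate one step ahead.
beta_prev, beta_curr = matvec(P, alpha_prev), matvec(P, alpha_curr)
beta_next = predict_next(beta_prev, beta_curr)
```

In this toy setting the frame `[0.6, 0.8]` coincides with the third dictionary atom, so the inferred code should concentrate on that single coefficient, which illustrates why an overcomplete dictionary can yield sparse representations. A static-embedding baseline of the kind mentioned in the abstract would skip the extrapolation step and reuse `beta_curr` as the prediction.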