Unlike conventional "black-box" transformers with a classical self-attention mechanism, we build a lightweight, interpretable transformer-like neural network by unrolling a mixed-graph-based optimization algorithm to forecast traffic along spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, where we design new $\ell_2$- and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve traffic forecast performance competitive with state-of-the-art prediction schemes, while drastically reducing parameter counts.
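To make the smoothness priors concrete, here is a minimal toy sketch in NumPy of the kind of $\ell_2$ variational terms involved. The undirected term is the standard graph Laplacian regularizer $\mathbf{x}^\top \mathbf{L} \mathbf{x}$; for the directed graph we use an illustrative penalty $\|\mathbf{x} - \mathbf{W}_d \mathbf{x}\|_2^2$ that compares each node to its predecessors. Both the small graphs and the directed-graph penalty form are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

# Undirected graph G^u on 3 nodes: smoothness is the Laplacian quadratic
# form x^T L x = sum over edges (i,j) of w_ij * (x_i - x_j)^2.
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.5, 0.2, 0.0]])          # symmetric adjacency matrix
L = np.diag(W.sum(axis=1)) - W           # combinatorial graph Laplacian

x = np.array([1.0, 1.1, 0.9])            # a near-constant ("smooth") signal
glr = x @ L @ x                          # l2 graph Laplacian regularizer

# Directed graph G^d: a simple chain 0 -> 1 -> 2. As an illustrative
# (assumed) l2 variational term, penalize the difference between each
# node and the weighted sum of its predecessors: ||x - W_d x||_2^2.
W_d = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])        # row i lists predecessors of node i
dgv = np.sum((x - W_d @ x) ** 2)

print(glr)  # small value, since x varies little across connected nodes
print(dgv)
```

A smoother signal yields smaller values for both terms, which is why minimizing them (alongside a data-fidelity term, e.g. via ADMM) promotes low-frequency reconstructions on the two graphs.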