Unlike conventional "black-box" transformers built on the classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic data with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of a signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, and we design new $\ell_2$- and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM) and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve traffic forecasting performance competitive with state-of-the-art prediction schemes while drastically reducing parameter counts.
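To make the optimization-then-unroll idea concrete, here is a minimal NumPy sketch of ADMM on a mixed-graph forecasting objective. It is not the paper's method. The following choices are our own assumptions:

- The spatial prior is the standard graph Laplacian regularizer $\mathbf{x}^\top(\mathbf{I}_T \otimes \mathbf{L}^u)\mathbf{x}$.
- The temporal graph is a directed path with unit edges $t-1 \to t$.
- The directed $\ell_1$ term is the plain difference between each sample and its in-neighbor, standing in for the paper's new variational terms.
- $\mathbf{H}$ simply masks the observed past samples.

The function names (`admm_forecast`, `directed_difference`, and so on) are hypothetical. In an unrolled network, `n_iters` would become the number of layers, and `mu_u`, `mu_d`, and `rho` would be learned per layer with an autodiff framework. Here they are fixed constants.

```python
import numpy as np


def undirected_laplacian(W: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - W of a symmetric (undirected) adjacency W."""
    return np.diag(W.sum(axis=1)) - W


def directed_difference(T: int) -> np.ndarray:
    """(T-1) x T operator for an assumed directed path t-1 -> t over time.

    Row t-1 computes x[t] - x[t-1]: each node minus its single in-neighbor
    (unit edge weights; a stand-in for the paper's directed variation terms).
    """
    D = np.zeros((T - 1, T))
    for t in range(1, T):
        D[t - 1, t] = 1.0
        D[t - 1, t - 1] = -1.0
    return D


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def admm_forecast(Y_obs, W, T, mu_u=0.5, mu_d=0.1, rho=1.0, n_iters=50):
    """Hypothetical toy forecaster: fill a T x N signal from its first T_obs rows.

    Solves  min_x ||H x - y||^2 + mu_u x^T (I_T kron L_u) x + mu_d ||(D kron I_N) x||_1
    with x = X.ravel() (row-major, so time slices are contiguous) via scaled ADMM
    on the split z = (D kron I_N) x. Requires a connected spatial graph W.
    """
    T_obs, N = Y_obs.shape
    L_s = np.kron(np.eye(T), undirected_laplacian(W))  # spatial smoothness per time slice
    G = np.kron(directed_difference(T), np.eye(N))     # directed temporal differences per node
    mask = np.zeros(T * N)
    mask[: T_obs * N] = 1.0                            # diagonal of H^T H (observed entries)
    Hty = np.zeros(T * N)
    Hty[: T_obs * N] = Y_obs.ravel()                   # H^T y
    # The x-update matrix does not change across iterations, so build it once.
    A = 2.0 * np.diag(mask) + 2.0 * mu_u * L_s + rho * G.T @ G
    z = np.zeros(G.shape[0])
    u = np.zeros(G.shape[0])
    for _ in range(n_iters):  # each iteration corresponds to one unrolled "layer"
        x = np.linalg.solve(A, 2.0 * Hty + rho * G.T @ (z - u))  # quadratic x-update
        z = soft_threshold(G @ x + u, mu_d / rho)                # l1 proximal z-update
        u = u + G @ x - z                                       # scaled dual update
    return x.reshape(T, N)


# Usage: 3 sensors on a path graph, 3 observed steps, 2 steps to forecast.
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
Y_obs = np.array([[1.0, 1.2, 1.1], [1.1, 1.3, 1.2], [1.2, 1.4, 1.3]])
X_hat = admm_forecast(Y_obs, W, T=5)
print(X_hat[3:])  # the two forecast time steps
```

With only an $\ell_1$ temporal prior and no data term on future steps, this toy solver tends to extend the last observed values forward. Richer directed variation terms and learned graph weights are what an approach like the one above would add on top of this skeleton.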