Many real-world datasets have an underlying dynamic graph structure, in which entities and their interactions evolve over time. Machine learning models should account for these dynamics to reach their full potential in downstream tasks. Previous approaches to graph representation learning have focused either on sampling k-hop neighborhoods, akin to breadth-first search, or on random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models. To achieve this, we propose a streaming, low-latency approximation to random-walk-based features. In our framework, time-aware node embeddings that summarize multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate the proposed approach on three open-source datasets and two in-house datasets, and compare it with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). At the same time, graph-sprints significantly reduces inference latency, achieving close to an order of magnitude speed-up in our experimental setting.
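To make the idea of computing multi-hop summaries with single-hop operations concrete, the following is a minimal illustrative sketch and not the graph-sprints update rule itself, which the abstract does not specify. It assumes each node keeps one fixed-size state vector; when an edge arrives, the destination's state is blended with the source's current state and the edge features, after an exponential time decay. The class name `StreamingEmbedder` and the parameters `hop`, `keep`, and `half_life` are hypothetical names introduced for this example.

```python
import math
from collections import defaultdict


class StreamingEmbedder:
    """Illustrative sketch only: one state vector per node, updated per edge.

    The blend weights and the exponential decay are assumptions made for
    this example; they are not taken from the paper.
    """

    def __init__(self, dim, hop=0.5, keep=0.5, half_life=10.0):
        self.hop = hop      # weight of the neighbour's summary vs. the raw edge features
        self.keep = keep    # weight of the node's own (decayed) history
        self.rate = math.log(2) / half_life  # exponential time-decay rate
        self.state = defaultdict(lambda: [0.0] * dim)
        self.last_seen = {}

    def _decayed(self, node, t):
        # Read a node's state, discounted by the time since its last update.
        last = self.last_seen.get(node, t)
        w = math.exp(-self.rate * (t - last))
        return [w * x for x in self.state[node]]

    def update(self, src, dst, t, edge_feat):
        # Single-hop operation: only the two endpoints of the new edge are read.
        src_state = self._decayed(src, t)
        dst_state = self._decayed(dst, t)
        new_state = [
            self.keep * d
            + (1.0 - self.keep) * (self.hop * s + (1.0 - self.hop) * f)
            for d, s, f in zip(dst_state, src_state, edge_feat)
        ]
        self.state[dst] = new_state
        self.last_seen[dst] = t
        return new_state


# Usage: information from the edge a -> b reaches c through the later edge b -> c.
emb = StreamingEmbedder(dim=2, hop=0.5, keep=0.5, half_life=1.0)
emb.update("a", "b", 0.0, [1.0, 0.0])
state_c = emb.update("b", "c", 1.0, [0.0, 1.0])
```

Because the source's state already summarizes its own earlier neighbors, the destination inherits two-hop and deeper information without any graph traversal at inference time. Each update costs time proportional to the embedding dimension, independent of the graph size.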