Foundation models in language and vision benefit from a unified discrete token interface that converts raw inputs into sequences for scalable pre-training and inference. For graphs, an effective tokenizer should yield reusable discrete codes that capture both node semantics and relational structure across scales, yet prior quantization-based graph tokenizers typically combine residual vector quantization (RVQ) levels via fixed rules and often focus on a single structural view, limiting cross-task transfer. We present a hierarchical quantized tokenization framework with task-conditioned routing and dual-view token streams. It produces multi-scale codes organized into two synchronized sequences: a local stream that preserves node-level information and a diffusion-style multi-hop stream that summarizes connectivity. A lightweight router learns task-dependent mixtures over RVQ depths to select an appropriate granularity, while a gated cross-attention module aligns and fuses the two streams into a single token sequence without altering the downstream backbone encoder. Experiments on node classification and link prediction show consistent gains over strong quantized baselines at matched compute, and ablations confirm the contributions of hierarchical quantization, adaptive routing, and fusion.
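The components named above can be sketched end to end: residual quantization producing one code per depth, a router emitting a mixture over RVQ depths from a task embedding, a diffusion-style multi-hop view built from powers of the normalized adjacency, and a gated cross-attention step that fuses the two streams. This is a minimal NumPy illustration with random, untrained parameters; the function names (`rvq_encode`, `route_depths`, `fuse`) and the specific softmax-routing and sigmoid-gating forms are hypothetical simplifications, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, depth, K = 8, 16, 3, 32          # nodes, feature dim, RVQ depths, codebook size

# Hierarchical (residual) quantization: one codebook per depth.
codebooks = [rng.normal(size=(K, d)) for _ in range(depth)]

def rvq_encode(x, codebooks):
    """At each depth, snap the residual to its nearest code, then subtract it."""
    codes, residual = [], x.copy()
    for cb in codebooks:
        idx = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1).argmin(1)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes, 1)           # (N, depth) discrete token ids

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# Task-conditioned router: mixture weights over RVQ depths (hypothetical form).
W_route = rng.normal(size=(d, depth))
def route_depths(task_emb):
    return softmax(task_emb @ W_route)  # (depth,) granularity mixture

# Dual-view streams: raw node features vs. a diffusion-style multi-hop summary.
X = rng.normal(size=(N, d))
A = (rng.random((N, N)) < 0.3).astype(float)
A_hat = A / np.maximum(A.sum(1, keepdims=True), 1)   # row-normalized adjacency
multihop = (A_hat @ X + A_hat @ A_hat @ X) / 2       # averaged 1- and 2-hop views

# Gated cross-attention fusion: local queries attend to multi-hop keys/values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
w_gate = rng.normal(size=(d,))
def fuse(local, context):
    q, k, v = local @ Wq, context @ Wk, context @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))
    gate = 1 / (1 + np.exp(-(local @ w_gate)))       # per-node scalar gate
    return local + gate[:, None] * (attn @ v)        # same shape as backbone input

codes = rvq_encode(X, codebooks)        # (N, depth)
weights = route_depths(rng.normal(size=d))
fused = fuse(X, multihop)
print(codes.shape, weights.round(2), fused.shape)
```

Because fusion is additive and shape-preserving, the downstream backbone encoder consumes the fused sequence exactly as it would the local stream alone, which is what lets the tokenizer change without touching the backbone.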