Prior-fitted networks (PFNs) are a promising class of tabular foundation models that perform in-context learning: the entire labelled training set is supplied as context, and predictions for the test queries are produced in a single forward pass. However, self-attention in many PFN architectures scales quadratically with the size of the context, which makes inference prohibitively expensive for very large training datasets. We propose CRUMB (Clustered Retrieval Using Minimised-MMD Batching), a three-stage inference wrapper that (i) clusters the test queries, (ii) selects a small, distributionally matched training subset for each cluster by greedily minimising the maximum mean discrepancy (MMD), and (iii) runs exact PFN inference on each reduced-context batch. CRUMB is architecture-agnostic and requires no retraining. On the 51-dataset TabArena benchmark, evaluated across three PFN architectures (TabPFNv2, TabICLv1, TabICLv2), we show that CRUMB outperforms comparable state-of-the-art context-selection strategies. We also show that CRUMB is resilient to covariate drift, because the MMD-minimisation step helps align each training context with the distribution of the current test batch.
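The three stages described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify it: the RBF kernel and its bandwidth `gamma`, the biased (V-statistic) MMD estimate, plain k-means for the clustering stage, the function names, and the `predict_fn(x_ctx, y_ctx, x_query)` interface. A toy nearest-neighbour predictor stands in for a real PFN.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kmeans_labels(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering method for stage i)."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean(0)
    return labels


def greedy_mmd_select(x_train, x_query, m, gamma=1.0):
    """Stage ii: greedily grow a subset S of training rows that minimises
    the biased MMD^2 between S and the query batch. The query-only term of
    MMD^2 is constant with respect to S and is therefore dropped."""
    n = len(x_train)
    m = min(m, n)
    k_tt = rbf_kernel(x_train, x_train, gamma)
    k_tq = rbf_kernel(x_train, x_query, gamma).mean(1)  # mean kernel to queries
    diag = np.diag(k_tt)
    row_sum = np.zeros(n)   # sum of k(candidate, s) over already selected s
    sum_ss, sum_sq = 0.0, 0.0
    available = np.ones(n, dtype=bool)
    selected = []
    for t in range(1, m + 1):
        # Objective value if each candidate were added as the t-th element.
        cand_ss = sum_ss + 2.0 * row_sum + diag
        cand_sq = sum_sq + k_tq
        score = cand_ss / t ** 2 - 2.0 * cand_sq / t
        score[~available] = np.inf
        c = int(np.argmin(score))
        selected.append(c)
        available[c] = False
        sum_ss, sum_sq = cand_ss[c], cand_sq[c]
        row_sum += k_tt[:, c]
    return np.array(selected)


def crumb_predict(predict_fn, x_train, y_train, x_test,
                  n_clusters, context_size, gamma=1.0):
    """Cluster the queries, pick a matched context per cluster, then predict."""
    labels = kmeans_labels(x_test, n_clusters)
    out = None
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        ctx = greedy_mmd_select(x_train, x_test[idx], context_size, gamma)
        p = np.asarray(predict_fn(x_train[ctx], y_train[ctx], x_test[idx]))
        if out is None:
            out = np.empty((len(x_test),) + p.shape[1:], dtype=p.dtype)
        out[idx] = p
    return out


def nearest_neighbour_predict(x_ctx, y_ctx, x_query):
    """Toy 1-nearest-neighbour predictor; a placeholder, not a PFN."""
    dist = ((x_query[:, None, :] - x_ctx[None, :, :]) ** 2).sum(-1)
    return y_ctx[dist.argmin(1)]


# Toy data: two well-separated groups of training rows and test queries.
step = np.arange(40)[:, None] * 0.01
x_train = np.concatenate([-5.0 - step, 5.0 + step])
y_train = np.repeat([0, 1], 40)
q = np.arange(10)[:, None] * 0.04
x_test = np.concatenate([-5.0 - q, 5.0 + q])
pred = crumb_predict(nearest_neighbour_predict, x_train, y_train, x_test,
                     n_clusters=2, context_size=8, gamma=0.5)
```

In this sketch the greedy step precomputes the full training kernel matrix, which is itself quadratic in the number of training rows; a practical implementation would presumably restrict the candidate pool or compute kernel rows lazily. To wrap an actual PFN, `predict_fn` would be replaced by a call that fits the model on the reduced context and predicts for the query batch.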