Data-adaptive two-sample testing assesses whether two samples come from the same distribution, using a discrepancy learned from the data (e.g., via kernel-based feature representations). Such methods typically rely on data splitting to decouple learning from testing and to control the type I error. However, this paradigm is ill-suited to few-shot settings with severe sample-size imbalance, where abundant reference samples are available but only a handful of query samples arrive. In this paper, we show how this imbalance can be leveraged constructively. Using the abundant reference data, we learn reference-dependent representations that summarize the salient structure of the reference distribution and provide informative signals for detecting departures from it. We incorporate a collection of representation families that capture both global and local structure, and we weight them adaptively, using only reference samples, via an uncertainty-guided principle. Theoretically, we establish permutation-based type I error control and show consistency of the aggregated test: as the sample sizes grow, the power of the test converges to one whenever the representation set contains at least one consistent representation. Empirically, our aggregation achieves strong performance across a range of benchmarks while retaining type I error control.
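To make the permutation-based calibration of an aggregated statistic concrete, the following is a minimal sketch and not the paper's procedure. It assumes the feature maps and their weights are fixed before testing; the names `make_rff`, `aggregated_statistic`, and `permutation_pvalue` are hypothetical, and the uncertainty-guided weighting described in the abstract is replaced here by user-supplied weights. Each representation contributes a squared difference of mean features between the reference and query samples, and the weighted sum is calibrated by permuting the pooled sample.

```python
import numpy as np


def make_rff(dim, n_features, seed=0):
    """Return a fixed random Fourier feature map (an illustrative 'local' representation)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return lambda x: np.sqrt(2.0 / n_features) * np.cos(x @ W + b)


def aggregated_statistic(ref, query, feature_maps, weights):
    """Weighted sum over representations of squared mean-feature differences."""
    total = 0.0
    for phi, w in zip(feature_maps, weights):
        diff = phi(ref).mean(axis=0) - phi(query).mean(axis=0)
        total += w * float(diff @ diff)
    return total


def permutation_pvalue(ref, query, feature_maps, weights, n_perm=200, seed=0):
    """Permutation p-value with the usual +1 correction in numerator and denominator."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([ref, query])
    n_ref = len(ref)
    observed = aggregated_statistic(ref, query, feature_maps, weights)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = aggregated_statistic(
            pooled[idx[:n_ref]], pooled[idx[n_ref:]], feature_maps, weights
        )
        exceed += int(stat >= observed)
    return (1 + exceed) / (1 + n_perm)


# Illustrative imbalanced setting: 500 reference points, 5 query points.
data_rng = np.random.default_rng(1)
ref = data_rng.normal(size=(500, 2))
query = data_rng.normal(loc=3.0, size=(5, 2))
maps = [lambda x: x, make_rff(dim=2, n_features=32)]  # global mean + local features
p_value = permutation_pvalue(ref, query, maps, weights=[0.5, 0.5])
```

Because the maps and weights in this sketch do not depend on which points are labeled as reference or query, the statistic is exchangeable under the null and the permutation p-value is valid at any sample size. How the paper preserves this property while learning representations and weights from the reference samples is not specified in the abstract.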