Two-sample tests determine whether two sets of samples come from the same distribution, and they are widely used across the sciences and machine learning, for example to evaluate the effectiveness of drugs or to compare marketing strategies in A/B testing. Kernel-based hypothesis tests embed the data into a reproducing kernel Hilbert space (RKHS), which allows them to capture complex high-dimensional structure in the data and obtain accurate results in a model-free way. Although the choice of kernel is crucial to the performance of these tests, little is understood about how to choose a kernel, especially for small datasets. Here we aim to construct a hypothesis test that remains effective even for small datasets, building on MMD-FUSE, a kernel-based testing framework grounded in the maximum mean discrepancy (MMD). To this end, we enhance the MMD-FUSE framework by incorporating quantum kernels and propose a novel hybrid testing strategy that fuses classical and quantum kernels. This approach yields a powerful and adaptive test by combining the domain-specific inductive biases of classical kernels with the distinctive expressive power of quantum kernels. We evaluate our method on a range of synthetic and real-world clinical datasets, and our experiments reveal two key findings: 1) with appropriate hyperparameter tuning, MMD-FUSE with quantum kernels consistently improves test power over its classical counterparts, especially for small and high-dimensional data; 2) the proposed hybrid framework is robust, adapting to different data characteristics and achieving high test power across diverse scenarios. These results highlight the potential of quantum-inspired and hybrid kernel strategies for building more effective statistical tests, offering a versatile tool for data analysis when sample sizes are limited.
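For readers unfamiliar with MMD-based testing, the following is a minimal sketch of a single-kernel two-sample permutation test. It is not the MMD-FUSE procedure and uses no quantum kernels: it assumes one Gaussian kernel with a fixed, hand-picked bandwidth and a biased (V-statistic) estimate of the squared MMD. The function names, the `bandwidth` parameter, and the defaults are illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) Gram matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(gram, idx_x, idx_y):
    """Biased (V-statistic) estimate of squared MMD from a pooled Gram matrix."""
    k_xx = gram[np.ix_(idx_x, idx_x)].mean()
    k_yy = gram[np.ix_(idx_y, idx_y)].mean()
    k_xy = gram[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(x, y, bandwidth=1.0, n_perm=500, seed=0):
    """Return (observed statistic, permutation p-value) for H0: same distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = len(x)
    # The Gram matrix is computed once; permutations only reshuffle indices.
    gram = gaussian_kernel(pooled, pooled, bandwidth)
    idx = np.arange(len(pooled))
    observed = mmd2_biased(gram, idx[:n_x], idx[n_x:])

    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if mmd2_biased(gram, perm[:n_x], perm[n_x:]) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `mmd_permutation_test(x, y, bandwidth=2.0)` on two arrays of shape `(n_samples, n_features)` returns the statistic and a p-value, and the null hypothesis is rejected when the p-value falls below the chosen significance level. The result depends heavily on the bandwidth, which is the kernel-selection problem the abstract describes.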