Conditional Distribution Compression via the Kernel Conditional Mean Embedding

Existing distribution compression methods, such as Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a metric for comparing conditional distributions, and derive a closed-form estimator. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from cubic to linear. Leveraging this, we extend KH to propose Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm for constructing compressed sets that target the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), an adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we also propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.

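For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard Kernel Herding on unlabelled data. It is not the ACKH, JKH, JKIP, or ACKIP procedure, none of which is specified in the abstract. The Gaussian kernel, the `lengthscale` parameter, the restriction of candidates to the dataset itself, and the function names `gaussian_kernel`, `kernel_herding`, and `mmd_squared` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def kernel_herding(x, m, lengthscale=1.0):
    """Greedily select m row indices of x (assumption: candidates are the data)."""
    k_xx = gaussian_kernel(x, x, lengthscale)
    # Empirical kernel mean embedding evaluated at every candidate point.
    mean_embedding = k_xx.mean(axis=1)
    # Running sum of k(candidate, z) over the points z selected so far.
    running_sum = np.zeros(x.shape[0])
    selected = []
    for t in range(m):
        # Herding objective: attraction to the data, repulsion from the picks.
        scores = mean_embedding - running_sum / (t + 1)
        idx = int(np.argmax(scores))
        selected.append(idx)
        running_sum += k_xx[:, idx]
    return np.array(selected)


def mmd_squared(x, z, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and z."""
    return (
        gaussian_kernel(x, x, lengthscale).mean()
        - 2.0 * gaussian_kernel(x, z, lengthscale).mean()
        + gaussian_kernel(z, z, lengthscale).mean()
    )


# Usage on synthetic data (hypothetical example, not from the paper).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
compressed = data[kernel_herding(data, m=10)]
print(mmd_squared(data, compressed))
```

This sketch precomputes the full kernel matrix, so it costs quadratic time and memory in the dataset size; it is meant only to show the greedy selection rule, not the linear-time constructions the abstract describes.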