Data-free knowledge distillation enables model compression without access to the original training data, which is critical in privacy-sensitive tabular domains. However, existing methods perform poorly on tabular data because they do not explicitly address feature interactions, the fundamental way tabular models encode predictive knowledge. We identify interaction diversity, meaning systematic coverage of feature combinations, as an essential requirement for effective tabular distillation. To operationalize this insight, we propose TabKD, which learns adaptive feature bins aligned with the teacher's decision boundaries and then generates synthetic queries that maximize pairwise interaction coverage. Across 4 benchmark datasets and 4 teacher architectures, TabKD achieves the highest student-teacher agreement in 14 of 16 configurations, outperforming 5 state-of-the-art baselines. We further show that interaction coverage correlates strongly with distillation quality, validating our core hypothesis. Our work establishes interaction-focused exploration as a principled framework for tabular model extraction.
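To make the notion of pairwise interaction coverage concrete, the sketch below computes one plausible version of it: the fraction of (feature pair, bin pair) cells that at least one synthetic query falls into. The abstract does not give the exact definition used by TabKD, so the function name `pairwise_coverage`, the uniform weighting of cells, and the toy inputs are all assumptions made for illustration. The bin indices are taken as given here; learning bins aligned with teacher decision boundaries is not shown.

```python
from itertools import combinations

import numpy as np


def pairwise_coverage(binned, n_bins):
    """Fraction of (feature pair, bin pair) cells hit by at least one row.

    binned: (n_samples, n_features) integer array of bin indices.
    n_bins: number of bins per feature, one entry per feature.
    Hypothetical helper; not the paper's definition of coverage.
    """
    binned = np.asarray(binned)
    covered = 0
    total = 0
    for i, j in combinations(range(binned.shape[1]), 2):
        # Encode each (bin_i, bin_j) combination as a single cell id.
        cells = binned[:, i] * n_bins[j] + binned[:, j]
        covered += np.unique(cells).size
        total += n_bins[i] * n_bins[j]
    # With fewer than two features there are no pairs to cover.
    return covered / total if total else 0.0


# Toy example: 3 features with 2 bins each, 4 synthetic queries.
queries = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])
coverage = pairwise_coverage(queries, [2, 2, 2])
single = pairwise_coverage(np.array([[0, 0, 0]]), [2, 2, 2])
```

In this toy setting the four queries hit every bin pair for every feature pair, whereas a single query covers only one cell per feature pair. A generator that maximizes such a score would be pushed toward queries that land in cells not yet visited.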