In real-world scenarios, tabular data often suffer from distribution shifts that degrade the performance of machine learning models. Although such shifts are prevalent and consequential, handling them in the tabular domain remains underexplored because of challenges inherent to tabular data. Test-time adaptation (TTA) offers a promising solution: it adapts models to target data without accessing source data, which is crucial in privacy-sensitive tabular domains. However, existing TTA methods either 1) overlook the nature of tabular distribution shifts, which often involve label distribution shifts, or 2) impose architectural constraints on the model, which limits their applicability. To address these limitations, we propose AdapTable, a novel TTA framework for tabular data. AdapTable operates in two stages: 1) calibrating model predictions using a shift-aware uncertainty calibrator, and 2) adjusting these predictions to match the target label distribution with a label distribution handler. We validate the effectiveness of AdapTable through theoretical analysis and extensive experiments on various distribution shift scenarios. Our results demonstrate AdapTable's ability to handle various real-world distribution shifts, achieving up to a 16% improvement on the HELOC dataset.
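The two-stage idea of calibrating predictions and then adjusting them toward the target label distribution can be illustrated with a minimal sketch. This is not the AdapTable implementation: fixed temperature scaling stands in for the shift-aware uncertainty calibrator, a prior-ratio re-weighting stands in for the label distribution handler, and the target label distribution is assumed to be estimated as the batch mean of calibrated probabilities. All function names, the temperature value, and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate(batch_logits, temperature=1.5):
    """Stage 1 stand-in (assumption): plain temperature scaling,
    not the paper's shift-aware calibrator."""
    return [softmax([x / temperature for x in row]) for row in batch_logits]


def estimate_target_prior(batch_probs):
    """Assumed estimator: average calibrated probability per class
    over the current target batch."""
    n = len(batch_probs)
    k = len(batch_probs[0])
    return [sum(row[c] for row in batch_probs) / n for c in range(k)]


def adjust_to_target_prior(batch_probs, source_prior, target_prior):
    """Stage 2 stand-in (assumption): re-weight each class probability by
    target_prior / source_prior, then renormalize each row."""
    adjusted = []
    for row in batch_probs:
        weighted = [p * t / s for p, s, t in zip(row, source_prior, target_prior)]
        z = sum(weighted)
        adjusted.append([w / z for w in weighted])
    return adjusted


# Toy binary-classification batch (hypothetical values).
batch_logits = [[2.0, 0.5], [1.2, 1.0], [0.3, 0.1], [-0.5, 1.5]]
source_prior = [0.5, 0.5]  # assumed label distribution of the source data

probs = calibrate(batch_logits)
target_prior = estimate_target_prior(probs)
adapted = adjust_to_target_prior(probs, source_prior, target_prior)
```

Because the sketch only post-processes output logits, it places no constraint on the model architecture, which mirrors the applicability concern raised in the abstract. No source data is touched at test time; only the source label prior is assumed to be known.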