Developing and deploying machine learning models safely depends on the ability to characterize and compare how well they generalize to new environments. Although recent work has proposed a variety of methods that directly predict or theoretically bound a model's generalization capacity, these methods rely on strong assumptions such as matching train/test distributions and access to model gradients. To characterize generalization when these assumptions are not satisfied, we propose neighborhood invariance, a measure of a classifier's output invariance in a local transformation neighborhood. Specifically, we sample a set of transformations and, given an input test point, calculate the invariance as the largest fraction of transformed points classified into the same class. Crucially, our measure is simple to calculate, does not depend on the test point's true label, makes no assumptions about the data distribution or model, and can be applied even in out-of-domain (OOD) settings where existing methods cannot; it requires only selecting a set of appropriate data transformations. In experiments on robustness benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our neighborhood invariance measure and actual OOD generalization on over 4,600 models evaluated on over 100 unique train/test domain pairs.
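As a minimal sketch of the per-point calculation described in the abstract, the following Python computes the largest fraction of transformed copies of one test point that receive the same predicted class. The function name `neighborhood_invariance`, the `classify` callable, and the toy threshold classifier with additive shifts are all hypothetical, chosen only for illustration; the abstract does not specify how transformations are sampled or how per-point scores are aggregated across a test set.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

X = TypeVar("X")


def neighborhood_invariance(
    classify: Callable[[X], Hashable],
    transforms: Sequence[Callable[[X], X]],
    x: X,
) -> float:
    """Largest fraction of transformed copies of x that share one predicted class.

    No true label is needed: only the classifier's outputs are compared.
    """
    if not transforms:
        raise ValueError("at least one transformation is required")
    # Predict a class for every transformed version of the test point.
    predictions = [classify(t(x)) for t in transforms]
    # Size of the largest group of identical predictions.
    top_count = Counter(predictions).most_common(1)[0][1]
    return top_count / len(predictions)


# Toy setup (an assumption for illustration): a threshold classifier on
# scalars, with small additive shifts standing in for data transformations.
def toy_classifier(v: float) -> int:
    return int(v > 0.0)


toy_transforms = [lambda v, d=d: v + d for d in (-0.3, -0.1, 0.1, 0.3)]

score = neighborhood_invariance(toy_classifier, toy_transforms, 0.2)
print(score)  # 0.75: three of the four shifted points stay in class 1
```

A score of 1.0 means every transformed point is assigned the same class, and with k classes the score cannot fall below 1/k. Averaging per-point scores over a test set is one plausible way to obtain a model-level number, but that aggregation is an assumption here rather than something the abstract states.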