Transductive learning is a supervised machine learning task in which, unlike in traditional inductive learning, the unlabelled data that require labelling form a finite set and are available at training time. Like inductive learning contexts, transductive learning contexts may be affected by dataset shift, i.e., may be such that the IID assumption does not hold. We here propose a method, tailored to transductive classification contexts, for performing model selection (i.e., hyperparameter optimisation) when the data exhibit prior probability shift, an important type of dataset shift typical of anti-causal learning problems. In our proposed method the hyperparameters can be optimised directly on the unlabelled data to which the trained classifier must be applied; this is unlike traditional model selection methods, which are based on performing cross-validation on the labelled training data. We provide experimental results that show the benefits brought about by our method.
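As a minimal illustration of the setting (this sketches the general notion of prior probability shift, not the paper's proposed method), the snippet below draws a training set and a test set from the same class-conditional distributions P(x|y) while changing only the class priors P(y); the distributions, sample sizes, and prior values are illustrative assumptions.

```python
# Illustrative sketch of prior probability shift: P(y) changes between
# training and test data while the class-conditionals P(x|y) stay fixed.
# All distributional choices here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, prior_pos):
    """Draw n labelled points; positives ~ N(+1, 1), negatives ~ N(-1, 1)."""
    y = (rng.random(n) < prior_pos).astype(int)
    x = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)
    return x, y

# Training set with balanced priors; test set with shifted priors.
x_tr, y_tr = sample(10_000, prior_pos=0.5)
x_te, y_te = sample(10_000, prior_pos=0.9)

print(y_tr.mean())  # close to 0.5
print(y_te.mean())  # close to 0.9
```

A model-selection procedure that cross-validates only on the labelled training data implicitly assumes the training priors (here 0.5) also hold at deployment, which the shifted test set violates.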