Domain generalization is a popular machine learning technique that enables models to perform well on an unseen target domain by learning from multiple source domains. It is useful when data are scarce or are difficult or expensive to collect, as in object recognition and biomedicine. In this paper, we propose a novel domain generalization algorithm called "meta-forests", which builds upon the basic random forests model by incorporating a meta-learning strategy and the maximum mean discrepancy measure. The aim of meta-forests is to enhance the generalization ability of classifiers by reducing the correlation among trees and increasing their strength. More specifically, meta-forests performs meta-learning optimization during each meta-task and uses the maximum mean discrepancy as a regularization term to penalize poor generalization performance in the meta-test process. To evaluate the effectiveness of our algorithm, we test it on two publicly available object recognition datasets and a glucose monitoring dataset that we used in a previous study. Our results show that meta-forests outperforms state-of-the-art approaches in terms of generalization performance on both the object recognition and the glucose monitoring datasets.
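To make the maximum mean discrepancy (MMD) mentioned above concrete, the following is a minimal sketch of how a squared MMD between two samples can be estimated. It is not the meta-forests implementation: the abstract does not state which kernel or estimator is used, so the Gaussian (RBF) kernel, the biased estimator, the `gamma` value, and the helper names (`rbf_kernel`, `mean_kernel`, `mmd_squared`, `sample_domain`) are all assumptions made for illustration, and the data are synthetic.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))


def sample_domain(n, shift, seed):
    """Hypothetical 2-D 'domain': Gaussian features centered at (shift, shift)."""
    rng = random.Random(seed)
    return [[rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)] for _ in range(n)]


# Two samples from the same distribution and one from a shifted distribution.
source_a = sample_domain(100, 0.0, seed=0)
source_b = sample_domain(100, 0.0, seed=1)
shifted = sample_domain(100, 3.0, seed=2)

mmd_similar = mmd_squared(source_a, source_b)  # small: distributions match
mmd_shifted = mmd_squared(source_a, shifted)   # larger: distributions differ

print(f"MMD^2 (same distribution):    {mmd_similar:.4f}")
print(f"MMD^2 (shifted distribution): {mmd_shifted:.4f}")
```

A larger value indicates a larger gap between the two feature distributions, which is the property that makes MMD usable as a penalty on poor cross-domain generalization. How meta-forests weights and applies this term during the meta-test process is not specified in the abstract and is not modeled here.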