Collecting labeled data for machine learning models is often expensive and time-consuming. Active learning addresses this challenge by selectively labeling the most informative observations, but when the initial labeled data is limited, it becomes difficult to distinguish genuinely informative points from those that appear uncertain primarily due to noise. Ensemble methods such as random forests are a powerful way to quantify this uncertainty, but they aggregate all models indiscriminately, including poor-performing and redundant ones, a problem that worsens in the presence of noisy data. We introduce UNique Rashomon Ensembled Active Learning (UNREAL), which selectively ensembles only distinct models from the Rashomon set, the set of nearly optimal models. Restricting ensemble membership to high-performing models with different explanations helps separate genuine uncertainty from noise-induced variation. We show that UNREAL achieves faster theoretical convergence rates than traditional active learning approaches and delivers empirical improvements of up to 20% in predictive accuracy across five benchmark datasets, while simultaneously enhancing model interpretability.
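The core idea, selecting a query point by the disagreement of distinct near-optimal models rather than of a full indiscriminate ensemble, can be illustrated with a minimal sketch. This is not the paper's algorithm: the model class (bootstrapped shallow trees), the Rashomon tolerance `eps`, and the deduplication criterion (identical prediction patterns on the unlabeled pool) are all illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy labeled and unlabeled pools (hypothetical data).
X_lab = rng.normal(size=(40, 2))
y_lab = (X_lab[:, 0] + X_lab[:, 1] > 0).astype(int)
X_unl = rng.normal(size=(200, 2))

# 1. Fit a pool of candidate models (bootstrapped shallow trees here).
models = []
for seed in range(50):
    idx = rng.integers(0, len(X_lab), len(X_lab))
    m = DecisionTreeClassifier(max_depth=3, random_state=seed)
    m.fit(X_lab[idx], y_lab[idx])
    models.append(m)

# 2. Keep only near-optimal models: accuracy within eps of the best
#    (an empirical stand-in for the Rashomon set).
accs = np.array([m.score(X_lab, y_lab) for m in models])
eps = 0.05
rashomon = [m for m, a in zip(models, accs) if a >= accs.max() - eps]

# 3. Deduplicate: keep one model per distinct prediction pattern,
#    so redundant models do not dominate the vote.
seen, unique = set(), []
for m in rashomon:
    key = tuple(m.predict(X_unl))
    if key not in seen:
        seen.add(key)
        unique.append(m)

# 4. Query the unlabeled point on which the distinct near-optimal
#    models disagree most.
preds = np.stack([m.predict(X_unl) for m in unique])
disagreement = preds.var(axis=0)
query_idx = int(disagreement.argmax())
```

Because step 2 discards low-accuracy models and step 3 collapses redundant ones before disagreement is measured, high variance at `query_idx` reflects a genuine split among good, distinct explanations rather than noise amplified by weak ensemble members.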