When deploying machine learning systems in the wild, it is highly desirable for them to transfer prior knowledge effectively to unfamiliar domains while also raising alarms on anomalous inputs. To address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the ability to detect out-of-distribution samples). While UniDA has led to significant progress in computer vision, its application to language input remains underexplored despite its feasibility. In this paper, we propose a comprehensive benchmark for natural language that offers a thorough view of a model's generalizability and robustness. Our benchmark encompasses multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains. On top of our testbed, we validate existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from the NLP literature, yielding two valuable findings: UniDA methods originally designed for image input can be effectively transferred to the natural language domain, and adaptation difficulty plays an important role in determining model performance.