Analyzing the Generalizability of Deep Contextualized Language Representations For Text Classification

This study evaluates the robustness of two state-of-the-art deep contextual language representations, ELMo and DistilBERT, on two supervised binary text classification tasks: protest news classification and sentiment analysis of product reviews. A "cross-context" setting is created by using test sets drawn from a different source than the training data. Specifically, in the news classification task, the models are trained on local news from India and tested on local news from China. In the sentiment analysis task, the models are trained on movie reviews and tested on customer reviews. This comparison aims to explore the limits of the representational power of today's Natural Language Processing systems on the path toward systems that generalize to real-life scenarios. ELMo and DistilBERT are fine-tuned, and their representations are fed into a Feed-Forward Neural Network and a Bidirectional Long Short-Term Memory network. Multinomial Naive Bayes and a Linear Support Vector Machine are used as traditional baselines. The results show that, in binary text classification, DistilBERT generalizes to the cross-context setting significantly better than ELMo. ELMo, in turn, is significantly more robust to the cross-context test data than both baselines. On the other hand, the baselines perform comparably to ELMo when the training and test data are subsets of the same corpus (no cross-context). DistilBERT is also found to be 30% smaller and 83% faster than ELMo. The results suggest that DistilBERT transfers generic semantic knowledge to other domains better than ELMo. DistilBERT is also preferable for integration into real-life systems, since it requires a smaller computational training budget. When generalization is not the top priority and the test domain is similar to the training domain, traditional ML algorithms can still be considered more economical alternatives to deep language representations.
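The cross-context protocol can be illustrated with one of the traditional baselines. The following is a minimal from-scratch sketch of Multinomial Naive Bayes (add-alpha smoothing, lowercased whitespace tokenization). It is trained on a handful of invented movie-review sentences and evaluated both in-domain and on invented product-review sentences. The toy data, the hypothetical helper names (`tokenize`, `MultinomialNB`, `accuracy`), and the smoothing value are illustrative assumptions, not the authors' data, code, or library choice, and the printed numbers are not results from the study.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (a simplifying assumption).
    return text.lower().split()


class MultinomialNB:
    """Hypothetical from-scratch multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # unseen words carry no evidence for any class
                score += math.log((self.word_counts[c][tok] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            if score > best_score:  # ties keep the earlier class in sorted order
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy data: 1 = positive, 0 = negative.
train_docs = ["a wonderful film with great acting",
              "great plot and wonderful direction",
              "a boring film with terrible acting",
              "terrible plot and boring direction"]
train_labels = [1, 1, 0, 0]

# In-domain test: more movie reviews (same source as training).
in_docs, in_labels = ["wonderful acting", "boring plot"], [1, 0]

# Cross-context test: product reviews (different source from training).
cross_docs = ["great battery and wonderful screen",
              "terrible battery life",
              "does not feel cheap at all",
              "boring design and terrible support"]
cross_labels = [1, 0, 1, 0]

model = MultinomialNB().fit(train_docs, train_labels)
in_acc = accuracy(in_labels, model.predict(in_docs))
cross_acc = accuracy(cross_labels, model.predict(cross_docs))
print(f"in-domain accuracy: {in_acc:.2f}")         # in-domain accuracy: 1.00
print(f"cross-context accuracy: {cross_acc:.2f}")  # cross-context accuracy: 0.75
```

In this toy run, the positive product review "does not feel cheap at all" shares no words with the movie-review training vocabulary, so its prediction falls back to the class priors and tie-breaking and comes out wrong. This is one simple way a bag-of-words baseline can lose accuracy under a domain shift, and it is the kind of gap that representations pretrained on large corpora may help close.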


