This study evaluates the robustness of two state-of-the-art deep contextual language representations, ELMo and DistilBERT, on two supervised learning tasks: binary classification of protest news and sentiment analysis of product reviews. A "cross-context" setting is created by using test sets drawn from a different context than the training data. Specifically, in the news classification task, the models are developed on local news from India and tested on local news from China. In the sentiment analysis task, the models are trained on movie reviews and tested on customer reviews. The comparison aims to explore the limits of the representational power of today's Natural Language Processing systems on the path to systems that generalize to real-life scenarios. The models are fine-tuned and their outputs are fed into a Feed-Forward Neural Network and a Bidirectional Long Short-Term Memory network. Multinomial Naive Bayes and a Linear Support Vector Machine serve as traditional baselines. The results show that, in binary text classification, DistilBERT generalizes significantly better than ELMo to the cross-context setting. ELMo is in turn significantly more robust to the cross-context test data than both baselines. On the other hand, the baselines perform comparably to ELMo when the training and test data are subsets of the same corpus (no cross-context). DistilBERT is also found to be 30% smaller and 83% faster than ELMo. The results suggest that DistilBERT can transfer generic semantic knowledge to other domains better than ELMo. DistilBERT is also better suited for incorporation into real-life systems because it requires a smaller computational training budget. When generalization is not the top priority and the test domain is similar to the training domain, traditional ML algorithms can still be considered more economical alternatives to deep language representations.
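The cross-context protocol described above (train on one domain, then score both a held-out set from the same domain and a test set from a different one) can be illustrated with a minimal sketch. This is not the study's code: the Multinomial Naive Bayes below is a small standard-library implementation with add-one smoothing, the whitespace tokenizer is a simplification, and every review sentence is invented for illustration. The resulting numbers say nothing about the study's findings; the sketch only shows how the two evaluation settings differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all tokens seen during training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        doc_counts = Counter(labels)
        self.log_prior = {
            c: math.log(doc_counts[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict_one(self, text):
        vocab_size = len(self.vocab)
        best_class, best_score = None, -float("inf")
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # out-of-vocabulary tokens carry no evidence
                score += math.log(
                    (self.word_counts[c][tok] + 1)
                    / (self.total_words[c] + vocab_size)
                )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(model.predict_one(text) == label for text, label in examples)
    return correct / len(examples)


# Invented toy data: training and in-domain test are "movie reviews",
# the cross-context test set is "product reviews".
train = [
    ("a moving film with a brilliant cast", "pos"),
    ("brilliant plot and great acting", "pos"),
    ("a dull film with a boring plot", "neg"),
    ("boring script and terrible acting", "neg"),
]
in_domain_test = [
    ("great cast and a brilliant script", "pos"),
    ("a dull and terrible film", "neg"),
]
cross_context_test = [
    ("fast shipping and sturdy build", "pos"),
    ("battery died and terrible support", "neg"),
]

model = MultinomialNB().fit([t for t, _ in train], [y for _, y in train])
in_domain_acc = accuracy(model, in_domain_test)
cross_acc = accuracy(model, cross_context_test)
print(f"in-domain accuracy:     {in_domain_acc:.2f}")
print(f"cross-context accuracy: {cross_acc:.2f}")
```

In this toy setup the product reviews rely on words the movie-review vocabulary never saw, so a bag-of-words baseline has little evidence to work with. That is the kind of gap a cross-context test set is designed to expose.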