Transfer and Active Learning for Dissonance Detection: Addressing the Rare-Class Challenge

While transformer-based systems have enabled greater accuracies with fewer training examples, data acquisition obstacles persist for rare-class tasks, in which the class label is very infrequent (e.g., < 5% of samples). Active learning has generally been proposed to alleviate such challenges, but the choice of selection strategy, the criterion by which rare-class examples are chosen, has not been systematically evaluated. Further, transformers enable iterative transfer-learning approaches. We propose and investigate transfer- and active-learning solutions to the rare-class problem of dissonance detection by utilizing models trained on closely related tasks and by evaluating acquisition strategies, including a proposed probability-of-rare-class (PRC) approach. We perform these experiments for a specific rare-class problem: collecting language samples of cognitive dissonance from social media. We find that PRC is a simple and effective strategy for guiding annotations and ultimately improving model accuracy, while transfer learning in a specific order can improve the cold-start performance of the learner but does not benefit iterations of active learning.
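
As a rough illustration of what a probability-of-rare-class acquisition step might look like, the sketch below ranks an unlabeled pool by the current model's predicted probability of the rare class and returns the top `k` items for annotation. This is an assumption about how PRC could be realized, not the authors' implementation: the function name `select_by_prc`, the parameter `k`, and the example probabilities are all hypothetical.

```python
from typing import List, Sequence


def select_by_prc(rare_class_probs: Sequence[float], k: int) -> List[int]:
    """Return indices of the k pool items with the highest rare-class probability.

    Assumption: rare_class_probs[i] is the current model's predicted
    probability that pool item i belongs to the rare class.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank pool indices by predicted rare-class probability, highest first.
    # sorted() is stable, so tied items keep their original pool order.
    ranked = sorted(
        range(len(rare_class_probs)),
        key=lambda i: rare_class_probs[i],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for a pool of six unlabeled posts.
pool_probs = [0.02, 0.41, 0.07, 0.88, 0.15, 0.63]
picked = select_by_prc(pool_probs, k=3)
print(picked)  # [3, 5, 1]
```

In an active-learning loop, the selected items would be sent to annotators, added to the training set, and the model retrained before the next round of selection. Unlike uncertainty-based strategies, this ranking favors items the model already believes are rare-class, which is one plausible way to surface more positive examples per annotation round.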

Related content

Active learning

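Active learning is a subfield of machine learning (and, more generally, of artificial intelligence); in statistics it is also known as query learning or optimal experimental design. The "learning module" and the "selection strategy" are the two basic and essential components of an active learning algorithm. Active learning is "a method of learning in which students are actively or experientially involved in the learning process, and there are different levels of active learning depending on the degree of student involvement" (Bonwell & Eison 1991). Bonwell & Eison (1991) state that "students do something besides passively listening." In a report for the Association for the Study of Higher Education (ASHE), the authors discuss a variety of methods for promoting active learning. They cite literature indicating that, in order to learn, students must do more than just listen: they must read, write, discuss, and engage in solving problems. This process involves three learning domains, namely knowledge, skills, and attitudes (KSA). This taxonomy of learning behaviors can be thought of as "the goals of the learning process." In particular, students must engage in higher-order thinking tasks such as analysis, synthesis, and evaluation.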