Despite large language models (LLMs) being known to exhibit bias against non-mainstream varieties, there are no known labeled datasets for sentiment analysis of varieties of English. To address this gap, we introduce BESSTIE, a benchmark for sentiment and sarcasm classification for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). Using web-based content from two domains, namely Google Place reviews and Reddit comments, we collect datasets for these language varieties using two methods: location-based and topic-based filtering. Native speakers of the language varieties manually annotate the datasets with sentiment and sarcasm labels. Subsequently, we fine-tune nine LLMs (representing a range of encoder/decoder and mono/multilingual models) on these datasets, and evaluate their performance on the two tasks. Our results reveal that the models consistently perform better on inner-circle varieties (i.e., en-AU and en-UK), with significant performance drops for en-IN, particularly in sarcasm detection. We also report challenges in cross-variety generalisation, highlighting the need for language variety-specific datasets such as ours. BESSTIE promises to be a useful evaluative benchmark for future research in equitable LLMs, specifically in terms of language varieties. The BESSTIE datasets, code, and models are currently available on request, while the paper is under review. Please email aditya.joshi@unsw.edu.au.