The increasing ubiquity of language technology necessitates a shift towards considering cultural diversity in the machine learning realm, particularly for subjective tasks that rely heavily on cultural nuances, such as Offensive Language Detection (OLD). Current understanding holds that these tasks are substantially influenced by cultural values; however, a notable gap remains in determining whether cultural features can accurately predict the success of cross-cultural transfer learning for such subjective tasks. To address this gap, our study examines the relationship between cultural features and transfer learning effectiveness. The findings reveal that cultural value surveys do possess predictive power for cross-cultural transfer learning success in OLD tasks, and that this predictive power can be further improved by using offensive word distance. Based on these results, we advocate for the integration of cultural information into datasets. Additionally, we recommend leveraging data sources rich in cultural information, such as surveys, to enhance cultural adaptability. Our research represents a step forward in the quest for more inclusive, culturally sensitive language technologies.