Empowering language is important in many real-world contexts, from education to workplace dynamics to healthcare. Though language technologies are growing more prevalent in these contexts, empowerment has not been studied in NLP, and moreover, it is inherently challenging to operationalize because of its subtle, implicit nature. This work presents the first computational exploration of empowering language. We first define empowerment detection as a new task, grounding it in linguistic and social psychology literature. We then crowdsource a novel dataset of Reddit posts labeled for empowerment, reasons why these posts are empowering to readers, and the social relationships between posters and readers. Our preliminary analyses show that this dataset, which we call TalkUp, can be used to train language models that capture empowering and disempowering language. More broadly, as it is rich with the ambiguities and diverse interpretations of real-world language, TalkUp provides an avenue to explore implication, presuppositions, and how social context influences the meaning of language.