Researchers and developers increasingly rely on toxicity scoring to moderate generative language model outputs in settings such as customer service, information retrieval, and content generation. However, toxicity scoring may render pertinent information inaccessible, rigidify or "value-lock" cultural norms, and prevent language reclamation processes, particularly for marginalized people. In this work, we extend the concept of algorithmic recourse to generative language models: we provide users with a novel mechanism to achieve their desired prediction by dynamically setting thresholds for toxicity filtering. Users thereby exercise increased agency relative to interactions with the baseline system. A pilot study ($n = 30$) supports the potential of our proposed recourse mechanism, indicating improvements in usability compared to fixed-threshold toxicity filtering of model outputs. Future work should explore the intersection of toxicity scoring, model controllability, user agency, and language reclamation processes, particularly with regard to the bias that many communities encounter when interacting with generative language models.
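The recourse mechanism described in the abstract, a toxicity threshold that the user can adjust instead of one fixed by the system, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `DEFAULT_THRESHOLD` value, the `FilterDecision` type, the `filter_output` function, and the convention that toxicity scores lie in [0, 1] with higher meaning more toxic. The score is passed in directly rather than computed, so no particular toxicity classifier is implied.

```python
from dataclasses import dataclass

# Hypothetical system default; the paper's actual threshold is not stated here.
DEFAULT_THRESHOLD = 0.5


@dataclass
class FilterDecision:
    shown: bool       # True if the output is released to the user
    score: float      # toxicity score supplied by some external scorer
    threshold: float  # threshold that was applied to reach this decision


def filter_output(score: float, threshold: float = DEFAULT_THRESHOLD) -> FilterDecision:
    """Release an output only if its toxicity score does not exceed the threshold.

    Assumes scores in [0, 1], where higher means more toxic.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return FilterDecision(shown=score <= threshold, score=score, threshold=threshold)


# Fixed-threshold baseline: an output scored 0.62 is withheld.
baseline = filter_output(0.62)

# Recourse: the user raises their own threshold and the same output is released.
recourse = filter_output(0.62, threshold=0.8)
```

In this sketch the baseline and the recourse path differ only in who sets `threshold`: the system default in the first call, the user in the second. How the threshold is surfaced to users and bounded is a design question the sketch leaves open.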