Prompting Large Language Models (LLMs) has created new and interesting means of classifying textual data. While evaluating and remediating group fairness is a well-studied problem in the classifier fairness literature, some classical approaches (e.g., regularization) do not carry over, and some new opportunities arise (e.g., prompt-based remediation). We measure the fairness of LLM-based classifiers on a toxicity classification task and empirically show that prompt-based classifiers may lead to unfair decisions. We introduce several remediation techniques and benchmark their fairness and performance trade-offs. We hope our work encourages more research on group fairness in LLM-based classifiers.
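To make "measuring fairness" concrete, the sketch below computes a common group fairness metric, the gap in false positive rates across identity groups, for a toxicity classifier's decisions. This is a minimal illustration under assumed conventions; the function name, the toy data, and the choice of metric are ours, not the paper's.

```python
# Minimal sketch: measuring group fairness of a (prompt-based) toxicity
# classifier via per-group false-positive-rate gaps. All names and data
# below are illustrative, not taken from the paper.
from collections import defaultdict

def fpr_gap(labels, predictions, groups):
    """Return the max pairwise false-positive-rate gap across groups.

    labels       -- ground-truth toxicity (0 = non-toxic, 1 = toxic)
    predictions  -- classifier decisions on the same examples
    groups       -- identity group referenced in each example
    """
    fp = defaultdict(int)   # false positives per group
    neg = defaultdict(int)  # ground-truth negatives per group
    for y, y_hat, g in zip(labels, predictions, groups):
        if y == 0:
            neg[g] += 1
            fp[g] += int(y_hat == 1)
    rates = {g: fp[g] / neg[g] for g in neg if neg[g] > 0}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical example: comments referencing two groups, scored by an LLM.
labels      = [0, 0, 1, 0, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 1, 1, 1, 0]  # assumed LLM outputs
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = fpr_gap(labels, predictions, groups)
print(rates)  # per-group FPR: {'a': 0.333..., 'b': 0.666...}
print(gap)    # 0.333...; a nonzero gap signals disparate false positives
```

A nonzero gap here means non-toxic comments mentioning one group are flagged as toxic more often than those mentioning another, which is the kind of disparity the remediation techniques aim to reduce.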