LLMs are increasingly being considered for prediction tasks in high-stakes social service settings, but their algorithmic fairness properties in this context are poorly understood. In this short technical report, we audit the algorithmic fairness of LLM-based tabular classification on a real housing placement prediction task, augmented with street outreach casenotes from a nonprofit partner. Auditing multi-class classification error disparities, we find that a fine-tuned model augmented with casenote summaries can improve accuracy while reducing fairness disparities. We also experiment with variable-importance improvements to zero-shot tabular classification and find mixed results for the resulting fairness. Given historical inequities in housing placement, auditing LLM use in this setting is crucial. Our results suggest that using LLMs to augment tabular classification with casenote summaries can safely leverage additional text information at low implementation burden. The outreach casenotes are fairly short and heavily redacted, and our assessment is that LLM zero-shot classification does not introduce textual biases beyond the algorithmic biases already present in tabular classification. Combining fine-tuning with casenote summaries can improve both accuracy and algorithmic fairness.