As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when those risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI) and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate a systematic approach to constructing guardrails for LLMs, based on comprehensive consideration of the diverse contexts across various LLM applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of those requirements, and developing verification and testing to ensure the utmost quality of the final product.
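To make concrete what "filtering the inputs or outputs of LLMs" means in practice, the following is a minimal, purely illustrative sketch. The pattern list, the `violates_policy` check, and the `guarded_generate` wrapper are hypothetical stand-ins for a real guardrail system; they do not reflect the interfaces of Llama Guard, NeMo, or Guardrails AI.

```python
import re
from typing import Callable

# Hypothetical blocklist standing in for a real safety classifier or policy
# model; any production guardrail would use far richer detection than regexes.
BLOCKED_PATTERNS = [
    re.compile(r"how to (build|make) a (bomb|weapon)", re.IGNORECASE),
]

def violates_policy(text: str) -> bool:
    """Return True if the text matches any blocked pattern."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)

def guarded_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Filter the input before, and the output after, the LLM call."""
    if violates_policy(prompt):                 # input-side guardrail
        return "Request refused by input guardrail."
    response = generate(prompt)                 # the underlying LLM
    if violates_policy(response):               # output-side guardrail
        return "Response withheld by output guardrail."
    return response

if __name__ == "__main__":
    echo_llm = lambda p: f"Echoing: {p}"        # stand-in for a real LLM
    print(guarded_generate("What is a guardrail?", echo_llm))
    print(guarded_generate("how to build a bomb", echo_llm))
```

Note that the input-side and output-side checks are independent of the model itself, which is why guardrails are typically described as wrapping an LLM rather than modifying it.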