Large Language Models (LLMs) are transforming enterprise workflows, but they introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system that prevents sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback. Experiments demonstrate that SafeGPT effectively reduces data leakage risk and biased outputs while maintaining user satisfaction.