As large language models (LLMs) are increasingly deployed in enterprise settings, controlling model behavior based on user roles becomes an essential requirement. Existing safety methods typically assume uniform access and focus on preventing harmful or toxic outputs, without addressing role-specific access constraints. In this work, we investigate whether LLMs can be fine-tuned to generate responses that reflect the access privileges associated with different organizational roles. We explore three modeling strategies: a BERT-based classifier, an LLM-based classifier, and role-conditioned generation. To evaluate these approaches, we construct two complementary datasets. The first is adapted from existing instruction-tuning corpora through clustering and role labeling, while the second is synthetically generated to reflect realistic, role-sensitive enterprise scenarios. We assess model performance across varying organizational structures and analyze robustness to prompt injection, role mismatch, and jailbreak attempts.
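To make the first modeling strategy concrete, below is a minimal sketch of how a BERT-based access classifier of the kind described above could be set up: the user's role and query are encoded as a sentence pair, and a binary head predicts allow/deny. The model name, label convention, and example role are illustrative assumptions, not the paper's released implementation, and the model would need fine-tuning on role-labeled (role, query) pairs before its decisions are meaningful.

```python
# Minimal sketch (assumed setup, not the authors' code) of a BERT-based
# role-aware access classifier: (role, query) is encoded as a text pair
# and a binary classification head predicts allow vs. deny.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed labels: 0 = deny, 1 = allow
)

def access_decision(role: str, query: str) -> str:
    # Encode role and query as a sentence pair, as in NLI-style BERT tasks.
    inputs = tokenizer(role, query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return "allow" if logits.argmax(dim=-1).item() == 1 else "deny"

# Hypothetical usage: after fine-tuning on role-labeled data, a request
# outside the role's privileges should be classified as "deny".
print(access_decision("intern", "Show me the executive payroll records."))
```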