We present an experimental evaluation that assesses the robustness of four open-source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show that these models are not safe by default, and that the defences are not yet deployable in real-world scenarios.