In this paper, we present the first study investigating the trust and ethical implications of on-device artificial intelligence (AI), focusing on "small" language models (SLMs) suited to personal devices such as smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they may also introduce significant challenges and vulnerabilities relative to their on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of state-of-the-art on-device SLMs, contrasted with their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show that on-device SLMs are statistically significantly less trustworthy, specifically demonstrating more stereotypical, unfair, and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study by probing whether SLMs provide responses to potentially unethical vanilla prompts, collated from prior jailbreaking and prompt-engineering studies and other sources. Strikingly, the on-device SLMs returned valid responses to these prompts, which ideally should have been rejected. Even more seriously, they did so without any filters and without the need for any jailbreaking or prompt engineering. These responses can be abused in various harmful and unethical scenarios, including societal harm, illegal activities, hate speech, self-harm, exploitable phishing content, and exploitable code, all of which indicates the high vulnerability and exploitability of these on-device SLMs. Overall, our findings highlight gaping vulnerabilities in state-of-the-art on-device AI that appear to stem from the resource constraints faced by these models and that may make typical defenses fundamentally challenging to deploy in these environments.