More than half the global population now carries devices that can run ChatGPT-like language models with no Internet connection and minimal safety oversight, and that therefore have the potential to promote self-harm, financial losses and extremism, among other dangers. Existing safety tools either require cloud connectivity or discover failures only after harm has occurred. Here we show that a large class of potentially dangerous tipping in such edge AI originates at the atomistic scale, from competition for the machinery's attention. This yields a mathematical formula for the dynamical tipping point n*, governed by dot-product competition for attention between the conversation's context and competing output basins, that reveals new control levers. Validated against multiple AI models, the mechanism can be instantiated for different definitions of 'good' and 'bad' and hence applies in principle across domains (e.g. health, law, finance, defense), changing legal landscapes (e.g. EU, UK, US and state level), languages, and cultural settings.