The Internet of Vehicles (IoV) operates in a dynamic, adversarial security environment in which attackers adapt to defenses. Existing intrusion detection systems rely on static classifiers that fail to capture sequential decision-making, attacker adaptation, and uncertainty. We formulate IoV security as a sequential attacker-defender interaction and model defense as a reinforcement learning problem under partial observability. We propose Quantum Belief-Integrated Reinforcement Defense (Q-BIRD), which uses a quantum-inspired belief representation to encode the defender's uncertainty about hidden attacker intent in amplitude-based states, enabling non-Bayesian belief evolution. Integrated into a Proximal Policy Optimization (PPO) defender, this belief representation drives the selection of cost-aware mitigation actions. In simulated environments with adaptive, probing attackers, Q-BIRD reduced cumulative mean damage, damage variance, and attack success rate (ASR) by 60.4%, 90.2%, and 50.0%, respectively, while increasing survival probability by 46.4%. Relative to a classical Bayesian PPO defender, the reduction in damage variance was 10.2 times larger and ASR improved by 50%. Ablation and explainability analyses confirm that the amplitude-based belief is the primary decision signal during attacker strategy transitions, when the classical belief collapses, so Q-BIRD improves IoV security without additional hardware.
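To make the contrast between classical and amplitude-based belief concrete, the following is a minimal illustrative sketch, not the Q-BIRD update rule, which the abstract does not specify. It assumes a two-state hidden intent (index 0 for probing, index 1 for attacking), real-valued amplitudes, and a rotation as the norm-preserving belief update; the function names and the rotation angle are hypothetical choices made only for this example. It shows one way a Bayesian belief that has collapsed onto a single intent stays there despite contrary evidence, whereas an amplitude state can still move.

```python
import math


def bayes_update(belief, likelihood):
    """Classical Bayes rule: posterior is proportional to prior * likelihood."""
    unnorm = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnorm)
    if total == 0.0:
        # Evidence is impossible under the prior; keep the prior unchanged.
        return list(belief)
    return [u / total for u in unnorm]


def amplitude_update(amps, theta):
    """Rotate a two-state real amplitude vector by theta (preserves the norm).

    Assumption: a plane rotation stands in for the quantum-inspired update.
    """
    a0, a1 = amps
    c, s = math.cos(theta), math.sin(theta)
    return [c * a0 - s * a1, s * a0 + c * a1]


def probs(amps):
    """Probabilities are the squared amplitudes."""
    return [a * a for a in amps]


# Belief fully collapsed onto "probing" (index 0).
collapsed_belief = [1.0, 0.0]
collapsed_amps = [1.0, 0.0]

# New evidence that favors "attacking" (hypothetical likelihood values).
likelihood = [0.1, 0.9]

# The Bayesian posterior cannot leave the collapsed state: a zero prior stays zero.
print(bayes_update(collapsed_belief, likelihood))

# The amplitude state can still shift probability mass toward "attacking".
print(probs(amplitude_update(collapsed_amps, math.pi / 4)))
```

In a full defender, the rotation angle would presumably depend on the observation, and the resulting probabilities or amplitudes would be passed to the PPO policy as part of its input; both of those details are assumptions here rather than statements about the authors' design.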