A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions together with the corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using the available microscopic data to anchor observed vehicle behaviors. The framework learns a shared policy that is microscopically consistent with the partially observed trajectories and actions, and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
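The two-level objective described above can be illustrated with a minimal sketch: an imitation (microscopic) loss on partially observed state-action pairs, plus a penalty that aligns a population-level statistic of the policy's output with a macroscopic target. Everything here is an illustrative assumption, not the paper's actual method: the policy is a toy linear controller, the macroscopic statistic is the mean commanded action, and `lambda_macro` is a hypothetical weighting coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Partially observed microscopic data: (state, action) pairs.
# States are illustrative features, e.g. gap, speed, relative speed, lane offset.
states = rng.normal(size=(100, 4))
actions = states @ np.array([0.5, -0.2, 0.1, 0.0]) + 0.01 * rng.normal(size=100)

# Hypothetical macroscopic target observed from roadside sensors,
# e.g. the mean action (acceleration) across the traffic stream.
target_mean_action = 0.0
lambda_macro = 0.5  # assumed trade-off weight between micro and macro terms

def combined_loss(w, states, actions):
    """Microscopic imitation loss plus macroscopic alignment penalty."""
    pred = states @ w
    micro = np.mean((pred - actions) ** 2)              # match observed behavior
    macro = (np.mean(pred) - target_mean_action) ** 2   # match traffic statistic
    return micro + lambda_macro * macro

# Plain gradient descent on the linear policy weights.
w = np.zeros(4)
lr = 0.05
for _ in range(500):
    pred = states @ w
    grad_micro = 2 * states.T @ (pred - actions) / len(actions)
    grad_macro = 2 * (np.mean(pred) - target_mean_action) * states.mean(axis=0)
    w -= lr * (grad_micro + lambda_macro * grad_macro)
```

The point of the sketch is only the shape of the objective: the macroscopic term acts as a regularizer on the population behavior of the shared policy, while the microscopic term anchors it to the observed trajectories.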