A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and the corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data but lack context about the surrounding traffic conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles at the microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using the available microscopic data to anchor observed vehicle behaviors, and learns a shared policy that is microscopically consistent with the partially observed trajectories and actions, and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
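The two-level consistency described above can be read as a regularized objective: a microscopic imitation term over partially observed vehicle actions, plus a macroscopic penalty matching a population-level traffic statistic. The following is a minimal sketch of that idea, not the paper's actual method; all function names, the mean-speed statistic, and the weight `lam` are illustrative assumptions.

```python
import numpy as np

def imitation_loss(policy_actions, observed_actions):
    # Microscopic term: penalize deviation from the partially
    # observed vehicle actions (mean squared error as a stand-in).
    return float(np.mean((policy_actions - observed_actions) ** 2))

def macro_loss(policy_actions, target_mean_speed):
    # Macroscopic term: match a population-level traffic statistic;
    # here, the mean speed implied by the policy's actions stands in
    # for the target traffic statistics.
    return float((np.mean(policy_actions) - target_mean_speed) ** 2)

def combined_objective(policy_actions, observed_actions,
                       target_mean_speed, lam=0.5):
    # Hypothetical regularized objective: microscopic consistency
    # plus a macroscopic alignment penalty weighted by lam.
    return (imitation_loss(policy_actions, observed_actions)
            + lam * macro_loss(policy_actions, target_mean_speed))
```

In this toy form, the objective vanishes only when the policy both reproduces the observed microscopic actions and matches the target macroscopic statistic, mirroring the constrained-and-regularized behavior the framework aims for.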