Intelligent driving systems can mitigate congestion through simple actions, improving socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, which limits them in practice because they fail to account for uncertainty in human behavior. Piecewise Constant (PC) policies address this limitation by structurally modeling the likeness of human driving, producing action advice that human drivers can follow to reduce congestion in dense traffic. However, PC policies assume that all drivers behave similarly. To this end, we develop a cooperative advisory system based on PC policies with a novel driver-trait-conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer a driver's intrinsic traits, namely how they follow instructions, in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with a novel driver model of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with a 4 to 22% improvement in average speed over baselines.
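The two-stage structure described in the abstract (a PC policy whose action is adapted by a trait-conditioned residual) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "piecewise constant" means an advised action is held fixed for a window of steps, and that the residual is an additive, bounded correction. The names `PiecewiseConstantPolicy`, `personalized_advice`, `hold_steps`, and `max_residual` are hypothetical, and the trait is taken as given (in the paper it would be inferred by the variational autoencoder, which is not modeled here).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PiecewiseConstantPolicy:
    """Hypothetical PC policy: re-decides only every `hold_steps` steps."""
    base_policy: Callable[[Sequence[float]], float]  # maps observation -> action
    hold_steps: int                                  # assumed hold window length
    _steps_left: int = 0
    _held_action: float = 0.0

    def act(self, obs: Sequence[float]) -> float:
        # Query the base policy only when the current hold window has expired.
        if self._steps_left == 0:
            self._held_action = self.base_policy(obs)
            self._steps_left = self.hold_steps
        self._steps_left -= 1
        return self._held_action


def personalized_advice(
    pc_action: float,
    obs: Sequence[float],
    trait: float,
    residual_policy: Callable[[Sequence[float], float], float],
    max_residual: float = 1.0,
) -> float:
    """Add a bounded, trait-conditioned correction to the PC action (assumed form)."""
    residual = residual_policy(obs, trait)
    residual = max(-max_residual, min(max_residual, residual))  # clip the correction
    return pc_action + residual


# Toy usage: the base policy echoes the first observation entry, and the
# residual subtracts the driver's (assumed scalar) tendency to overshoot.
pc = PiecewiseConstantPolicy(base_policy=lambda obs: obs[0], hold_steps=3)
held = [pc.act([float(t)]) for t in range(6)]
advice = personalized_advice(5.0, [0.0], 0.5, lambda obs, z: -z)
```

In this toy setup the held action changes only every three steps, and the residual shifts the advice per driver without replacing the PC action. Clipping the residual is one design choice for keeping the personalized advice close to the PC policy; the abstract does not say whether the authors bound it this way.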