Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles, Challenges, and Opportunities

Rapid advances in Machine Learning (ML) have triggered new trends in Autonomous Vehicles (AVs). ML algorithms play a crucial role in interpreting sensor data, predicting potential hazards, and optimizing navigation strategies. However, achieving full autonomy in cluttered and complex situations, such as intricate intersections, diverse scenery, varied trajectories, and complex missions, remains challenging, and the cost of data labeling is still a significant bottleneck. The adaptability and robustness of humans in complex scenarios motivate including humans in the ML process, leveraging their creativity, ethical judgment, and emotional intelligence to improve ML effectiveness. The scientific community knows this approach as Human-In-The-Loop Machine Learning (HITL-ML). Towards safe and ethical autonomy, we present a review of HITL-ML for AVs, focusing on Curriculum Learning (CL), Human-In-The-Loop Reinforcement Learning (HITL-RL), Active Learning (AL), and ethical principles. In CL, human experts systematically train ML models by starting with simple tasks and gradually progressing to more difficult ones. HITL-RL enhances the RL process by incorporating human input through techniques such as reward shaping, action injection, and interactive learning. AL streamlines annotation by selecting, under human oversight, the specific instances that most need labeling, which reduces the overall time and cost of training. Ethical principles must be embedded in AVs to align their behavior with societal values and norms. In addition, we provide insights and specify future research directions.
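
As an illustration of the AL idea summarized above, the following is a minimal sketch of one common selection strategy, entropy-based uncertainty sampling. It is not taken from the review itself: the function names (`entropy`, `select_for_labeling`), the frame identifiers, the three-class setup, and the probability values are all hypothetical and chosen only for illustration. The sketch ranks unlabeled samples by the entropy of the model's predicted class probabilities and forwards the most uncertain ones to a human annotator.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` most uncertain unlabeled samples.

    `predictions` maps a sample id to the model's predicted class
    probabilities. Higher entropy means the model is less certain.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class probabilities (pedestrian, cyclist, vehicle)
# predicted by a model for four unlabeled camera frames.
unlabeled = {
    "frame_001": [0.98, 0.01, 0.01],
    "frame_002": [0.40, 0.35, 0.25],
    "frame_003": [0.70, 0.20, 0.10],
    "frame_004": [0.34, 0.33, 0.33],
}

# Only the two frames the model is least sure about go to the annotator.
to_annotate = select_for_labeling(unlabeled, budget=2)
print(to_annotate)  # ['frame_004', 'frame_002']
```

In a full loop, the human labels for the selected frames would be added to the training set and the model retrained before the next selection round. Entropy is only one possible uncertainty measure; margin-based or committee-based criteria could be substituted in `select_for_labeling` without changing the rest of the sketch.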
