Unmanned aerial vehicles (UAVs) are increasingly deployed in logistics, service robotics, and other real-world applications, creating a growing demand for autonomous payload acquisition and delivery. Existing approaches typically assume pre-attached payloads or rely on specialized grippers, leaving versatile end-to-end aerial delivery largely unresolved. The core difficulty is that different payloads induce highly variable flight dynamics, so a single policy must adapt online without manual calibration or explicit system identification. To this end, we study \textbf{A}utonomous \textbf{A}erial Manipulation via \textbf{Co}ntextual \textbf{Co}ntrastive Meta Reinforcement Learning (\textbf{\textit{Aco2}}), a fully autonomous aerial delivery setting in which a quadrotor equipped with a lightweight hook continuously picks up, transports, and delivers diverse handle-equipped objects between randomized locations, all without human intervention. First, we design a contextual observation encoder that infers a compact latent context from recent interaction history, enabling the policy to adapt online to payload-dependent dynamics. Second, to improve the quality of this context, we introduce a contrastive objective that structures the context embedding around task-relevant variations, improving generalization across diverse payloads without requiring explicit system identification. Trained entirely in simulation with extensive domain randomization, \textit{Aco2} can be deployed directly on a physical quadrotor without real-world fine-tuning.
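The abstract does not state which contrastive loss shapes the context embedding. As a rough illustration only, the sketch below uses InfoNCE, a common choice, under the assumption that two context embeddings inferred from the same payload form a positive pair and that embeddings from other payloads in the batch serve as negatives. The function names, the temperature value, and the pairing scheme are hypothetical and are not taken from \textit{Aco2}.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a nonzero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of context embeddings.

    Assumption (not from the paper): anchors[i] and positives[i] are
    embeddings inferred from the same payload, and positives[j] for
    j != i act as negatives for anchors[i].
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [dot(a_i, p_j) / temperature for p_j in p]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the target class.
        total += log_denominator - logits[i]
    return total / len(a)

# Two hypothetical payloads with 2-D context embeddings.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, the loss is close to zero when each anchor matches its own positive and large when the pairs are swapped, which is the pressure that would pull embeddings of the same payload together and push different payloads apart.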