In this paper, we present a new approach to bridging the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception. Our formulation is designed for dynamic scenes containing moving objects or human actions, where the goal is to recognize poses or actions. We propose an extension of the K-Planes Neural Radiance Field (NeRF) in which our algorithm stores a set of tiered feature vectors. These tiered feature vectors are generated to model conceptual information about a scene, and they are paired with an image decoder that transforms output feature maps into RGB images. Our technique leverages information from both static and dynamic objects within a scene and is able to capture salient scene attributes of high-altitude videos. We evaluate its performance on challenging datasets, including Okutama Action and UG2, and observe considerable improvement in accuracy over state-of-the-art aerial perception algorithms.
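For intuition, the plane-based feature lookup that K-Planes-style models build on can be sketched as follows. This is a minimal illustration under stated assumptions, not the tiered formulation proposed here: a 4D point (x, y, z, t) is projected onto six axis-aligned planes, features are bilinearly interpolated on each plane, multiplied elementwise across planes, and concatenated across resolutions. All function names, resolutions, and channel counts are hypothetical, and the image decoder that maps feature maps to RGB is omitted.

```python
import itertools
import random

# The six axis pairs of a 4D point (x, y, z, t): xy, xz, xt, yz, yt, zt.
AXIS_PAIRS = list(itertools.combinations(range(4), 2))


def make_plane(res, channels, rng):
    """Create a res x res grid of random feature vectors (illustrative init)."""
    return [[[rng.uniform(0.5, 1.5) for _ in range(channels)]
             for _ in range(res)] for _ in range(res)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at (u, v) in [0, 1]^2."""
    res = len(plane)  # assumes res >= 2
    x = min(max(u, 0.0), 1.0) * (res - 1)
    y = min(max(v, 0.0), 1.0) * (res - 1)
    x0, y0 = min(int(x), res - 2), min(int(y), res - 2)
    fx, fy = x - x0, y - y0
    channels = len(plane[0][0])
    return [
        (1 - fx) * (1 - fy) * plane[x0][y0][c]
        + fx * (1 - fy) * plane[x0 + 1][y0][c]
        + (1 - fx) * fy * plane[x0][y0 + 1][c]
        + fx * fy * plane[x0 + 1][y0 + 1][c]
        for c in range(channels)
    ]


def build_planes(resolutions=(4, 8), channels=3, seed=0):
    """One set of six planes per resolution (hypothetical sizes)."""
    rng = random.Random(seed)
    return [[make_plane(r, channels, rng) for _ in AXIS_PAIRS]
            for r in resolutions]


def query_features(planes, point):
    """Return the multi-scale feature for point = (x, y, z, t) in [0, 1]^4."""
    feature = []
    for scale in planes:
        combined = None
        for plane, (a, b) in zip(scale, AXIS_PAIRS):
            f = bilinear(plane, point[a], point[b])
            # Elementwise (Hadamard) product across the six planes.
            combined = f if combined is None else [p * q for p, q in zip(combined, f)]
        feature.extend(combined)  # concatenate across resolutions
    return feature


planes = build_planes()
feat = query_features(planes, (0.1, 0.2, 0.3, 0.4))
```

With the default settings above, `feat` has one entry per channel per resolution. In a full model, such features would be passed to a learned decoder; that step is outside the scope of this sketch.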