Object tracking is a key challenge in computer vision, and its diverse applications call for different architectures. Most tracking systems are limited in some way, for example constraining all movement to a 2D plane or tracking only a single object. In this paper, we present a new modular pipeline that computes 3D trajectories of multiple objects. It adapts to a variety of settings in which multiple time-synchronized, stationary cameras record moving objects, using off-the-shelf webcams. We tested our pipeline on the Table Setting Dataset, in which participants are recorded with various sensors while setting a table with tableware objects. These manipulated objects have to be tracked using 6 RGB webcams. The challenges include detecting small objects in 9,874,699 camera frames, determining camera poses, discriminating between nearby and overlapping objects, handling temporary occlusions, and finally computing a 3D trajectory from the right subset of an average of 11,124,456 pixel coordinates per 3-minute trial. We implement a robust pipeline that produces accurate trajectories, with the covariance of the x, y, z position serving as a confidence metric. It handles appearing and disappearing objects dynamically by instantiating new Extended Kalman Filters. It scales to hundreds of table-setting trials with very little human annotation input, even when the camera poses of each trial are unknown. The code is available at https://github.com/LarsBredereke/object_tracking
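The per-object filtering described above (one Extended Kalman Filter instantiated when an object appears, dropped when it disappears, with the position covariance as a confidence metric) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the linear constant-velocity special case of the filter for brevity, and the names `Track` and `update_tracks` as well as the parameters `dt`, `q`, `r`, and `max_missed` are illustrative assumptions only.

```python
# Minimal sketch: one constant-velocity 3D filter per tracked object,
# spawned on first detection and pruned after a period without measurements.
import numpy as np

class Track:
    """3D constant-velocity filter; state = [x, y, z, vx, vy, vz]."""
    def __init__(self, xyz, dt=1 / 30, q=1e-2, r=1e-2):
        self.x = np.hstack([xyz, np.zeros(3)])                # state mean
        self.P = np.eye(6)                                    # state covariance
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)   # motion model
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])     # observe position only
        self.Q = q * np.eye(6)                                # process noise
        self.R = r * np.eye(3)                                # measurement noise
        self.missed = 0                                       # frames without a measurement

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        y = z - self.H @ self.x                               # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        self.missed = 0

    @property
    def position_covariance(self):
        return self.P[:3, :3]                                 # x, y, z covariance as confidence

def update_tracks(tracks, detections, max_missed=15):
    """One filter per object name; spawn on first detection, drop stale tracks."""
    for t in tracks.values():
        t.predict()
        t.missed += 1
    for name, xyz in detections.items():                      # triangulated 3D detections
        if name not in tracks:                                # object appears: new filter
            tracks[name] = Track(np.asarray(xyz, float))
        tracks[name].update(np.asarray(xyz, float))
    return {n: t for n, t in tracks.items() if t.missed <= max_missed}

if __name__ == "__main__":
    tracks = {}
    tracks = update_tracks(tracks, {"cup": (0.42, -0.10, 0.83)})
    print(tracks["cup"].position_covariance)
```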