We present a new learning-based framework, S-3D-RCNN, that recovers accurate object orientation in SO(3) and simultaneously predicts implicit shapes for outdoor rigid objects from stereo RGB images. In contrast to previous studies that map local appearance to observation angles, we explore a progressive approach that extracts meaningful Intermediate Geometrical Representations (IGRs) to estimate egocentric object orientation. This approach features a deep model that transforms perceived intensities into object part coordinates, which are then mapped to a 3D representation encoding object orientation in the camera coordinate system. To enable implicit shape estimation, the IGRs are further extended to model the visible object surface with a point-based representation and to explicitly address the unseen surface hallucination problem. Extensive experiments validate the effectiveness of the proposed IGRs, and S-3D-RCNN achieves superior 3D scene understanding performance under both existing and newly proposed metrics on the KITTI benchmark. Code and pre-trained models will be available at this https URL.
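As a rough illustration of how 3D part coordinates can encode an orientation in SO(3), the sketch below aligns a canonical set of part coordinates with their observed positions in the camera frame using the standard Kabsch (orthogonal Procrustes) solution. This is not the S-3D-RCNN implementation: the abstract does not specify how the 3D representation is decoded, and the function names, the choice of box corners as parts, and the use of a closed-form alignment are all assumptions made for illustration.

```python
import numpy as np


def rotation_y(angle):
    """Rotation about the Y axis (hypothetical helper for the example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rotation_from_part_coordinates(canonical, observed):
    """Least-squares rotation R with observed ~= canonical @ R.T + t (Kabsch).

    canonical, observed: (N, 3) arrays of corresponding 3D part coordinates
    in the object frame and the camera frame (assumed inputs).
    """
    canonical = np.asarray(canonical, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Center both point sets so the unknown translation drops out.
    p = canonical - canonical.mean(axis=0)
    q = observed - observed.mean(axis=0)
    # 3x3 cross-covariance between canonical and observed parts.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


# Usage: eight corners of a car-sized box stand in for the object parts.
corners = np.array([[x, y, z]
                    for x in (-2.0, 2.0)
                    for y in (-0.75, 0.75)
                    for z in (-0.9, 0.9)])
r_true = rotation_y(0.3)
observed = corners @ r_true.T + np.array([1.0, 1.5, 20.0])
r_est = rotation_from_part_coordinates(corners, observed)
```

With exact correspondences the recovered matrix matches the rotation used to generate the observations; with noisy predicted part coordinates the same call returns the least-squares estimate.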