Semantic segmentation is an important and well-known task in computer vision in which a semantic class is assigned to each input element. In semantic segmentation of 2D images, the input elements are pixels. The input can also be a point cloud, in which case each input element is one point of the cloud. By a point cloud, we mean a set of points defined by spatial coordinates with respect to some reference coordinate system. In addition to its position in space, each point can carry other features, such as RGB components. In this paper, we perform semantic segmentation on the S3DIS dataset, in which each point cloud represents one room. We train five models on S3DIS: PointCNN, PointNet++, Cylinder3D, Point Transformer, and RepSurf. We compare the results using standard evaluation metrics for semantic segmentation and also compare the models in terms of inference speed.
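To make the setup concrete, the sketch below represents a point cloud as a list of points `(x, y, z, r, g, b)` with one class label per point. It then computes three metrics commonly reported for point cloud segmentation: overall accuracy (OA), mean class accuracy (mAcc), and mean intersection over union (mIoU). The abstract does not say which metrics are used, so treat this choice as an assumption. The function names `confusion_matrix` and `segmentation_metrics` are hypothetical helpers written for illustration. They are not part of any of the evaluated models' code. It is a minimal stdlib-only sketch, not the paper's evaluation pipeline.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical point representation: spatial coordinates plus RGB features.
Point = Tuple[float, float, float, int, int, int]


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index ground-truth classes, columns index predicted classes."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must contain one label per point")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def segmentation_metrics(gt: Sequence[int], pred: Sequence[int],
                         num_classes: int) -> Dict[str, float]:
    """Overall accuracy, mean class accuracy and mean IoU from per-point labels."""
    if not gt:
        raise ValueError("empty point cloud")
    cm = confusion_matrix(gt, pred, num_classes)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(num_classes))
    ious: List[float] = []
    accs: List[float] = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                   # class c points predicted as other classes
        fp = sum(cm[r][c] for r in range(num_classes)) - tp    # other points predicted as class c
        union = tp + fp + fn
        if union > 0:            # skip classes neither present nor predicted
            ious.append(tp / union)
        if tp + fn > 0:          # accuracy is defined only for classes present in gt
            accs.append(tp / (tp + fn))
    return {
        "OA": correct / total,
        "mAcc": sum(accs) / len(accs),
        "mIoU": sum(ious) / len(ious),
    }


if __name__ == "__main__":
    # A toy "room" with four points; the coordinates and colours are illustrative only.
    room: List[Point] = [
        (0.0, 0.0, 0.0, 200, 200, 200),
        (0.0, 1.0, 0.0, 190, 190, 190),
        (1.0, 0.0, 2.5, 120, 80, 40),
        (2.0, 2.0, 1.0, 30, 30, 160),
    ]
    gt_labels = [0, 0, 1, 2]      # ground-truth class per point
    pred_labels = [0, 1, 1, 2]    # model prediction per point
    assert len(room) == len(gt_labels)
    print(segmentation_metrics(gt_labels, pred_labels, num_classes=3))
```

Skipping classes that are absent from both the labels and the predictions is one of several conventions. On S3DIS, metrics are typically accumulated over all rooms of a test area before taking the class mean, and that choice affects the reported numbers.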