Modern approaches to computer vision rely heavily on machine learning, which requires large numbers of high-quality images. While there are plenty of image datasets captured with a single type of camera, datasets collected from multiple cameras are lacking. In this thesis, we introduce Paired Image and Video data from three CAMeraS (PIV3CAMS), a dataset aimed at multiple computer vision tasks. PIV3CAMS consists of 8385 pairs of images and 82 pairs of videos taken with three different cameras: Canon D5 Mark IV, Huawei P20, and a ZED stereo camera. The dataset includes various indoor and outdoor scenes from different locations in Zurich (Switzerland) and Cheonan (South Korea). Computer vision applications that can benefit from PIV3CAMS include image and video enhancement, view interpolation, and image matching, among others. We describe the data collection process in detail and provide an analysis of the collected data. The second part of this thesis studies the use of depth information in view synthesis. In addition to reproducing a current state-of-the-art algorithm, we propose and investigate several alternative models that integrate depth information geometrically. Through extensive experiments, we show that depth plays a crucial role for small view changes. Finally, we apply our model to PIV3CAMS to synthesize novel target views, as an example application of the dataset.
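The geometric use of depth mentioned above is not detailed in this summary. As an illustration only, not the thesis's own model, the following sketch shows the standard pinhole reprojection that depth-aware view synthesis commonly builds on: a source pixel is back-projected using its depth, moved by the relative pose (R, t), and projected into the target view. The helper names `Intrinsics` and `reproject`, the assumption that both views share the same intrinsics, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical helper type)."""
    fx: float
    fy: float
    cx: float
    cy: float


def reproject(
    u: float,
    v: float,
    depth: float,
    K: Intrinsics,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
) -> Tuple[float, float]:
    """Warp source pixel (u, v) with known depth into the target view.

    R (3x3, row-major) and t (3-vector) map points from the source camera
    frame to the target camera frame: X_tgt = R @ X_src + t.
    Both views are assumed to share the same intrinsics K.
    """
    # 1. Back-project: pixel + depth -> 3D point in the source camera frame.
    x = (u - K.cx) / K.fx * depth
    y = (v - K.cy) / K.fy * depth
    p_src = (x, y, depth)

    # 2. Rigid transform into the target camera frame.
    p_tgt = [sum(R[i][j] * p_src[j] for j in range(3)) + t[i] for i in range(3)]
    if p_tgt[2] <= 0:
        raise ValueError("point projects behind the target camera")

    # 3. Perspective projection onto the target image plane.
    u_t = K.fx * p_tgt[0] / p_tgt[2] + K.cx
    v_t = K.fy * p_tgt[1] / p_tgt[2] + K.cy
    return u_t, v_t


if __name__ == "__main__":
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shift = [-0.1, 0.0, 0.0]  # target camera 0.1 units to the right of the source
    # The same pixel moves farther when the scene point is close to the camera.
    print(reproject(320, 240, 2.0, K, identity, shift))   # (295.0, 240.0)
    print(reproject(320, 240, 10.0, K, identity, shift))  # (315.0, 240.0)
```

With a pure sideways shift, the horizontal displacement is fx·|t_x|/depth pixels, so a point at depth 2 moves five times farther (25 px) than one at depth 10 (5 px). This depth-dependent parallax is the kind of geometric cue a depth-aware view synthesis model can exploit.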