In recent years, digital humans have been widely applied in augmented/virtual reality (A/VR), where viewers can freely observe and interact with the volumetric content. However, digital humans may be degraded by various distortions during generation and transmission, and little effort has so far been devoted to assessing their perceptual quality. Objective quality assessment methods are therefore urgently needed to address the challenge of digital human quality assessment (DHQA). In this paper, we develop a novel Transformer-based no-reference (NR) method that handles DHQA in a multi-task manner. Specifically, the front 2D projections of the digital humans are rendered as inputs, and a vision transformer (ViT) is employed for feature extraction. We then design a multi-task module that jointly classifies the distortion type and predicts the perceptual quality level of each digital human. The experimental results show that the proposed method correlates well with subjective ratings and outperforms state-of-the-art quality assessment methods.
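The multi-task idea described above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the ViT has already produced a feature vector for a rendered front projection, and it feeds that shared vector to two hypothetical linear heads: one for distortion-type classification and one for quality regression. The function names, the parameter layout, and the loss combination (cross-entropy plus a weighted squared error, with `reg_weight` as the weight) are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(feature_dim, num_types, seed=0):
    """Small random weights for both heads (hypothetical layout)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "cls_w": matrix(num_types, feature_dim),
        "cls_b": [0.0] * num_types,
        "reg_w": matrix(1, feature_dim),
        "reg_b": [0.0],
    }


def multi_task_forward(features, params):
    """Send the shared features through both heads.

    Returns (distortion-type probabilities, predicted quality score).
    """
    logits = linear(features, params["cls_w"], params["cls_b"])
    quality = linear(features, params["reg_w"], params["reg_b"])[0]
    return softmax(logits), quality


def multi_task_loss(probs, quality, true_type, true_score, reg_weight=1.0):
    """Assumed joint objective: cross-entropy + weighted squared error."""
    cls_loss = -math.log(probs[true_type])
    reg_loss = (quality - true_score) ** 2
    return cls_loss + reg_weight * reg_loss


if __name__ == "__main__":
    # Stand-in for a ViT feature vector; real features would be much longer.
    features = [0.5, -1.2, 0.3, 0.8]
    params = init_params(feature_dim=4, num_types=3)
    probs, quality = multi_task_forward(features, params)
    print(probs, quality, multi_task_loss(probs, quality, 1, 3.5))
```

Sharing one feature vector between the two heads is what makes the setup multi-task: in a trained model, gradients from both the classification and the regression term would update the same backbone.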