Image captioning (IC) systems aim to generate a text description of the salient objects in an image. In recent years, IC systems have been increasingly integrated into daily life, for example as assistive tools for visually impaired people and for description generation in Microsoft PowerPoint. However, even cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and algorithms (e.g., OFA) can produce erroneous captions, leading to incorrect captioning of important objects, misunderstanding, and threats to personal safety. Existing testing approaches either fail to handle the complex form of IC system output (i.e., sentences in natural language) or generate unnatural images as test cases. To address these problems, we introduce Recursive Object MElting (Rome), a novel metamorphic testing approach for validating IC systems. Unlike existing approaches that generate test cases by inserting objects, which easily makes the generated images unnatural, Rome melts (i.e., removes and inpaints) objects. Rome assumes that the object set in the caption of an image includes the object set in the caption of the image generated from it by object melting. Given an image, Rome can recursively remove its objects to generate different pairs of images. We use Rome to test one widely adopted image captioning API and four state-of-the-art (SOTA) algorithms. The results show that the test cases generated by Rome look much more natural than those of the SOTA IC testing approach and achieve naturalness comparable to that of the original images. Meanwhile, by generating test pairs from 226 seed images, Rome reports a total of 9,121 erroneous issues with high precision (86.47%-92.17%). In addition, we use the test cases generated by Rome to retrain Oscar, which improves its performance across multiple evaluation metrics.
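The metamorphic relation stated in the abstract (the object set of the original caption should include the object set of the caption produced after melting) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `KNOWN_OBJECTS`, `caption_objects`, and `violates_relation` are hypothetical, and the word-lookup extractor is a toy stand-in for whatever caption parsing the actual tool uses.

```python
from typing import Set

# Hypothetical toy vocabulary (an assumption for illustration only);
# a real pipeline would extract object nouns with an NLP parser.
KNOWN_OBJECTS = {"dog", "ball", "bench", "person", "car"}


def caption_objects(caption: str) -> Set[str]:
    """Return the known object words mentioned in a caption (toy extractor)."""
    words = {w.strip(".,").lower() for w in caption.split()}
    return words & KNOWN_OBJECTS


def violates_relation(original_caption: str, melted_caption: str) -> bool:
    """Report a suspicious pair.

    After melting (removing and inpainting) objects, the new caption should
    not mention any object that the original caption did not mention, so the
    melted object set must be a subset of the original object set.
    """
    return not caption_objects(melted_caption) <= caption_objects(original_caption)


# Melted caption only mentions objects already in the original: no violation.
print(violates_relation("A dog and a ball on a bench.", "A bench in a park."))  # False
# Melted caption introduces "car", which the original never mentioned: violation.
print(violates_relation("A dog playing with a ball.", "A car on the grass."))  # True
```

A subset check rather than equality fits the stated assumption: melting may remove objects from the caption, but it should never introduce new ones.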