Classic Referring Expression Comprehension (REC) aims to produce a bounding box for the object described by a given textual expression. Existing datasets and methods for classic REC are tailored to single-target expressions, in which each expression is linked to exactly one object. Expressions that refer to multiple targets or to no target at all have not been considered, which limits the practical applicability of REC. This study introduces a new benchmark termed Generalized Referring Expression Comprehension (GREC), which extends classic REC by permitting expressions to describe any number of target objects. To this end, we build the first large-scale GREC dataset, named gRefCOCO. The dataset contains multi-target, no-target, and single-target expressions. GREC and gRefCOCO are designed to remain compatible with classic REC. The gRefCOCO dataset, an implementation of a GREC method, and the GREC evaluation code are available at https://github.com/henghuiding/gRefCOCO.
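The difference in output between the two tasks can be illustrated with a minimal sketch. This is not the GREC method or evaluation code from the repository; the `ScoredBox` type, the function names, and the score threshold of 0.5 are hypothetical and serve only to show that classic REC always returns exactly one box, whereas a GREC-style prediction may contain zero, one, or several boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Box as (x1, y1, x2, y2); the coordinate convention is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class ScoredBox:
    """Hypothetical candidate: a box plus a match score for the expression."""
    box: Box
    score: float


def classic_rec(candidates: List[ScoredBox]) -> Box:
    """Classic REC contract: exactly one box, here the top-scoring candidate."""
    if not candidates:
        raise ValueError("classic REC assumes at least one candidate box")
    return max(candidates, key=lambda c: c.score).box


def generalized_rec(candidates: List[ScoredBox], threshold: float = 0.5) -> List[Box]:
    """GREC-style contract: every box whose score clears the threshold.

    The result may be empty (no-target expression), hold one box
    (single-target), or hold several boxes (multi-target).
    """
    return [c.box for c in candidates if c.score >= threshold]


candidates = [
    ScoredBox((0, 0, 10, 10), 0.9),
    ScoredBox((20, 20, 30, 30), 0.7),
    ScoredBox((5, 5, 8, 8), 0.1),
]
single = classic_rec(candidates)                     # one box, always
multi = generalized_rec(candidates)                  # two boxes here
none = generalized_rec(candidates, threshold=0.95)   # empty list: no target
```

Because a single-target expression simply yields a one-element list, a single-target prediction is a special case of the generalized output, which is one way to read the compatibility with classic REC stated above.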