3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, which suggests that the two tasks can benefit from being learned jointly. However, existing collaborative approaches mostly use the output of one task to make predictions for the other, which limits effective collaboration. We argue that giving 3DREC and 3DRES separate branches strengthens the model's capacity to learn task-specific information and lets the two tasks acquire complementary knowledge. We therefore propose MCLN, a framework with independent branches for 3DREC and 3DRES, which allows dedicated exploration of each task and effective coordination between the branches. To facilitate mutual reinforcement between the branches, we further introduce a Relative Superpoint Aggregation (RSA) module and an Adaptive Soft Alignment (ASA) module. These modules contribute significantly to the precise alignment of the two branches' predictions and direct the model to pay more attention to key positions. Comprehensive experiments show that our method achieves state-of-the-art performance on both tasks, with gains of 2.05% in Acc@0.5 for 3DREC and 3.96% in mIoU for 3DRES.
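For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy at an IoU threshold (for predicted boxes) and mean IoU (for predicted masks) are commonly computed. It is not taken from the MCLN implementation: the function names, the axis-aligned box format `(xmin, ymin, zmin, xmax, ymax, zmax)`, and the per-point binary mask representation are all assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax).

    Assumed box format; the paper's exact evaluation protocol may differ.
    """
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0  # boxes do not intersect along this axis
        inter *= overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def acc_at_iou(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth reaches the threshold."""
    hits = sum(box_iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)


def mask_iou(pred, gt):
    """IoU of two binary per-point masks given as equal-length sequences of 0/1."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 0.0


def mean_iou(pred_masks, gt_masks):
    """Average mask IoU over all referring expressions (samples)."""
    ious = [mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(ious) / len(ious)
```

Under these assumptions, a box prediction counts as correct only when its overlap with the ground-truth box reaches the chosen threshold, whereas mIoU rewards partial overlap of the predicted mask, which is why the two tasks are usually reported with different metrics.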