Day-to-night unpaired image translation is important for downstream tasks but remains challenging due to large appearance shifts and the lack of direct pixel-level supervision. Existing methods often introduce semantic hallucinations, in which objects from target classes such as traffic signs and vehicles, as well as man-made light effects, are incorrectly synthesized. These hallucinations significantly degrade downstream performance. We propose a novel framework that detects and suppresses hallucinations of target-class features during unpaired translation. To detect hallucinations, we design a dual-head discriminator that additionally performs semantic segmentation to identify hallucinated content in background regions. To suppress them, we introduce class-specific prototypes, constructed by aggregating features of annotated target-domain objects, which act as semantic anchors for each class. Built upon a Schrödinger Bridge-based translation model, our framework performs iterative refinement in which detected hallucinated features are explicitly pushed away from the class prototypes in feature space, preserving object semantics along the translation trajectory. Experiments show that our method outperforms existing approaches both qualitatively and quantitatively. On the BDD100K dataset, it improves mAP by 15.5% for day-to-night domain adaptation, with a notable 31.7% gain for hallucination-prone classes such as traffic lights.
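To make the two mechanisms concrete, the sketch below illustrates one plausible instantiation under assumptions not stated in the abstract: class prototypes are built by masked average pooling of annotated target-domain object features, and a repulsion term pushes features in discriminator-flagged background regions away from those prototypes via cosine similarity. The function names `build_prototypes` and `hallucination_repulsion_loss` are hypothetical; this is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def build_prototypes(features, masks, eps=1e-6):
    """Aggregate annotated target-domain object features into per-class prototypes.
    features: (B, C, H, W) encoder features of target-domain images.
    masks:    (B, K, H, W) binary masks of annotated objects, one channel per class.
    Returns (K, C) unit-norm prototypes, one semantic anchor per class."""
    B, C, H, W = features.shape
    f = features.flatten(2)                       # (B, C, HW)
    m = masks.flatten(2).float()                  # (B, K, HW)
    num = torch.einsum('bkn,bcn->kc', m, f)       # feature sum inside each class mask
    den = m.sum(dim=(0, 2)).unsqueeze(1) + eps    # (K, 1) pixel counts per class
    return F.normalize(num / den, dim=1)

def hallucination_repulsion_loss(features, halluc_mask, prototypes, eps=1e-6):
    """Push features flagged as hallucinated away from all class prototypes.
    features:    (B, C, H, W) features of the translated (night) image.
    halluc_mask: (B, H, W) binary map of background pixels the dual-head
                 discriminator flags as hallucinated target-class content.
    prototypes:  (K, C) class prototypes from build_prototypes."""
    B, C, H, W = features.shape
    f = F.normalize(features, dim=1).permute(0, 2, 3, 1).reshape(-1, C)  # (BHW, C)
    w = halluc_mask.float().reshape(-1)                                  # (BHW,)
    sim = f @ prototypes.t()                                             # (BHW, K)
    # Penalize the strongest positive similarity to any class anchor at flagged pixels.
    return (w * sim.max(dim=1).values.clamp(min=0)).sum() / (w.sum() + eps)
```

In this reading, the loss would be added to the translation objective at each refinement step of the Schrödinger Bridge trajectory, so that flagged regions are progressively driven away from target-class semantics while annotated regions remain anchored to their prototypes.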