Deep learning models such as U-Net and its variants have established state-of-the-art performance in edge detection and are used by generative AI services worldwide in their image generation models. However, their decision-making remains opaque: they operate as "black boxes" that obscure the rationale behind specific boundary predictions. This lack of transparency is a critical barrier in safety-critical applications where verification is mandatory. To bridge the gap between high-performance deep learning and interpretable logic, we propose the Rule-Based Spatial Mixture-of-Experts U-Net (sMoE U-Net). Our architecture introduces two key innovations: (1) Spatially-Adaptive Mixture-of-Experts (sMoE) blocks integrated into the decoder skip connections, which dynamically gate between "Context" (smooth) and "Boundary" (sharp) experts based on local feature statistics; and (2) a Takagi-Sugeno-Kang (TSK) Fuzzy Head that replaces the standard classification layer and fuses deep semantic features with heuristic edge signals using explicit IF-THEN rules. We evaluate our method on the BSDS500 benchmark, where it achieves an Optimal Dataset Scale (ODS) F-score of 0.7628, nearly matching purely deep baselines such as HED (0.7688) and outperforming the standard U-Net (0.7437). Crucially, our model provides pixel-level explainability through "Rule Firing Maps" and "Strategy Maps," which let users see whether an edge was detected because of strong gradients, high semantic confidence, or a specific combination of logical rules.
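To make the idea of a TSK fuzzy head concrete, the following is a minimal per-pixel sketch in plain Python. It is an illustration under stated assumptions, not the architecture evaluated above: the two inputs (a semantic confidence and a gradient magnitude, both in [0, 1]), the linear "low"/"high" membership functions, the four rules, and all consequent coefficients are hypothetical choices made for this example. The `dominant rule` map it returns is only loosely analogous to a Rule Firing Map.

```python
# Hypothetical rule base (an assumption for illustration only).
# Each rule reads: IF semantic is <a> AND gradient is <b>
#                  THEN y = c0 + c1 * semantic + c2 * gradient
RULES = [
    ("high", "high", (0.9, 0.05, 0.05)),  # rule 0: both signals agree on an edge
    ("high", "low",  (0.5, 0.3, 0.0)),    # rule 1: semantic evidence only
    ("low",  "high", (0.3, 0.0, 0.3)),    # rule 2: gradient evidence only
    ("low",  "low",  (0.0, 0.0, 0.0)),    # rule 3: no evidence
]


def membership(x, label):
    """Linear membership on [0, 1]: 'high' grows with x, 'low' shrinks."""
    x = min(max(x, 0.0), 1.0)
    return x if label == "high" else 1.0 - x


def tsk_pixel(sem, grad):
    """First-order TSK inference for one pixel.

    Returns the fused edge score and the normalized firing weight of each rule.
    """
    # Firing strength of each rule, using the product as the AND operator.
    strengths = [membership(sem, a) * membership(grad, b) for a, b, _ in RULES]
    total = sum(strengths) or 1.0  # guard against division by zero
    weights = [f / total for f in strengths]
    # Linear consequent of each rule, evaluated at this pixel.
    outputs = [c0 + c1 * sem + c2 * grad for _, _, (c0, c1, c2) in RULES]
    # Final output: firing-weighted average of the rule consequents.
    edge = sum(w * y for w, y in zip(weights, outputs))
    return edge, weights


def tsk_head(sem_map, grad_map):
    """Apply tsk_pixel over 2D lists; return the edge map and dominant-rule map."""
    edge_map, rule_map = [], []
    for sem_row, grad_row in zip(sem_map, grad_map):
        edge_row, rule_row = [], []
        for sem, grad in zip(sem_row, grad_row):
            edge, weights = tsk_pixel(sem, grad)
            edge_row.append(edge)
            # Index of the rule with the largest normalized firing weight.
            rule_row.append(max(range(len(weights)), key=weights.__getitem__))
        edge_map.append(edge_row)
        rule_map.append(rule_row)
    return edge_map, rule_map


if __name__ == "__main__":
    sem = [[1.0, 0.0], [1.0, 0.0]]
    grad = [[1.0, 1.0], [0.0, 0.0]]
    edges, rules = tsk_head(sem, grad)
    print(edges)
    print(rules)
```

Because every output is a weighted sum of a small number of named rules, the per-rule weights can be inspected pixel by pixel, which is the property that makes this kind of head interpretable. In a trained model the membership parameters and consequent coefficients would be learned rather than fixed by hand as they are here.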