We present an explainable, bias-aware generative framework that unifies cross-modal attention fusion, Grad-CAM++ attribution, and a Reveal-to-Revise feedback loop within a single training paradigm. The architecture couples a conditional attention WGAN-GP with bias regularization and iterative local explanation feedback. It is evaluated on Multimodal MNIST and Fashion-MNIST for image generation and subgroup auditing, and on a toxic/non-toxic text classification benchmark. All experiments use stratified 80/20 splits, validation-based early stopping, and AdamW with cosine annealing; results are averaged over three random seeds. The proposed model achieves 93.2% accuracy, a 91.6% F1-score, and a 78.1% IoU-XAI on the multimodal benchmark, outperforming all baselines on every metric, while adversarial training restores 73% to 77% robustness on Fashion-MNIST. Ablation studies confirm that fusion, Grad-CAM++, and bias feedback each contribute independently to final performance, with explanations improving structural coherence (SSIM = 88.8%, NMI = 84.9%) and fairness across protected subgroups. These results establish attribution-guided generative learning as a practical and trustworthy approach for high-stakes AI applications.
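The abstract reports an IoU-XAI score without defining it. One plausible reading, stated here purely as an assumption, is the intersection-over-union between a thresholded attribution map (for example, a Grad-CAM++ heatmap scaled to [0, 1]) and a ground-truth relevance mask. The following minimal sketch illustrates that reading; the function names, the 0.5 threshold, and the toy arrays are hypothetical and are not taken from the paper.

```python
def binarize(saliency, threshold=0.5):
    """Turn a 2-D saliency map (values in [0, 1]) into a 0/1 mask.

    The 0.5 threshold is an illustrative assumption, not the paper's setting.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]


def iou(pred_mask, true_mask):
    """Intersection-over-union of two equally sized 0/1 masks."""
    inter = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen here: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


# Toy 3x3 example: a hypothetical attribution map and a reference mask.
saliency = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
truth = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]

score = iou(binarize(saliency), truth)
print(f"IoU = {score:.2f}")
```

In this toy case the binarized map covers three of the four reference pixels and nothing outside them, so the overlap is 3 out of a union of 4. A dataset-level figure such as the reported 78.1% would presumably be an average of such per-sample scores, though the abstract does not say how the aggregation is done.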