When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging

We study same-source multi-view learning and adversarial robustness for next-day direction prediction from financial image representations. Using Shanghai Gold Exchange (SGE) spot gold data (2005-2025), we construct two window-aligned views from each rolling window: an OHLCV-rendered price/volume chart and a technical-indicator matrix. For reliable evaluation, we adopt leakage-resistant time-block splits with an embargo and report the Matthews correlation coefficient (MCC). Results depend strongly on the label-noise regime. To control it, we apply an ex-post minimum-movement filter that discards samples whose realized next-day absolute return falls below a threshold tau, yielding evaluation subsets with reduced near-zero label ambiguity. The filter induces a non-monotonic data-noise trade-off: raising tau can reveal predictive signal, but eventually increases variance as the sample size shrinks. The filter is used only for offline benchmark construction, not as an inference-time decision rule. In the stabilized subsets, the effect of fusion is regime dependent: early fusion by channel stacking can exhibit negative transfer, whereas late fusion with dual encoders and a fusion head provides the dominant clean-performance gains; cross-view consistency regularization has secondary, backbone-dependent effects. We further evaluate test-time L-infinity perturbations using FGSM and PGD under two threat scenarios: view-constrained attacks that perturb a single view and joint attacks that perturb both. We observe severe vulnerability even at tiny perturbation budgets, with strong asymmetry between views. Late fusion consistently improves robustness under view-constrained attacks, but joint attacks remain challenging and can still cause substantial worst-case degradation.
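As a rough illustration of the evaluation protocol described above, the sketch below applies a minimum-movement filter and computes MCC on the retained subset. It is a minimal sketch and not the authors' implementation: the function names, the convention that a positive return is labeled "up" (1), and the toy returns and predictions are all assumptions made for illustration and are not data from the study.

```python
import math
from typing import List, Sequence


def min_movement_filter(next_day_returns: Sequence[float], tau: float) -> List[int]:
    """Indices of samples whose realized |next-day return| is at least tau.

    Samples with absolute return below tau are discarded (ex-post filter).
    """
    return [i for i, r in enumerate(next_day_returns) if abs(r) >= tau]


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy values, invented for illustration only.
returns = [0.012, -0.0004, -0.009, 0.0002, 0.007, -0.015]
preds = [1, 1, 0, 0, 0, 0]
labels = [1 if r > 0 else 0 for r in returns]  # assumed labeling convention

keep = min_movement_filter(returns, tau=0.005)
mcc_all = mcc(labels, preds)
mcc_filtered = mcc([labels[i] for i in keep], [preds[i] for i in keep])
print(len(keep), mcc_all, mcc_filtered)
```

Because the filter uses the realized next-day return, it can only be computed after the fact, which is why it is suited to building offline evaluation subsets rather than deciding at inference time whether to issue a prediction.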
