This paper introduces a quantum fusion mechanism for multimodal learning and establishes its theoretical and empirical potential. The proposed method, the Quantum Fusion Layer (QFL), replaces classical fusion schemes with a hybrid quantum-classical procedure that uses parameterized quantum circuits to learn entangled feature interactions without exponential growth in the number of parameters. Drawing on quantum signal processing principles, the quantum component efficiently represents high-order polynomial interactions across modalities with linear parameter scaling, and we provide a separation example between QFL and low-rank tensor-based methods that highlights a potential quantum query advantage. In simulation, QFL consistently outperforms strong classical baselines on small but diverse multimodal tasks, with particularly marked improvements when the number of modalities is large. These results suggest that QFL offers a fundamentally new and scalable approach to multimodal fusion that merits deeper exploration on larger systems.
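To make the idea of fusing modalities through a parameterized quantum circuit concrete, the following is a minimal state-vector sketch in NumPy. It is not the QFL circuit itself, which the abstract does not specify. It assumes one qubit per modality, a single scalar feature per modality encoded as an RY rotation angle, a ring of CNOT gates as the entangler, and a Pauli-Z readout on the first qubit. The names `fusion_expectation`, `apply_1q`, and `apply_cnot` are hypothetical. Under these assumptions the number of trainable angles is `layers * modalities`, so it grows linearly with the number of modalities.

```python
import numpy as np


def ry(angle):
    """Single-qubit RY rotation matrix (real-valued)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored with shape (2,) * n."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_cnot(state, control, target):
    """Flip the target qubit on the slice where the control qubit is 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    sub = state[tuple(idx)]  # the control axis is dropped in this slice
    t = target if target < control else target - 1
    state[tuple(idx)] = np.flip(sub, axis=t).copy()
    return state


def fusion_expectation(features, thetas):
    """Illustrative fusion circuit (assumed ansatz, not the paper's).

    features: one scalar per modality, used as an RY encoding angle.
    thetas:   array of shape (layers, modalities) with trainable RY angles.
    Returns the expectation value of Pauli-Z on qubit 0.
    """
    n = len(features)
    if n < 2:
        raise ValueError("the ring entangler needs at least two qubits")
    state = np.zeros((2,) * n)
    state[(0,) * n] = 1.0  # start in |00...0>
    for layer in np.atleast_2d(thetas):
        for q in range(n):  # encode each modality on its own qubit
            state = apply_1q(state, ry(features[q]), q)
        for q in range(n):  # entangle neighbouring qubits in a ring
            state = apply_cnot(state, q, (q + 1) % n)
        for q in range(n):  # trainable rotations
            state = apply_1q(state, ry(layer[q]), q)
    probs = state ** 2  # amplitudes stay real for RY and CNOT gates
    return probs.take(0, axis=0).sum() - probs.take(1, axis=0).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.uniform(-1.0, 1.0, size=3)      # three toy modalities
    thetas = rng.uniform(-np.pi, np.pi, size=(2, 3))  # 2 layers x 3 qubits
    print("trainable parameters:", thetas.size)
    print("fused readout:", fusion_expectation(features, thetas))
```

Because the entangling gates sit between the encoding and the readout, the value measured on qubit 0 depends on features from the other modalities as well, which is the kind of cross-modal interaction the abstract describes. A classical optimizer would update `thetas` in the hybrid loop; that training step is omitted here.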