This paper introduces multimodal conformal regression. Conformal prediction has traditionally been confined to settings with purely numerical input features; our methodology extends it to multimodal contexts by harnessing internal features from complex neural network architectures that process images and unstructured text. Our findings show that internal neural network features, extracted from the convergence points where multimodal information is combined, can be used by conformal prediction to construct prediction intervals (PIs). This capability opens new paths for deploying conformal prediction in domains rich in multimodal data, enabling a broader range of problems to benefit from distribution-free uncertainty quantification with coverage guarantees.
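The paper does not specify its algorithm here, but the core mechanism it builds on, split conformal regression, can be sketched as follows. This is a minimal illustration, assuming a regression head whose point predictions (e.g. computed on fused multimodal features) are already available on a held-out calibration set; the variable names and simulated data are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data: y_cal are true targets, y_hat_cal are
# the model's point predictions on the calibration set (stand-ins for
# predictions made from fused multimodal features).
n_cal = 500
y_cal = rng.normal(size=n_cal)
y_hat_cal = y_cal + rng.normal(scale=0.5, size=n_cal)

alpha = 0.1  # target miscoverage: intervals aim for 90% coverage

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - y_hat_cal)

# Finite-sample-corrected quantile level, then the score quantile.
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q_hat = np.quantile(scores, q_level, method="higher")

# Prediction interval for a new point prediction y_hat_test.
y_hat_test = 0.3
pi = (y_hat_test - q_hat, y_hat_test + q_hat)
print(pi)
```

Under the standard exchangeability assumption between calibration and test points, intervals built this way cover the true target with probability at least 1 - alpha, regardless of the underlying data distribution, which is the distribution-free guarantee the abstract refers to.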