Financial prediction from Monetary Policy Conference (MPC) calls is a new yet challenging task that aims to predict the price movement and volatility of specific financial assets by analyzing multimodal information, including text, video, and audio. Although existing work has achieved great success using cross-modal transformer blocks, it overlooks potential external financial knowledge, the varying contributions of different modalities to financial prediction, and the innate relations among different financial assets. To address these limitations, we propose a novel Modal-Adaptive kNowledge-enhAnced Graph-basEd financial pRediction scheme, named MANAGER. Specifically, MANAGER resorts to FinDKG to obtain external knowledge related to the input text, and adopts BEiT-3 and Hidden-unit BERT (HuBERT) to extract video and audio features, respectively. Thereafter, with ChatGLM2 as the backbone, MANAGER introduces a novel knowledge-enhanced cross-modal graph that fully characterizes the semantic relations among text, external knowledge, video, and audio, so as to adaptively utilize the information in different modalities. Extensive experiments on the publicly available Monopoly dataset verify the superiority of our model over cutting-edge methods.
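The core idea of the cross-modal graph, where text, external knowledge, video, and audio nodes exchange information with adaptively learned weights, can be illustrated with a minimal pure-Python sketch. All names, features, and the attention scheme below are illustrative placeholders, not the paper's actual FinDKG, BEiT-3, HuBERT, or ChatGLM2 components.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def message_pass(features, edges):
    """One round of attention-weighted aggregation over a cross-modal graph.

    features: dict mapping node name -> feature vector
    edges:    dict mapping node name -> list of neighbor names
    Each node attends over its neighbors (softmax of dot-product scores),
    so modalities that are more relevant contribute more to the update.
    """
    updated = {}
    for node, nbrs in edges.items():
        weights = softmax([dot(features[node], features[n]) for n in nbrs])
        agg = [0.0] * len(features[node])
        for w, n in zip(weights, nbrs):
            agg = [a + w * f for a, f in zip(agg, features[n])]
        # residual mix: keep the node's own signal alongside neighbor info
        updated[node] = [0.5 * s + 0.5 * a for s, a in zip(features[node], agg)]
    return updated

# toy 3-d features for the four node types in the cross-modal graph
feats = {
    "text":      [1.0, 0.2, 0.0],
    "knowledge": [0.9, 0.1, 0.1],
    "video":     [0.0, 1.0, 0.3],
    "audio":     [0.1, 0.8, 0.5],
}
# fully connected across modalities (no self-loops)
edges = {n: [m for m in feats if m != n] for n in feats}
out = message_pass(feats, edges)
```

In the actual model the node features would come from the respective encoders and the attention weights would be learned end-to-end; this sketch only shows the graph-structured, modality-adaptive aggregation pattern.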