Financial prediction from Monetary Policy Conference (MPC) calls is a new yet challenging task that aims to predict the price movement and volatility of specific financial assets by analyzing multimodal information, including text, video, and audio. Although existing work has achieved great success using cross-modal transformer blocks, it overlooks potentially useful external financial knowledge, the varying contributions of different modalities to financial prediction, and the innate relations among different financial assets. To tackle these limitations, we propose a novel Modal-Adaptive kNowledge-enhAnced Graph-basEd financial pRediction scheme, named MANAGER. Specifically, MANAGER resorts to FinDKG to obtain external knowledge related to the input text. Meanwhile, MANAGER adopts BEiT-3 and Hidden-unit BERT (HuBERT) to extract video and audio features, respectively. Thereafter, MANAGER introduces a novel knowledge-enhanced cross-modal graph that fully characterizes the semantic relations among text, external knowledge, video, and audio, so that it can adaptively utilize the information in different modalities, with ChatGLM2 as the backbone. Extensive experiments on the publicly available Monopoly dataset verify the superiority of our model over cutting-edge methods.