In response to growing FinTech competition and the need for improved operational efficiency, this research examines the potential of advanced document analytics, particularly multimodal models, in banking processes. We perform a comprehensive analysis of the diverse banking document landscape and highlight opportunities for efficiency gains through automation and advanced analytics in the customer business. Building on the rapidly evolving field of natural language processing (NLP), we illustrate the potential of models such as LayoutXLM, a cross-lingual, multimodal, pre-trained model, for analyzing diverse documents in the banking sector. The model performs token classification on German company register extracts with an overall F1 score of around 80%. Our empirical evidence confirms the critical role of layout information in improving model performance and underscores the additional benefit of integrating image information. Notably, an F1 score of over 75% can be achieved with only 30% of the training data, demonstrating the data efficiency of LayoutXLM. By examining state-of-the-art document analysis frameworks, our study aims to enhance process efficiency and demonstrate the real-world applicability and benefits of multimodal models within banking.