The financial sector, a pivotal force in economic development, increasingly uses intelligent technologies such as natural language processing (NLP) to enhance data processing and insight extraction. This paper reviews research published between 2018 and 2023 on the use of text mining, as an NLP technique, across components of the financial system, including asset pricing, corporate finance, derivatives, risk management, and public finance, and highlights in the discussion section the specific problems that still need to be addressed. We find that most of the reviewed studies combine probabilistic models with vector-space models, and text data with numerical data. The most widely used information-processing technique is classification, and the most widely used algorithms include long short-term memory (LSTM) networks and bidirectional encoder models. We also observe that new, specialized algorithms are being developed and that research on the financial system focuses mainly on the asset pricing component. The paper further proposes a path, from an engineering perspective, for researchers who need to analyze financial text. Text-mining challenges such as data quality, context adaptation, and model interpretability need to be resolved in order to integrate advanced NLP models and techniques into financial analysis and prediction. Keywords: Financial System (FS), Natural Language Processing (NLP), Software and Text Engineering, Probabilistic, Vector-Space, Models, Techniques, Text Data, Financial Analysis.