Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

Biomedical knowledge is growing at an astounding pace, and most of this knowledge is represented as scientific publications. Text mining tools and methods provide automatic approaches for extracting hidden patterns and trends from this semi-structured and unstructured data. In biomedical text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms that are otherwise mentioned in disjoint literature sets. LBD approaches have proven successful in reducing the time needed to discover potential associations hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms, such as a disease or symptom, and connecting them with drugs and treatments based on the statistical significance of the shared profiles. Introduced in 1989, this knowledge discovery approach still remains a core task in text mining. Currently, two approaches based on the ABC principle, open discovery and closed discovery, are the most widely explored in the LBD process. This review starts with a general introduction to text mining, followed by biomedical text mining, and introduces various literature resources such as MEDLINE, UMLS, MeSH, and SemMedDB. This is followed by a brief introduction to the core ABC principle and its two associated approaches, open discovery and closed discovery, in the LBD process. The review also discusses deep learning applications in LBD by examining the role of transformer models and neural network-based LBD models, along with their future prospects. Finally, it reviews the key biomedical discoveries generated through LBD approaches and concludes with the current limitations and future directions of LBD.
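The ABC principle and the concept-profile idea summarized above can be illustrated with a minimal sketch. It assumes a hypothetical toy corpus in which each document has already been reduced to the set of concepts it mentions. The concept names (`disease_A`, `factor_B1`, `drug_C1`, and so on), the helper functions, and the ranking by number of shared intermediate B terms are all illustrative assumptions. The ranking is a simple stand-in for the statistical significance measures that real LBD systems apply, not a method taken from this review.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each "document" is the set of concepts it mentions.
# Real systems would use concept annotations (e.g. UMLS concepts) from MEDLINE.
DOCS = [
    {"disease_A", "factor_B1"},
    {"disease_A", "factor_B2"},
    {"disease_A", "factor_B3"},
    {"factor_B1", "drug_C1"},
    {"factor_B2", "drug_C1"},
    {"factor_B3", "drug_C2"},
    {"disease_A", "drug_C3"},  # C3 already co-occurs with A, so it is not novel
    {"factor_B1", "drug_C3"},
]


def build_profiles(docs):
    """Concept profile (simplified): concept -> set of concepts it co-occurs with."""
    profiles = defaultdict(set)
    for doc in docs:
        for x, y in combinations(sorted(doc), 2):
            profiles[x].add(y)
            profiles[y].add(x)
    return profiles


def open_discovery(a, profiles):
    """Open discovery: walk A -> B -> C and keep C terms never linked directly to A.

    Candidates are ranked by how many intermediate B terms they share with A
    (an illustrative score, not a statistical significance test).
    """
    direct = profiles[a]
    scores = defaultdict(int)
    for b in direct:
        for c in profiles[b]:
            if c != a and c not in direct:
                scores[c] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def closed_discovery(a, c, profiles):
    """Closed discovery: given both A and C, return B terms that may explain the link."""
    return sorted(profiles[a] & profiles[c])


if __name__ == "__main__":
    profiles = build_profiles(DOCS)
    print(open_discovery("disease_A", profiles))
    # [('drug_C1', 2), ('drug_C2', 1)]
    print(closed_discovery("disease_A", "drug_C1", profiles))
    # ['factor_B1', 'factor_B2']
```

In this sketch, open discovery starts only from the A term and proposes C terms that are reachable through B terms but never co-mentioned with A, while closed discovery starts from a known A and C pair and looks for the B terms that connect them. A practical system would also filter B and C candidates by semantic type and score them statistically rather than by raw counts.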

翻译：生物医学知识以惊人的速度增长，其中大部分知识以科学出版物形式呈现。文本挖掘工具和方法代表了从半结构化与非结构化数据中自动提取隐藏模式和趋势的技术手段。在生物医学文本挖掘领域，基于文献的发现（LBD）是指自动发现原本出现在不同文献集合中的医学术语之间新关联的过程。LBD方法已被证实能够有效缩短隐藏在大量科学文献中潜在关联的发现时间。该过程的核心在于为疾病或症状等医学术语构建概念轮廓，并基于共享轮廓的统计显著性将其与药物和治疗方法建立关联。这一知识发现方法自1989年提出以来，至今仍是文本挖掘的核心任务。目前，基于ABC原则的开放发现和封闭发现两种方法在LBD过程中得到广泛探索。本综述从文本挖掘的一般性介绍切入，进而讨论生物医学文本挖掘，并介绍了MEDLINE、UMLS、MESH和SemMedDB等多种文献资源。随后简要阐述了LBD过程中的核心ABC原则及其两种关联方法——开放发现与封闭发现。本综述还通过评析Transformer模型和基于神经网络的LBD模型的作用及其未来发展方向，探讨了深度学习在LBD中的应用。最后，综述了LBD方法在生物医学领域产生的关键发现，并总结了当前局限性及未来研究方向。