The architecture, engineering and construction (AEC) sector relies extensively on documents that support product and process development. Organisations must therefore manage hundreds, or even thousands, of strongly interlinked technical documents, including CAD designs of industrial plants, equipment purchase orders, quality certificates, and part material analyses. Analysing such records is daunting for users, however, because sifting through hundreds of documents to establish valuable relationships quickly becomes complicated. This paper addresses how knowledge extracted from linked engineering documents contributes to industrial digitalisation under IT/OT convergence. The proposed system, GraphLED, performs data processing, graph-based modelling, and colourful visualisation of related documents. The graph-based approach supports an improved understanding of linked information because a graph structure is a promising tool for modelling the underlying data properties of engineering documents. Preliminary system validation indicates that quality improvements are possible in the OCR-based data (85.9% of ambiguous text data removed). This work has the potential to benefit industry by improving the reliability and resilience of industrial production systems through automated summaries of large quantities of documents and their linkages.