Industry 4.0 and Internet of Things (IoT) technologies unlock an unprecedented amount of data from factory production, posing big data challenges in volume and variety. In this context, distributed computing solutions such as cloud systems are leveraged to parallelise data processing and reduce computation time. As cloud systems become increasingly popular, there is growing demand for users who were not originally cloud experts (such as data scientists and domain experts) to deploy their solutions on these systems. However, it is non-trivial to address both the high demand for cloud system users and the excessive time required to train them. To this end, we propose SemCloud, a semantics-enhanced cloud system that couples a cloud system with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, and parallelises semantic data integration and data analysis on distributed computing nodes. Furthermore, SemCloud adopts adaptive Datalog rules and machine learning for automated resource configuration, allowing non-cloud experts to use the cloud system. The system has been evaluated in an industrial use case with millions of data items, thousands of repeated runs, and domain users, showing promising results.