Process mining is moving beyond traditional event logs and now includes, for example, data sourced from sensors in the Internet of Things (IoT). The volume and velocity of the data generated by such sensors make it increasingly challenging for traditional process discovery algorithms to store and mine it in traditional event logs. Further, privacy considerations often prevent data collection at a central location in the first place. To address these challenges, this paper introduces EdgeAlpha, a distributed process discovery algorithm that operates directly on sensor nodes and edge devices, on a stream of real-time event data. Based on the Alpha Miner, EdgeAlpha tracks each event together with its predecessor and successor events directly on the sensor node where the event is sensed and recorded. From this local view, each node derives a partial footprint matrix; whenever the system is queried to compute a process model, these partial matrices are merged at a central location. EdgeAlpha enables (a) scalable mining, because for each event a node interacts only with its predecessors and, when queried, exchanges only aggregates (i.e., partial footprint matrices) with the central location, and (b) privacy-preserving process mining, because nodes store only their own events and their predecessor and successor events. On the Sepsis Cases event log, for example, a node queries on average 18.7% of all nodes. For the Hospital Log, overall querying is reduced even further, to 3.87% of the nodes.
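The idea of deriving partial footprint matrices locally and merging them only on demand can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `EdgeNode`, `process_stream`, and `merge_footprints` are hypothetical, one node per activity is assumed, and the `last_activity` dictionary is a simplified stand-in for the predecessor lookup that EdgeAlpha performs between nodes.

```python
class EdgeNode:
    """Hypothetical node that records events of a single activity."""

    def __init__(self, activity):
        self.activity = activity
        self.predecessors = set()  # activities seen directly before this one
        self.successors = set()    # activities seen directly after this one

    def partial_footprint(self):
        """Return this node's row of the footprint matrix (non-'#' entries only)."""
        row = {}
        for other in self.predecessors | self.successors:
            if other in self.successors and other in self.predecessors:
                row[other] = "||"   # both orders observed: parallel
            elif other in self.successors:
                row[other] = "->"   # this activity is directly followed by other
            else:
                row[other] = "<-"   # this activity directly follows other
        return row


def process_stream(events):
    """Feed (case_id, activity) events to the nodes in arrival order."""
    nodes = {}
    last_activity = {}  # assumption: stand-in for the inter-node predecessor lookup
    for case_id, activity in events:
        node = nodes.setdefault(activity, EdgeNode(activity))
        previous = last_activity.get(case_id)
        if previous is not None:
            node.predecessors.add(previous)
            nodes[previous].successors.add(activity)
        last_activity[case_id] = activity
    return nodes


def merge_footprints(nodes):
    """Central merge: collect the partial rows and fill unrelated pairs with '#'."""
    activities = sorted(nodes)
    rows = {a: nodes[a].partial_footprint() for a in activities}
    return {a: {b: rows[a].get(b, "#") for b in activities} for a in activities}


# Two interleaved cases with traces <a, b, c, d> and <a, c, b, d>.
events = [(1, "a"), (2, "a"), (1, "b"), (2, "c"),
          (1, "c"), (2, "b"), (1, "d"), (2, "d")]
footprint = merge_footprints(process_stream(events))
for activity, row in footprint.items():
    print(activity, row)
```

In this sketch only the per-activity rows reach the central merge step, which mirrors the abstract's point that nodes exchange aggregates rather than raw events.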