Process mining is moving beyond traditional event logs and now includes, for example, data sourced from sensors in the Internet of Things (IoT). The volume and velocity of the data generated by such sensors make it increasingly challenging for traditional process discovery algorithms to store and mine this data in traditional event logs. Further, privacy considerations often prevent data collection at a central location in the first place. To address this challenge, this paper introduces EdgeAlpha, a distributed algorithm for process discovery that operates directly on sensor nodes and edge devices, on a stream of real-time event data. Based on the Alpha Miner, EdgeAlpha tracks each event together with its predecessor and successor events directly on the sensor node where the event is sensed and recorded. From this local view, each node in EdgeAlpha derives a partial footprint matrix; these partial matrices are merged at a central location whenever the system is queried to compute a process model. EdgeAlpha enables (a) scalable mining, because a node interacts only with its predecessors for each event and, when queried, exchanges only aggregates, i.e., partial footprint matrices, with the central location, and (b) privacy-preserving process mining, because nodes store only their own events as well as the predecessor and successor events. On the Sepsis Cases event log, for example, a node queries on average 18.7% of all nodes. For the Hospital Log, we can even reduce the overall querying to 3.87% of the nodes.
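To make the idea of local tracking and central merging concrete, the following is a minimal sketch and not the EdgeAlpha implementation. It assumes that each node is responsible for exactly one activity and that a node's partial footprint matrix is its row of the Alpha Miner footprint matrix. The names `ActivityNode`, `replay`, and `merge_footprints` are hypothetical, and the dictionary `last_in_case` is only a stand-in for the predecessor lookup that the real system performs by querying other nodes.

```python
class ActivityNode:
    """Hypothetical node that senses one activity and keeps only a local view."""

    def __init__(self, activity):
        self.activity = activity
        self.predecessors = set()  # activities observed directly before this one
        self.successors = set()    # activities observed directly after this one

    def partial_footprint(self):
        """Return this node's row of the footprint matrix (non-'#' cells only)."""
        row = {}
        for other in self.predecessors | self.successors:
            follows = other in self.successors     # activity > other
            preceded = other in self.predecessors  # other > activity
            if follows and preceded:
                row[other] = "||"  # parallel: both directions were observed
            elif follows:
                row[other] = "->"  # causality: activity leads to other
            else:
                row[other] = "<-"  # reverse causality: other leads to activity
        return row


def replay(stream):
    """Feed (case_id, activity) events to the nodes in arrival order."""
    nodes = {}
    last_in_case = {}  # assumption: stand-in for the inter-node predecessor lookup
    for case_id, activity in stream:
        if activity not in nodes:
            nodes[activity] = ActivityNode(activity)
        prev = last_in_case.get(case_id)
        if prev is not None:
            nodes[activity].predecessors.add(prev)
            nodes[prev].successors.add(activity)
        last_in_case[case_id] = activity
    return nodes


def merge_footprints(nodes):
    """Central step: combine the partial rows; unrelated pairs default to '#'."""
    activities = sorted(nodes)
    matrix = {a: {b: "#" for b in activities} for a in activities}
    for a, node in nodes.items():
        matrix[a].update(node.partial_footprint())
    return matrix
```

Calling `merge_footprints(replay(stream))` on a list of `(case_id, activity)` pairs yields a nested dictionary with one row per activity. In this sketch only the aggregated rows reach the central step; the raw events are never collected there, which mirrors the aggregation idea described above under the stated assumptions.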