In this paper, we draw an analogy between processing natural language and processing multivariate event streams from vehicles in order to predict $\textit{when}$ and $\textit{what}$ error pattern is most likely to occur next for a given car. Our approach leverages the temporal dynamics and contextual relationships in event data collected from a fleet of cars. The event data comprise discrete values, namely error codes, as well as continuous values such as time and mileage. By modelling these data with two causal Transformers, we can anticipate vehicle failures and malfunctions before they happen. Thus, we introduce $\textit{CarFormer}$, a Transformer model trained via a new self-supervised learning strategy, and $\textit{EPredictor}$, an autoregressive Transformer decoder model that predicts $\textit{when}$ and $\textit{what}$ error pattern will most likely occur after the appearance of some error code. Despite the challenges posed by the high cardinality of event types, their unbalanced frequency of occurrence, and limited labelled data, our experimental results demonstrate the excellent predictive ability of our novel model. Specifically, on sequences of $160$ error codes on average, our model achieves an $80\%$ F1 score in predicting $\textit{what}$ error pattern will occur using only the first half of the error codes, and an average absolute error of $58.4 \pm 13.2$h when forecasting the time of occurrence, thus enabling confident predictive maintenance and enhancing vehicle safety.