Accurate predictions of when a component will fail are crucial for planning maintenance, and survival models, which model the distribution of these failure times, have proven particularly useful in this context. The presented methodology is based on conventional neural network-based survival models trained on data that is continuously gathered and stored at specific times, called snapshots. An important property of this type of training data is that it can contain more than one snapshot from the same individual, so standard maximum likelihood training cannot be applied directly because the samples are not independent. However, the paper shows that if the data is homogeneously sampled, meaning that the snapshot times are the same for all individuals, maximum likelihood training can be applied and produces desirable results. In many cases the data is not homogeneously sampled, and it is then proposed to resample the data to make it homogeneously sampled. How densely the dataset is sampled turns out to be an important parameter: the sampling density should be chosen high enough to produce good results, but a higher density also increases the size of the dataset, which makes training slow. To reduce the number of samples needed during training, the paper also proposes randomly resampling the dataset at the start of each epoch during training, instead of resampling it once before training starts. The proposed methodology is evaluated on both a simulated dataset and an experimental dataset of starter battery failures. The results show that if the data is homogeneously sampled, the methodology works as intended and produces accurate survival models. The results also show that randomly resampling the dataset on each epoch is an effective way to reduce the size of the training data.
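The resampling idea can be illustrated with a minimal sketch in Python. It is not the paper's implementation. It assumes each individual is stored as a dictionary with irregular snapshot times, one feature vector per snapshot, an end time (failure or censoring), and an event flag. It also assumes that the feature vector at a resampled time is the most recent recorded snapshot carried forward. The names `resample_dataset` and `random_grid` are hypothetical.

```python
import numpy as np


def resample_dataset(individuals, grid):
    """Resample every individual onto the same snapshot times `grid`.

    Assumption: the features at grid time s are taken from the latest
    recorded snapshot at or before s (last observation carried forward).
    Returns (snapshot_times, features, remaining_times, events).
    """
    s_out, x_out, r_out, e_out = [], [], [], []
    for ind in individuals:
        for s in grid:
            if s >= ind["end_time"]:
                continue  # individual is no longer under observation at s
            # Index of the latest snapshot recorded at or before s.
            idx = np.searchsorted(ind["times"], s, side="right") - 1
            if idx < 0:
                continue  # no snapshot has been recorded yet at s
            s_out.append(s)
            x_out.append(ind["features"][idx])
            r_out.append(ind["end_time"] - s)  # time left until failure/censoring
            e_out.append(ind["event"])
    return (np.asarray(s_out), np.asarray(x_out),
            np.asarray(r_out), np.asarray(e_out, dtype=bool))


def random_grid(rng, n_times, t_max):
    """Draw `n_times` snapshot times in [0, t_max), shared by all individuals."""
    return np.sort(rng.uniform(0.0, t_max, size=n_times))


if __name__ == "__main__":
    # Toy data: two individuals with irregular, non-matching snapshot times.
    individuals = [
        {"times": np.array([0.0, 1.5, 4.0]),
         "features": np.array([[1.0], [2.0], [3.0]]),
         "end_time": 5.0, "event": True},    # failure observed at t = 5
        {"times": np.array([0.5, 2.0]),
         "features": np.array([[10.0], [20.0]]),
         "end_time": 3.0, "event": False},   # censored at t = 3
    ]
    rng = np.random.default_rng(0)
    for epoch in range(3):
        # A fresh, small grid each epoch instead of one dense fixed grid.
        grid = random_grid(rng, n_times=2, t_max=5.0)
        s, x, remaining, event = resample_dataset(individuals, grid)
        # A survival model would be trained on (s, x, remaining, event) here.
        print(epoch, grid, len(s))
```

Because every epoch uses a new small grid, each pass over the data stays cheap while the model still sees many different snapshot times over the course of training. Whether the grid is drawn uniformly, as assumed here, or in some other way is a design choice this sketch does not settle.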