Usage-Specific Survival Modeling Based on Operational Data and Neural Networks

Accurate predictions of when a component will fail are crucial when planning maintenance, and by modeling the distribution of these failure times, survival models have been shown to be particularly useful in this context. The presented methodology is based on conventional neural network-based survival models that are trained using data that is continuously gathered and stored at specific times, called snapshots. An important property of this type of training data is that it can contain more than one snapshot from a specific individual. The samples are therefore not independent, and standard maximum likelihood training cannot be directly applied. However, the paper shows that if the data is in a specific format where the snapshot times are the same for all individuals, called homogeneously sampled, maximum likelihood training can be applied and produces desirable results. In many cases the data is not homogeneously sampled, and in this case it is proposed to resample the data to make it homogeneously sampled. How densely the dataset is sampled turns out to be an important parameter: the sampling density should be high enough to produce good results, but a denser sampling also increases the size of the dataset, which makes training slow. To reduce the number of samples needed during training, the paper also proposes a technique that, instead of resampling the dataset once before training starts, randomly resamples the dataset at the start of each training epoch. The proposed methodology is evaluated on both a simulated dataset and an experimental dataset of starter battery failures. The results show that if the data is homogeneously sampled, the methodology works as intended and produces accurate survival models. The results also show that randomly resampling the dataset on each epoch is an effective way to reduce the size of the training data.
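The two resampling ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. The rule used to map irregular snapshots onto a common grid (take the most recent stored snapshot at or before each grid time), the grid spacing `delta`, the random offset drawn once per epoch, and the field names `times`, `features`, `end_time` and `failed` are all hypothetical choices made for illustration. Within one epoch every individual is resampled on the same grid, so the data stays homogeneously sampled. Across epochs the offset changes, so a coarse grid (few samples per epoch) can still visit different snapshot times over the course of training.

```python
import numpy as np


def resample_homogeneous(snap_times, snap_features, end_time, grid):
    """Map one individual's irregular snapshots onto a common time grid.

    Assumption (illustrative, not from the paper): at each grid time we use the
    most recent stored snapshot at or before it. Grid times before the first
    snapshot, or at/after the failure/censoring time, are dropped.
    Requires at least one snapshot with sorted snap_times.
    """
    snap_times = np.asarray(snap_times, dtype=float)
    snap_features = np.asarray(snap_features, dtype=float)
    grid = np.asarray(grid, dtype=float)
    keep = (grid >= snap_times[0]) & (grid < end_time)
    t = grid[keep]
    # Index of the last snapshot whose time is <= each kept grid time.
    idx = np.searchsorted(snap_times, t, side="right") - 1
    return t, snap_features[idx]


def random_grid(rng, delta, t_max):
    """Common grid u, u + delta, u + 2*delta, ... < t_max with u ~ U[0, delta)."""
    u = rng.uniform(0.0, delta)
    return np.arange(u, t_max, delta)


def build_epoch_dataset(individuals, delta, t_max, rng):
    """Resample every individual on ONE shared grid, drawn anew for this epoch."""
    grid = random_grid(rng, delta, t_max)
    rows = []
    for ind in individuals:  # hypothetical dict layout, see lead-in
        t, x = resample_homogeneous(ind["times"], ind["features"], ind["end_time"], grid)
        for tk, xk in zip(t, x):
            # Target: remaining time from the snapshot to failure or censoring,
            # plus the event indicator (True = failure observed).
            rows.append((xk, ind["end_time"] - tk, ind["failed"]))
    return rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    battery = {"times": [0.0, 3.0, 7.0], "features": [[1.0], [2.0], [3.0]],
               "end_time": 10.0, "failed": True}
    for epoch in range(2):
        rows = build_epoch_dataset([battery], delta=2.0, t_max=10.0, rng=rng)
        print(epoch, [round(r[1], 2) for r in rows])
```

Each row pairs the resampled covariates with the remaining time and event indicator. In this sketch, `delta` plays the role of the sampling density discussed above: a smaller value gives more rows per epoch and a larger, slower training set.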

