Causal feature selection framework for stable soft sensor modeling based on time-delayed cross mapping

Soft sensor modeling plays a crucial role in process monitoring, and causal feature selection can improve the performance of soft sensor models in industrial applications. However, existing methods ignore two critical characteristics of industrial processes. First, causal relationships between variables always involve time delays, whereas most causal feature selection methods examine only causal relationships between variables at the same time instant. Second, variables in industrial processes are often interdependent, which contradicts the decorrelation assumption of traditional causal inference methods. Consequently, soft sensor models built on existing causal feature selection approaches often lack sufficient accuracy and stability. To overcome these challenges, this paper proposes a causal feature selection framework based on time-delayed cross mapping. Time-delayed cross mapping uses state space reconstruction to handle interdependent variables in causality analysis, and it accounts for the variation of causal strength across time delays. Time-delayed convergent cross mapping (TDCCM) is introduced for total causal inference, and time-delayed partial cross mapping (TDPCM) is developed for direct causal inference. To make feature selection automatic, an objective feature selection strategy is presented: the causal threshold is determined automatically from model performance on the validation set, and the features whose causal strength passes that threshold are selected. Two real-world case studies show that TDCCM achieves the highest average performance, while TDPCM improves soft sensor stability and worst-case performance. The code is publicly available at https://github.com/dirge1/TDPCM.
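To make the idea of cross mapping with a time delay concrete, the following is a minimal Python sketch of standard convergent cross mapping evaluated at a chosen lag. It is not the authors' TDCCM or TDPCM implementation (see the repository above for that). The function names `delay_embed`, `cross_map_skill`, and `delay_profile`, the embedding dimension `E`, the embedding lag `tau`, and the exponential neighbour weighting are all assumptions made for illustration.

```python
import numpy as np


def delay_embed(x, E, tau):
    """Delay-coordinate embedding (hypothetical helper).

    Row for time t is [x[t], x[t - tau], ..., x[t - (E - 1) * tau]].
    Returns the embedding matrix and the time index t of each row.
    """
    x = np.asarray(x, dtype=float)
    idx = np.arange((E - 1) * tau, len(x))
    M = np.column_stack([x[idx - k * tau] for k in range(E)])
    return M, idx


def cross_map_skill(source, target, delay, E=2, tau=1):
    """Cross-map skill for "source drives target" at a given delay.

    The state space is reconstructed from the candidate effect (target).
    The value source[t - delay] is then estimated from the nearest
    neighbours of the target state at time t. The skill is the Pearson
    correlation between the estimates and the true values.
    """
    source = np.asarray(source, dtype=float)
    M, idx = delay_embed(target, E, tau)

    # Keep only the states for which source[t - delay] exists.
    keep = (idx - delay >= 0) & (idx - delay < len(source))
    M, idx = M[keep], idx[keep]
    truth = source[idx - delay]

    # Pairwise distances between reconstructed states (O(n^2) memory,
    # acceptable for a small illustrative series).
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a state is not its own neighbour

    # E + 1 nearest neighbours with exponentially decaying weights.
    k = E + 1
    nn = np.argsort(d, axis=1)[:, :k]
    nd = np.take_along_axis(d, nn, axis=1)
    w = np.exp(-nd / np.maximum(nd[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)

    est = (w * truth[nn]).sum(axis=1)
    return float(np.corrcoef(est, truth)[0, 1])


def delay_profile(source, target, max_delay, E=2, tau=1):
    """Cross-map skill for every delay from 0 to max_delay."""
    return np.array([cross_map_skill(source, target, lag, E, tau)
                     for lag in range(max_delay + 1)])
```

In a feature selection setting, one plausible use of such a profile is to take the maximum skill over delays as the causal strength of a candidate variable and to keep the variables whose strength exceeds a threshold tuned on a validation set, as the abstract describes. A full convergent cross mapping analysis would also check that the skill increases with the amount of data used, which this sketch omits.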

