Learning from data with missing values can bias predictions and hurt generalization, among other difficulties, and data imputation methods only partially address these problems. In neural networks, activation functions significantly affect performance, yet typical options (e.g., ReLU, Swish) operate only on feature values and ignore missingness indicators and confidence scores. We propose Three-Channel Evolved Activations (3C-EA): multivariate activation functions f(x, m, c), evolved with Genetic Programming and represented as expression trees, that take (i) the feature value x, (ii) a missingness indicator m, and (iii) an imputation confidence score c. To make these activations useful beyond the input layer, we introduce ChannelProp, an algorithm that deterministically propagates missingness and confidence values through linear layers based on weight magnitudes, so that reliability signals are retained throughout the network. We evaluate 3C-EA and ChannelProp on datasets with natural and injected (MCAR/MAR/MNAR) missingness at multiple rates, under identical preprocessing and splits. Results indicate that integrating missingness and confidence inputs into the activation search improves classification performance under missingness.
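To make the idea of propagating m and c through a linear layer concrete, here is a minimal NumPy sketch. The abstract does not give the exact ChannelProp rule or any evolved activation tree, so everything below is an assumption for illustration: `channel_prop` uses a |W|-weighted average per output unit as one plausible reading of "based on weight magnitudes", and `three_channel_activation` is a hand-written stand-in for an evolved f(x, m, c), not a function found by the search. All function names are hypothetical.

```python
import numpy as np


def channel_prop(W, m, c, eps=1e-12):
    """Propagate missingness m and confidence c through a linear layer.

    W has shape (out_features, in_features); m and c have shape
    (batch, in_features). Assumed rule (not taken from the paper): each
    output unit receives the |W|-weighted average of its inputs' values.
    """
    A = np.abs(W)
    A = A / (A.sum(axis=1, keepdims=True) + eps)  # each row sums to ~1
    return m @ A.T, c @ A.T


def three_channel_activation(x, m, c):
    """Hypothetical stand-in for an evolved tree f(x, m, c).

    ReLU on x, scaled by a gate that equals 1 for fully observed units
    and shrinks toward c as the missingness share m grows.
    """
    gate = (1.0 - m) + m * c
    return np.maximum(x, 0.0) * gate


def forward_layer(W, b, x, m, c):
    """One layer: linear map on x, assumed propagation of m and c, then f."""
    z = x @ W.T + b
    m_out, c_out = channel_prop(W, m, c)
    return three_channel_activation(z, m_out, c_out), m_out, c_out


if __name__ == "__main__":
    W = np.array([[1.0, -1.0], [2.0, 0.0]])
    b = np.zeros(2)
    x = np.array([[2.0, 1.0]])   # first feature was imputed
    m = np.array([[1.0, 0.0]])   # 1 = missing, 0 = observed
    c = np.array([[0.5, 1.0]])   # imputation confidence in [0, 1]
    h, m_out, c_out = forward_layer(W, b, x, m, c)
    print(h, m_out, c_out)
```

With this stand-in, a fully observed input (m = 0 everywhere) reduces the layer to a plain ReLU, so the extra channels only change behavior where missingness is present. An evolved tree could combine x, m, and c in ways that have no such property.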