Learning in the presence of missing data can lead to biased predictions and poor generalization, among other difficulties, which data imputation methods only partially address. In neural networks, activation functions significantly affect performance, yet standard choices (e.g., ReLU, Swish) operate only on feature values and do not account for missingness indicators or confidence scores. We propose Three-Channel Evolved Activations (3C-EA), in which Genetic Programming evolves multivariate activation functions f(x, m, c), represented as trees, that take three inputs: (i) the feature value x, (ii) a missingness indicator m, and (iii) an imputation confidence score c. To make these activations useful beyond the input layer, we introduce ChannelProp, an algorithm that deterministically propagates missingness and confidence values through linear layers based on weight magnitudes, retaining reliability signals throughout the network. We evaluate 3C-EA and ChannelProp on datasets with natural and injected (MCAR/MAR/MNAR) missingness at multiple rates, under identical preprocessing and splits. Results indicate that integrating missingness and confidence inputs into the activation search improves classification performance under missingness.
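To make the three-channel idea concrete, the following is a minimal NumPy sketch of one layer that carries x, m, and c forward together. It is an illustration under stated assumptions, not the method as specified in the paper: the abstract says only that ChannelProp propagates missingness and confidence through linear layers based on weight magnitudes, so the normalized absolute-weight averaging in `channel_prop` is an assumed rule. Likewise, `example_activation` is a hand-written stand-in for an f(x, m, c) tree, not an evolved function, and all function names are hypothetical.

```python
import numpy as np


def example_activation(x, m, c):
    """Hypothetical f(x, m, c): a ReLU whose output is damped for unreliable inputs.

    Hand-written for illustration only; it is not a function found by the
    Genetic Programming search described in the abstract.
    """
    # The gate is 1 for observed inputs (m = 0) and shrinks toward c
    # as the input becomes fully imputed (m = 1).
    gate = 1.0 - m * (1.0 - c)
    return np.maximum(x, 0.0) * gate


def channel_prop(W, m, c, eps=1e-12):
    """Assumed propagation rule: average m and c with weights proportional to |W|.

    W has shape (out_features, in_features); m and c have shape
    (batch, in_features). Each output unit receives a convex combination of
    its inputs' missingness and confidence, so values in [0, 1] stay in [0, 1].
    """
    A = np.abs(W)
    A = A / (A.sum(axis=1, keepdims=True) + eps)  # each row sums to about 1
    return m @ A.T, c @ A.T


def three_channel_layer(x, m, c, W, b):
    """Linear layer followed by the three-channel activation."""
    z = x @ W.T + b                       # ordinary pre-activation
    m_out, c_out = channel_prop(W, m, c)  # deterministic reliability channels
    return example_activation(z, m_out, c_out), m_out, c_out


# Usage: one sample with two features; the second was missing and imputed
# as 0.0 with confidence 0.5.
x = np.array([[1.0, 0.0]])
m = np.array([[0.0, 1.0]])
c = np.array([[1.0, 0.5]])
W = np.array([[1.0, -1.0],
              [2.0, 0.0]])
b = np.zeros(2)

out, m_out, c_out = three_channel_layer(x, m, c, W, b)
print(out, m_out, c_out)
```

In this toy example the first hidden unit draws equally (by weight magnitude) on both inputs, so it inherits a missingness of about 0.5 and its activation is damped. The second unit ignores the imputed feature entirely and passes through as a plain ReLU. Because the propagation depends only on the weights and the input channels, the reliability signals remain available to activations in deeper layers.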