The separation power of a machine learning model refers to its ability to distinguish between different inputs and is often used as a proxy for its expressivity. Indeed, knowing the separation power of a family of models is a necessary condition for obtaining fine-grained universality results. In this paper, we analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. We first present a complete characterization of the inputs that are indistinguishable by models derived from a given architecture. From this result, we derive how separability is influenced by hyperparameters and architectural choices, such as activation functions, depth, hidden-layer width, and representation types. Notably, all non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power. Depth improves separation power up to a threshold, after which further increases have no effect. Adding invariant features to hidden representations does not impact separation power. Finally, block decomposition of hidden representations affects separability, with minimal components forming a hierarchy in separation power that provides a straightforward method for comparing the separation power of models.