On Exploring Node-feature and Graph-structure Diversities for Node Drop Graph Pooling

A pooling operation is essential for effective graph-level representation learning, and node drop pooling has become a mainstream graph pooling technique. However, current node drop pooling methods usually keep the top-k nodes according to their significance scores, which ignores graph diversity in terms of node features and graph structures and thus results in suboptimal graph-level representations. To address this issue, we propose a novel plug-and-play scoring scheme, referred to as MID, which consists of a \textbf{M}ultidimensional score space with two operations, \textit{i.e.}, fl\textbf{I}pscore and \textbf{D}ropscore. Specifically, the multidimensional score space depicts the significance of nodes through multiple criteria; the flipscore encourages the maintenance of dissimilar node features; and the dropscore forces the model to notice diverse graph structures instead of being stuck in significant local structures. To evaluate the effectiveness of MID, we perform extensive experiments by applying it to a wide variety of recent node drop pooling methods, including TopKPool, SAGPool, GSAPool, and ASAP. Applied to these four methods, MID efficiently and consistently achieves an average improvement of about 2.8\% on seventeen real-world graph classification datasets, comprising four social datasets (IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and COLLAB) and thirteen biochemical datasets (D\&D, PROTEINS, NCI1, MUTAG, PTC-MR, NCI109, ENZYMES, MUTAGENICITY, FRANKENSTEIN, HIV, BBBP, TOXCAST, and TOX21). Code is available at~\url{https://github.com/whuchuang/mid}.
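
To make the three ingredients concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not define the operations precisely, so the sketch rests on stated assumptions: each node carries one score per criterion (the multidimensional score space), flipscore is read as taking absolute values before summing over criteria, and dropscore is read as zeroing a random subset of node scores during training only. The names `mid_scores`, `top_k_nodes`, `drop_rate`, and `ratio` are hypothetical and chosen for illustration.

```python
import math
import random


def mid_scores(score_matrix, drop_rate=0.0, training=False, rng=None):
    """Collapse an (n_nodes x n_criteria) score matrix to one score per node.

    Assumed reading of the abstract, not the reference code:
      * flipscore: use |score|, so strongly negative and strongly positive
        nodes both rank high and dissimilar features can survive pooling;
      * dropscore: in training, zero a node's score with probability
        drop_rate, so the top-k selection cannot always pick the same
        dominant local structure.
    """
    rng = rng or random.Random()
    node_scores = []
    for row in score_matrix:
        score = sum(abs(s) for s in row)  # flipscore, then sum over criteria
        if training and rng.random() < drop_rate:
            score = 0.0  # dropscore (assumed form)
        node_scores.append(score)
    return node_scores


def top_k_nodes(node_scores, ratio):
    """Return the sorted indices of the ceil(ratio * n) highest-scoring nodes."""
    k = max(1, math.ceil(ratio * len(node_scores)))
    ranked = sorted(range(len(node_scores)),
                    key=lambda i: node_scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy input: 4 nodes, 2 scoring criteria each (values are made up).
scores = [[0.9, 0.8], [-0.95, -0.7], [0.1, 0.05], [0.2, -0.1]]

plain = [sum(row) for row in scores]             # conventional signed score
kept_plain = top_k_nodes(plain, ratio=0.5)       # [0, 2]
kept_mid = top_k_nodes(mid_scores(scores), 0.5)  # [0, 1]
```

In this toy input, the signed score discards node 1 even though its scores are large in magnitude, whereas the flipped score keeps it alongside node 0. Because the sketch only changes how scores are computed before the top-k step, it is compatible with the plug-and-play framing in the abstract, although how MID is wired into TopKPool, SAGPool, GSAPool, or ASAP is not specified here.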
