This paper presents a method for explaining the internal representation structure of a neural network for image generation. Specifically, our method disentangles primitive feature components from an intermediate-layer feature of the network, such that each feature component is used exclusively to generate a specific set of image regions. The generation of the entire image can then be viewed as the superposition of different pre-encoded primitive regional patterns, each generated by one feature component. We find that each feature component can be represented as an OR relationship, encoded by the neural network, between the demands for generating different image regions. We therefore extend the Harsanyi interaction to represent this OR interaction and use it to disentangle the feature components. Experiments show a clear correspondence between each feature component and the generation of specific image regions.
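To make the distinction between the standard Harsanyi interaction and an OR interaction concrete, the following is a minimal sketch on a toy set function. The abstract does not state the paper's formulas, so everything here is an assumption: the AND form is the classical Harsanyi dividend, the OR form is one formulation found in the interaction-based explanation literature, and the names `toy_score`, `REGIONS`, `and_interactions`, and `or_interactions` are hypothetical. The toy score simply returns 1 when region `a` or region `b` is requested; it does not involve a real generator network.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def and_interactions(v, regions):
    """Classical Harsanyi dividend:
    I_and(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(regions)
    }


def or_interactions(v, regions):
    """Assumed OR variant (not taken from the abstract):
    I_or(S) = -sum over T subset of S of (-1)^(|S|-|T|) * v(N minus T),
    with I_or(empty set) = v(empty set)."""
    N = frozenset(regions)
    result = {frozenset(): v(frozenset())}
    for S in subsets(regions):
        if S:
            result[S] = -sum(
                (-1) ** (len(S) - len(T)) * v(N - T) for T in subsets(S)
            )
    return result


# Hypothetical toy setting: three image regions, and a score that is
# switched on when region "a" OR region "b" is among the requested regions.
REGIONS = ("a", "b", "c")


def toy_score(T):
    return 1 if T & {"a", "b"} else 0


# Keep only the non-zero interaction terms of each decomposition.
and_terms = {S: w for S, w in and_interactions(toy_score, REGIONS).items() if w != 0}
or_terms = {S: w for S, w in or_interactions(toy_score, REGIONS).items() if w != 0}
```

On this toy score, the OR decomposition needs a single non-zero term, on the set {a, b}, while the AND decomposition needs three terms (on {a}, {b}, and {a, b}) to express the same function. That is the sense in which an OR-shaped dependence is described more compactly by an OR interaction. Under the assumed OR form, the score of any subset T is recovered as v(∅) plus the sum of I_or(S) over all S that intersect T.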