Influence Maximization with Fairness at Scale (Extended Version)

In this paper, we revisit the problem of influence maximization with fairness, which aims to select k influential nodes to maximise the spread of information in a network, while ensuring that selected sensitive user attributes are fairly affected, i.e., that their proportions are similar in the original network and among the affected users. Recent studies on this problem focused only on extremely small networks, so the challenge remains of achieving a scalable solution applicable to networks with millions or billions of nodes. We propose an approach that learns node representations for fair spread from diffusion cascades, instead of from the social connectivity, so that we can deal with very large graphs. We propose two data-driven approaches: (a) fairness-based participant sampling (FPS), and (b) fairness as context (FAC). Spread-related user features, such as the probability of diffusing information to others, are derived from historical information cascades using a deep neural network. The extracted features are then used to select influencers that maximize the influence spread while also being fair with respect to the chosen sensitive attributes. In FPS, fairness and cascade length information are considered independently in the decision-making process, while FAC considers these information facets jointly and accounts for correlations between them. The proposed algorithms are generic and represent the first policy-driven solutions that can be applied to arbitrary sets of sensitive attributes at scale. We evaluate the performance of our solutions on a real-world public dataset (Sina Weibo) and on a hybrid real-synthetic dataset (Digg), which exhibit all the facets that we exploit, namely diffusion network, diffusion traces, and user profiles. These experiments show that our methods outperform the state-of-the-art solutions in terms of spread, fairness, and scalability.
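To make the fairness notion above concrete, the following is a minimal sketch of one way to check whether a sensitive attribute is "proportionally similar" between the original network and the affected users. The abstract does not specify the metric used in the paper; the choice of one minus the total variation distance, as well as the names `attribute_distribution`, `proportional_fairness`, and the toy data, are assumptions made purely for illustration.

```python
from collections import Counter


def attribute_distribution(users, attribute_of):
    """Return the share of each sensitive-attribute value among `users`."""
    counts = Counter(attribute_of[u] for u in users)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def proportional_fairness(population, influenced, attribute_of):
    """Score in [0, 1]; 1 means the influenced set mirrors the population.

    Assumed metric (not taken from the paper): 1 minus the total variation
    distance between the two attribute distributions.
    """
    if not influenced:
        return 0.0  # nothing was influenced, so nothing was fairly reached
    p = attribute_distribution(population, attribute_of)
    q = attribute_distribution(influenced, attribute_of)
    values = set(p) | set(q)
    tv = 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in values)
    return 1.0 - tv


# Hypothetical toy network: six users with a binary sensitive attribute.
attribute_of = {1: "f", 2: "f", 3: "m", 4: "m", 5: "m", 6: "f"}
population = list(attribute_of)

balanced = [1, 2, 3, 4]  # two "f" and two "m", same 50/50 split as the network
skewed = [3, 4, 5]       # only "m" users were reached

balanced_score = proportional_fairness(population, balanced, attribute_of)
skewed_score = proportional_fairness(population, skewed, attribute_of)
print(balanced_score, skewed_score)
```

In this toy setting, a spread that reaches both groups in the same proportion as the network scores higher than one that reaches a single group. A seed-selection procedure would weigh such a score against the expected spread size.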
