Fairness is an important consideration for dynamic resource allocation in multi-agent systems. Many existing methods treat fairness as a one-shot problem, ignoring temporal dynamics and thus the way inequalities accumulate over time. Recent approaches address this limitation by tracking allocations over time, but they assume perfect recall of all past utilities. While the former neglects long-term equity, the latter introduces a critical challenge: the augmented state space required to track cumulative utilities grows unboundedly with time, hindering the scalability and convergence of learning algorithms. Motivated by behavioral evidence that human fairness judgments discount distant events, we introduce a framework for temporal fairness that incorporates past-discounting into the learning problem, offering a principled interpolation between instantaneous and perfect-recall fairness. Our central contributions are a past-discounted framework for memory tracking and a theoretical analysis of fairness memories, showing that past-discounting guarantees a bounded, horizon-independent state space, a property we prove perfect-recall methods lack. This result makes it tractable to learn fair policies over arbitrarily long horizons. We formalize the framework, demonstrate its necessity with experiments in which perfect recall fails where past-discounting succeeds, and chart a clear path toward scalable, equitable resource allocation systems.
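The boundedness claim at the heart of the abstract can be illustrated with a minimal sketch. Assuming an exponential past-discount with factor gamma in [0, 1) and per-step utilities bounded by r_max (the update rule and names below are illustrative, not taken from the paper), the discounted memory stays within r_max / (1 - gamma) no matter how long the horizon, whereas a perfect-recall running sum grows linearly with time:

```python
def update_memory(memory: float, reward: float, gamma: float) -> float:
    """One step of past-discounted memory: m_t = gamma * m_{t-1} + r_t."""
    return gamma * memory + reward

gamma = 0.9
r_max = 1.0
bound = r_max / (1.0 - gamma)  # horizon-independent bound on the discounted memory

discounted, perfect_recall = 0.0, 0.0
for t in range(1000):
    r = r_max  # worst case: maximal utility at every step
    discounted = update_memory(discounted, r, gamma)
    perfect_recall += r

# Discounted memory remains bounded regardless of horizon length
# (small tolerance for floating-point accumulation).
assert discounted <= bound + 1e-9
# Perfect recall grows without bound as the horizon extends.
assert perfect_recall == 1000 * r_max
```

Because the discounted memory lives in a fixed compact range, it can be appended to the state of a learning agent without the state space growing with the episode length, which is the tractability property the abstract emphasizes.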