Autonomous agents often interact with other agents whose preferences are unknown in order to share resources in their environment. We explore sequential trading for resource allocation in a setting where two greedily rational agents trade resources from a finite set of categories. Each agent has a utility function that depends on the amount of resources it possesses in each category. The offering agent makes trade offers to improve its own utility without knowing the responding agent's utility function, and the responding agent accepts only offers that improve its utility. We present an algorithm that lets the offering agent estimate the responding agent's gradient (its preferences) and make offers based on previous acceptance or rejection responses. The algorithm's goal is to reach a Pareto-optimal resource allocation while ensuring that the utilities of both agents improve after every accepted trade. We show that, after a finite number of consecutively rejected offers, either the responding agent is at a near-optimal state or the agents' gradients are closely aligned. We compare the proposed algorithm against various baselines in continuous and discrete trading scenarios and show that it improves the societal benefit with fewer offers.
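The offer-and-response loop described above can be illustrated with a small sketch. This is not the algorithm proposed in the paper: the log-shaped utilities, the perceptron-style update of the gradient estimate, and every name in it (`run_trading`, `step`, `lr`, and so on) are assumptions chosen for illustration. It only shows the setting: the offerer proposes a small transfer, the responder accepts if its utility strictly improves, and each response refines the offerer's estimate of the responder's gradient direction.

```python
import numpy as np

def utility(weights, holdings):
    # Hypothetical concave utility; the abstract does not specify a form.
    return float(np.sum(weights * np.log1p(holdings)))

def gradient(weights, holdings):
    # Gradient of the hypothetical utility above.
    return weights / (1.0 + holdings)

def unit(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def run_trading(w_off, w_resp, x_off, x_resp,
                n_offers=200, step=0.05, lr=0.3, seed=0):
    """Offerer knows only its own utility; it learns from accept/reject."""
    rng = np.random.default_rng(seed)
    x_off = np.asarray(x_off, dtype=float).copy()
    x_resp = np.asarray(x_resp, dtype=float).copy()
    g_est = unit(rng.random(len(x_off)) + 0.1)  # initial guess of responder's gradient
    accepted_count = 0
    for _ in range(n_offers):
        g_own = unit(gradient(w_off, x_off))
        # For unit vectors, d = g_est - g_own satisfies g_est.d >= 0 and
        # g_own.(-d) >= 0, so to first order it helps both sides if the
        # estimate is right. The responder receives d, the offerer gives it up.
        d = unit(g_est - g_own)
        if not d.any():
            break  # estimated gradients aligned: no beneficial direction found
        d = step * d
        feasible = np.all(x_off - d >= 0) and np.all(x_resp + d >= 0)
        if not (feasible and utility(w_off, x_off - d) > utility(w_off, x_off)):
            # Offer is infeasible or would hurt the offerer: perturb the estimate.
            g_est = unit(g_est + lr * rng.normal(size=len(g_est)))
            continue
        # Greedy responder: accept only a strict utility improvement.
        if utility(w_resp, x_resp + d) > utility(w_resp, x_resp):
            x_off, x_resp = x_off - d, x_resp + d
            accepted_count += 1
        else:
            # A rejection suggests the true gradient points away from d.
            g_est = unit(g_est - lr * unit(d))
    return x_off, x_resp, accepted_count

if __name__ == "__main__":
    w_a, w_b = np.array([1.0, 3.0]), np.array([3.0, 1.0])
    xa, xb, k = run_trading(w_a, w_b, [5.0, 5.0], [5.0, 5.0])
    print("accepted trades:", k)
    print("offerer holdings:", xa, "responder holdings:", xb)
```

Because a trade is applied only when it is feasible and strictly improves both utilities, accepted trades in this sketch never lower either agent's utility, which mirrors the monotone-improvement property stated above. It makes no claim about reaching a Pareto-optimal allocation.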