Vehicle-to-vehicle (V2V) energy trading enables decentralized peer-to-peer energy exchange among electric vehicles (EVs), reducing grid dependency while monetizing surplus capacity. However, coordinating self-interested EV agents with diverse charging needs and uncertain arrival and departure schedules remains challenging. Existing approaches either rely on centralized optimization, which is computationally limited, or lack fairness guarantees. This paper integrates the Nash Bargaining Solution into Multi-Agent Deep Deterministic Policy Gradient (MADDPG), yielding Nash-MADDPG, for incentive-aligned V2V energy trading. Nash bargaining determines efficient bilateral prices, while Nash-guided price-proximity rewards steer agent learning toward bargaining-optimal strategies. Evaluation over 30 days of continuous operation shows a 61.6% improvement in social welfare and a 62.9% improvement in trading volume over Double Auction, together with superior fairness, including a 40.1% improvement in Jain's index. Tests with 6 to 100 agents over a 30-day horizon with continuous vehicle turnover confirm scalability with population size and empirically stable pricing near the Nash Bargaining benchmark.
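To make the two quantities named above concrete, the following is a minimal sketch, not the paper's implementation. It assumes linear per-kWh utilities, which the abstract does not specify: a seller with a reservation price and a buyer with a maximum willingness to pay. Under that assumption, maximizing the Nash product gives a closed-form bilateral price. The function names, parameters, and the `seller_power` weight are hypothetical and chosen only for illustration. Jain's index is the standard fairness measure, computed here over per-agent benefits.

```python
from typing import Optional, Sequence


def nash_bargaining_price(
    seller_reserve: float,
    buyer_value: float,
    seller_power: float = 0.5,
) -> Optional[float]:
    """Price maximizing (p - seller_reserve)^a * (buyer_value - p)^(1 - a).

    Assumption: both utilities are linear in the price per kWh, so the
    maximizer is p = seller_reserve + a * (buyer_value - seller_reserve).
    Returns None when there is no surplus to split (no trade).
    """
    if not 0.0 <= seller_power <= 1.0:
        raise ValueError("seller_power must lie in [0, 1]")
    surplus = buyer_value - seller_reserve
    if surplus <= 0.0:
        return None
    return seller_reserve + seller_power * surplus


def jain_index(benefits: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(benefits)
    sum_sq = sum(b * b for b in benefits)
    if n == 0 or sum_sq == 0.0:
        raise ValueError("index is undefined for empty or all-zero input")
    total = sum(benefits)
    return (total * total) / (n * sum_sq)
```

With equal bargaining power, the price is the midpoint between the seller's reservation price and the buyer's valuation, so both sides receive the same surplus per kWh. Jain's index equals 1 when every agent receives the same benefit and falls to 1/n when a single agent captures everything.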