Personalized pricing negotiations are a challenging testbed for LLM agents because successful interaction does not guarantee profitable decision-making. A seller may produce valid actions and close many deals while still pricing poorly when buyer willingness to pay and bargaining traits remain hidden. This paper presents PrefBench, a simulator-based benchmark for hidden-preference personalized pricing negotiations. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller observes public persona descriptors, bundle information, and negotiation history, while latent buyer variables govern valuation, patience, counter-offer behavior, and walkaway decisions. PrefBench evaluates this setting through an LLM-facing state-summary protocol that constrains agents to return strict JSON actions under a fixed hidden-information boundary. We evaluate zero-shot LLM sellers against heuristic references over 7,500 episodes. The tested LLMs follow the protocol reliably and achieve deal rates above 0.99, but their seller-profit outcomes remain weak: the best LLM's average profit is only slightly above the random baseline and far below that of a simple concession heuristic under the same episode stream. These results show that structured action compliance and agreement-seeking behavior can coexist with weak profit-sensitive bargaining. PrefBench provides a controlled benchmark for evaluating pricing-agent behavior under hidden buyer preferences.
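As an illustration of the state-summary protocol and hidden-information boundary described above, the following is a minimal sketch and not PrefBench's actual interface. The action vocabulary (`offer`, `accept`, `reject`), the field names (`action`, `price`), the hidden-key names, and both helper functions are assumptions introduced for this example only.

```python
import json
import math

# Hypothetical action vocabulary; the abstract does not list the real one.
ALLOWED_ACTIONS = {"offer", "accept", "reject"}

# Hypothetical names for latent buyer variables that the seller must not see.
HIDDEN_KEYS = {"valuation", "patience"}


def build_state_summary(episode: dict) -> dict:
    """Return only the seller-visible part of an episode (drops hidden keys)."""
    return {k: v for k, v in episode.items() if k not in HIDDEN_KEYS}


def parse_action(raw: str) -> dict:
    """Parse a seller reply as strict JSON; raise ValueError on any violation."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("action must be a JSON object")
    if set(obj) - {"action", "price"}:
        raise ValueError("unexpected keys in action")
    action = obj.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("unknown or missing action")
    if action == "offer":
        price = obj.get("price")
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
        if not is_number or not math.isfinite(price) or price <= 0:
            raise ValueError("offer requires a positive, finite price")
    return obj
```

Under these assumptions, a reply that wraps the JSON in free text, adds extra keys, or omits the price on an offer is rejected rather than repaired, which is one way to read "strict" compliance; a real harness might instead log the violation and substitute a fallback action.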