Chip placement is a critical step in physical design. Reinforcement learning (RL)-based methods have recently emerged, but their training focuses primarily on wirelength optimization, so they often fail to achieve expert-quality layouts. We identify reward design as the primary cause of this performance gap with experts. Rather than formalizing the intricate processes behind expert decisions, we circumvent the problem by learning a reward model directly from expert layouts. Our approach starts from final expert layouts and infers step-by-step expert trajectories from them. Using these trajectories as demonstrations or preferences, we train a model that captures the implicit rewards latent in expert results. Experiments show that our framework can learn efficiently from even a single design and generalize well to unseen cases.