Chip placement is a critical step in physical design. Reinforcement learning (RL)-based methods have recently emerged, but their training focuses primarily on wirelength optimization, so they often fail to produce expert-quality layouts. We identify reward design as the primary cause of this performance gap with experts. Instead of formalizing the intricate processes behind expert layouts, we circumvent the problem by learning a reward model directly from those layouts. Our approach starts from final expert layouts and infers step-by-step expert trajectories. Using these trajectories as demonstrations or preferences, we train a model that captures the implicit rewards latent in expert results. Experiments show that our framework learns efficiently from even a single design and generalizes well to unseen cases.
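To make the two stages named in the abstract concrete (inferring a trajectory from a final layout, then learning a reward from preferences), here is a minimal sketch. It is not the paper's implementation. The largest-macro-first ordering, the linear reward over hand-picked features, the Bradley-Terry preference loss, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def infer_trajectory(final_layout, areas):
    """Turn a finished layout into a step-by-step placement sequence.

    Assumption (not from the paper): macros are placed largest-first.
    final_layout maps macro name -> (x, y); areas maps macro name -> area.
    """
    order = sorted(final_layout, key=lambda name: -areas[name])
    return [(name, final_layout[name]) for name in order]

def bradley_terry_loss_and_grad(w, feats_pref, feats_rej):
    """Preference loss for a hypothetical linear reward r(x) = w . x.

    Models P(preferred beats rejected) = sigmoid(r_pref - r_rej) and
    returns the mean negative log-likelihood and its gradient w.r.t. w.
    """
    diff = feats_pref - feats_rej               # shape (n, d)
    margin = diff @ w                           # shape (n,)
    loss = np.mean(np.logaddexp(0.0, -margin))  # log(1 + exp(-margin))
    # sigmoid(-margin), written with tanh for numerical stability
    weight = 0.5 * (1.0 - np.tanh(0.5 * margin))
    grad = -(diff * weight[:, None]).mean(axis=0)
    return loss, grad

def fit_reward(feats_pref, feats_rej, lr=0.5, steps=200):
    """Fit the reward weights with plain gradient descent."""
    w = np.zeros(feats_pref.shape[1])
    for _ in range(steps):
        _, grad = bradley_terry_loss_and_grad(w, feats_pref, feats_rej)
        w -= lr * grad
    return w
```

In this sketch, each row of `feats_pref` would describe a placement step taken from an inferred expert trajectory, and the matching row of `feats_rej` an alternative step the expert did not take. A learned reward of this kind could then replace or supplement a wirelength-only reward during RL training.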