Hierarchical Deep Reinforcement Learning for Multi-Objective Integrated Circuit Physical Layout Optimization with Congestion-Aware Reward Shaping
Author(s) - Haijian Zhang, Yao Ge, Xiuyuan Zhao, Jiyuan Wang
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3610615
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Physical layout optimization remains a critical challenge in integrated circuit (IC) design as semiconductor technology scales toward advanced nodes, where traditional electronic design automation (EDA) tools struggle to simultaneously optimize multiple conflicting objectives, including routing congestion, power consumption, and timing performance. This paper presents a hierarchical deep reinforcement learning (HDRL) framework for multi-objective IC layout optimization that addresses the exponential complexity of modern chip design spaces. Our approach introduces a congestion-aware reward shaping mechanism that dynamically balances exploration and exploitation, and it incorporates domain-specific knowledge through a hierarchical policy decomposition. The framework employs a dual-level architecture in which a high-level policy manages global placement strategies and a low-level policy handles detailed placement refinements, enabling efficient optimization at different granularities. We integrate adaptive action space pruning based on physical design rules and propose a state representation that captures both local congestion patterns and global routing accessibility. Extensive experiments on OpenROAD benchmarks demonstrate that our method achieves, on average, a 23.7% reduction in routing congestion, an 18.4% improvement in power efficiency, and a 15.2% enhancement in timing closure compared to state-of-the-art commercial tools and recent learning-based approaches. The framework also converges 40% faster than baseline deep reinforcement learning methods and provides interpretable placement decisions through visualization of its attention mechanism. Our contributions are the hierarchical policy architecture, the congestion-aware reward formulation, and a comprehensive evaluation framework, which together advance the integration of artificial intelligence into physical design automation.
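The abstract does not give the reward formula, so the following is only a minimal sketch of one common way a congestion-aware shaped reward could be built: a weighted multi-objective cost turned into a potential function, with the standard potential-based shaping term added to a base reward. The names `LayoutMetrics`, `RewardWeights`, `potential`, and `shaped_reward`, the choice of metrics, and the weight values are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class LayoutMetrics:
    """Hypothetical normalized layout costs; lower is better for each."""
    congestion: float        # e.g. routing overflow estimate
    power: float             # e.g. estimated power
    timing_violation: float  # e.g. magnitude of negative slack


@dataclass
class RewardWeights:
    """Assumed objective weights; congestion is weighted most heavily."""
    congestion: float = 0.5
    power: float = 0.25
    timing: float = 0.25


def potential(m: LayoutMetrics, w: RewardWeights) -> float:
    """Negative weighted cost, so a better layout has a higher potential."""
    cost = (w.congestion * m.congestion
            + w.power * m.power
            + w.timing * m.timing_violation)
    return -cost


def shaped_reward(prev: LayoutMetrics, curr: LayoutMetrics,
                  w: RewardWeights, base_reward: float = 0.0,
                  gamma: float = 0.99) -> float:
    """Potential-based shaping: base + gamma * phi(next) - phi(prev)."""
    return base_reward + gamma * potential(curr, w) - potential(prev, w)


if __name__ == "__main__":
    w = RewardWeights()
    before = LayoutMetrics(congestion=1.0, power=1.0, timing_violation=1.0)
    after = LayoutMetrics(congestion=0.5, power=1.0, timing_violation=1.0)
    # A placement step that halves congestion earns a positive shaped reward.
    print(shaped_reward(before, after, w, gamma=1.0))
```

In this sketch, a step that lowers congestion raises the potential and is rewarded immediately, rather than only when the final layout is evaluated. In a hierarchical setup like the one described, the same function could be evaluated on coarse metrics for the high-level policy and on local metrics for the low-level policy, though how the paper actually divides the reward between levels is not stated here.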