Abstract
In this work, we performed an initial design space exploration of an accelerated processing unit (APU), a hybrid CPU+GPU architecture that integrates compute units (CUs) and memory into a unified system. This integration aims to reduce data movement, improve memory locality, and enhance energy efficiency by enabling the CPU and GPU to share memory directly. Our exploration focused on the interplay of key design components, namely cache line size, number of CUs, and main memory technology, and analyzed the trade-offs of each configuration. This paper highlights each configuration's impact on memory accesses, data reuse, and power consumption. The results provide insights for optimizing APU architectures toward a balanced, high-performance, and energy-efficient design, for example by adopting dynamic cache management, runtime CU scaling, and advanced memory integration. These findings underscore the potential of APUs to address critical challenges in compute, data movement, and memory power consumption.