TY - JOUR
T1 - Emerging technology enabled energy-efficient GPGPUs register file
AU - Xie, Chenhao
AU - Tan, Jingweijia
AU - Chen, Mingsong
AU - Yi, Yang
AU - Peng, Lu
AU - Fu, Xin
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2017/5/1
Y1 - 2017/5/1
AB - Modern general-purpose graphics processing units (GPGPUs) employ fine-grained multithreading across thousands of active threads, which requires a large register file (RF) with high energy consumption. In this study, we explore an energy-efficient GPGPU RF enabled by an emerging technology, the tunnel FET (TFET). TFET is much more energy-efficient than CMOS at low supply voltages, but always operating TFET at low voltage (and therefore at low frequency) causes significant performance degradation. We first design a hybrid CMOS-TFET register file and propose memory-contention-aware TFET register allocation (MEM_RA). MEM_RA allocates TFET-based registers to threads whose execution can be delayed to some degree to avoid memory contention with other threads, while CMOS-based registers are still used for threads that require normal execution speed. We further observe that memory-intensive benchmarks have insufficient TFET register resources when MEM_RA is applied. We then develop TFET-register-utilization-aware block allocation (TUBA) and TFET-register-request-aware warp scheduling (TRWS) to use the limited TFET registers effectively and maximize energy savings. Our experimental results show that the proposed techniques reduce GPGPU register file energy (both dynamic and leakage) by 40% with negligible performance overhead.
KW - Energy efficiency
KW - General-purpose computing on graphics processing units (GPGPUs)
KW - Tunneling field effect transistors (TFETs)
UR - https://www.scopus.com/pages/publications/85017475844
U2 - 10.1016/j.micpro.2017.04.002
DO - 10.1016/j.micpro.2017.04.002
M3 - Article
AN - SCOPUS:85017475844
SN - 0141-9331
VL - 50
SP - 175
EP - 188
JO - Microprocessors and Microsystems
JF - Microprocessors and Microsystems
ER -