TY - GEN
T1 - Software-Hardware Co-Design for Feature Extraction on Racetrack Memory-Based PIM
AU - Pan, Muyun
AU - Li, Xinghao
AU - Gu, Shouzhen
AU - Sha, Edwin Hsing Mean
AU - Zhuge, Qingfeng
N1 - Publisher Copyright: ©2025 IEEE.
PY - 2025
Y1 - 2025
AB - Convolutional neural networks (CNNs) involve intensive matrix computations and often face memory bandwidth bottlenecks that processing-in-memory (PIM) architectures aim to overcome. Racetrack memory (RM), with its high density, low power consumption, and excellent endurance, is particularly well suited to storing matrix data in such architectures. We propose a software-level data placement strategy, tile blocking, designed to optimize the shift-based access mechanism inherent in RM, and integrate it into a hardware-software co-design framework for CNN feature extraction. Our approach enhances data parallelism and significantly reduces the number of shift operations, thereby achieving energy-efficient computation on RM-based PIM systems. Experimental results show that our strategy reduces the average number of shifts by approximately 81.64%, the average energy consumption by approximately 38.68%, and the average execution time by approximately 44.7%.
KW - Process-In-Memory
KW - Racetrack Memory
KW - Software-Hardware Co-Design
UR - https://www.scopus.com/pages/publications/105037106959
U2 - 10.1109/NVMSA66678.2025.00018
DO - 10.1109/NVMSA66678.2025.00018
M3 - Conference contribution
AN - SCOPUS:105037106959
T3 - Proceedings - 2025 14th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2025
SP - 57
EP - 62
BT - Proceedings - 2025 14th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 14th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2025
Y2 - 20 August 2025 through 22 August 2025
ER -