TY - GEN
T1 - An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar
AU - Ni, Leibin
AU - Wang, Yuhao
AU - Yu, Hao
AU - Yang, Wei
AU - Weng, Chuliang
AU - Zhao, Junfeng
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/3/7
Y1 - 2016/3/7
N2 - Emerging resistive random-access memory (RRAM) provides not only non-volatile storage but also intrinsic logic for matrix-vector multiplication, which makes it well suited to low-power, high-throughput in-memory data analytics accelerators. However, existing RRAM-based computing devices mostly assume multi-level analog computing, whose results are sensitive to process non-uniformity and which incurs additional A/D-conversion and I/O overhead. This paper explores a data analytics accelerator on a binary RRAM crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of the corresponding components and control protocol. Both the memory array and the logic accelerator can be implemented purely in binary on RRAM crossbars, with logic-memory pairs distributed under a control-bus protocol. Based on numerical results for fingerprint matching mapped onto the proposed RRAM crossbar, the proposed architecture achieves 2.86x faster speed, 154x better energy efficiency, and 100x smaller area than the same design implemented as a CMOS-based ASIC.
AB - Emerging resistive random-access memory (RRAM) provides not only non-volatile storage but also intrinsic logic for matrix-vector multiplication, which makes it well suited to low-power, high-throughput in-memory data analytics accelerators. However, existing RRAM-based computing devices mostly assume multi-level analog computing, whose results are sensitive to process non-uniformity and which incurs additional A/D-conversion and I/O overhead. This paper explores a data analytics accelerator on a binary RRAM crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of the corresponding components and control protocol. Both the memory array and the logic accelerator can be implemented purely in binary on RRAM crossbars, with logic-memory pairs distributed under a control-bus protocol. Based on numerical results for fingerprint matching mapped onto the proposed RRAM crossbar, the proposed architecture achieves 2.86x faster speed, 154x better energy efficiency, and 100x smaller area than the same design implemented as a CMOS-based ASIC.
UR - https://www.scopus.com/pages/publications/84996847834
U2 - 10.1109/ASPDAC.2016.7428024
DO - 10.1109/ASPDAC.2016.7428024
M3 - Conference contribution
AN - SCOPUS:84996847834
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 280
EP - 285
BT - 2016 21st Asia and South Pacific Design Automation Conference, ASP-DAC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st Asia and South Pacific Design Automation Conference, ASP-DAC 2016
Y2 - 25 January 2016 through 28 January 2016
ER -