TY - JOUR
T1 - Heterogeneous FPGA-based cost-optimal design for timing-constrained CNNs
AU - Jiang, Weiwen
AU - Sha, Edwin Hsing Mean
AU - Zhuge, Qingfeng
AU - Yang, Lei
AU - Chen, Xianzhang
AU - Hu, Jingtong
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/11
Y1 - 2018/11
N2 - Field programmable gate array (FPGA) has been one of the most popular platforms to implement convolutional neural networks (CNNs) due to its high performance and cost efficiency; however, limited by the on-chip resources, the existing single-FPGA architectures cannot fully exploit the parallelism in CNNs. In this paper, we explore heterogeneous FPGA-based designs to effectively leverage both task and data parallelism, such that the resultant system can achieve the minimum cost while satisfying timing constraints. In order to maximize the task parallelism, we investigate two critical problems: 1) buffer placement, where to place buffers to partition CNNs into pipeline stages and 2) task assignment, what type of FPGA to implement different CNN layers. We first formulate the system-level optimization problem with a mixed integer linear programming model. Then, we propose an efficient dynamic programming algorithm to obtain the optimal solutions. On top of that, we devise an efficient algorithm that exploits data parallelism within CNN layers to further improve cost efficiency. Evaluations on well-known CNNs demonstrate that the proposed techniques can obtain an average of 30.82% reduction in system cost under the same timing constraint, and an average of 1.5 times speedup in performance under the same cost budget, compared with the state-of-the-art techniques.
AB - Field programmable gate array (FPGA) has been one of the most popular platforms to implement convolutional neural networks (CNNs) due to its high performance and cost efficiency; however, limited by the on-chip resources, the existing single-FPGA architectures cannot fully exploit the parallelism in CNNs. In this paper, we explore heterogeneous FPGA-based designs to effectively leverage both task and data parallelism, such that the resultant system can achieve the minimum cost while satisfying timing constraints. In order to maximize the task parallelism, we investigate two critical problems: 1) buffer placement, where to place buffers to partition CNNs into pipeline stages and 2) task assignment, what type of FPGA to implement different CNN layers. We first formulate the system-level optimization problem with a mixed integer linear programming model. Then, we propose an efficient dynamic programming algorithm to obtain the optimal solutions. On top of that, we devise an efficient algorithm that exploits data parallelism within CNN layers to further improve cost efficiency. Evaluations on well-known CNNs demonstrate that the proposed techniques can obtain an average of 30.82% reduction in system cost under the same timing constraint, and an average of 1.5 times speedup in performance under the same cost budget, compared with the state-of-the-art techniques.
KW - Convolutional neural networks (CNNs)
KW - heterogeneous field programmable gate array (FPGA) cluster
KW - optimization algorithms
KW - partitioning and mapping
UR - https://www.scopus.com/pages/publications/85050987630
U2 - 10.1109/TCAD.2018.2857098
DO - 10.1109/TCAD.2018.2857098
M3 - Article
AN - SCOPUS:85050987630
SN - 0278-0070
VL - 37
SP - 2542
EP - 2554
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 11
M1 - 8412611
ER -