TY - GEN
T1 - Efficient task allocation method to improve network processor throughput
AU - Yu, Yong
AU - Yu, Zhihang
AU - Tang, Feilong
AU - Guo, Minyi
PY - 2009
Y1 - 2009
N2 - Ubiquitous computing involves a large number of devices connected via networks. This requires packet processing services that guarantee privacy, security, and high quality. We aim to provide ubiquitous computing with stable and satisfactory services by improving packet processing performance. As applications become more and more complicated, task allocation among multiple cores in a pipelined architecture becomes both important and difficult. To map tasks onto a pipelined architecture and maximize overall throughput, we propose a task allocation scheme that incorporates profiling and global thread refinement. The scheme relies on a performance model that determines system throughput, taking into account multithreading, memory access, and the effect of communication between stages. We evaluate the technique by implementing representative network processing applications on the Intel IXP architecture. Experimental results show that our scheme can generate mappings of realistic applications that balance the stages and achieve high throughput. Furthermore, it outperforms other methods even when the number of PEs is reduced.
AB - Ubiquitous computing involves a large number of devices connected via networks. This requires packet processing services that guarantee privacy, security, and high quality. We aim to provide ubiquitous computing with stable and satisfactory services by improving packet processing performance. As applications become more and more complicated, task allocation among multiple cores in a pipelined architecture becomes both important and difficult. To map tasks onto a pipelined architecture and maximize overall throughput, we propose a task allocation scheme that incorporates profiling and global thread refinement. The scheme relies on a performance model that determines system throughput, taking into account multithreading, memory access, and the effect of communication between stages. We evaluate the technique by implementing representative network processing applications on the Intel IXP architecture. Experimental results show that our scheme can generate mappings of realistic applications that balance the stages and achieve high throughput. Furthermore, it outperforms other methods even when the number of PEs is reduced.
UR - https://www.scopus.com/pages/publications/70349733021
U2 - 10.1109/CISIS.2009.63
DO - 10.1109/CISIS.2009.63
M3 - Conference contribution
AN - SCOPUS:70349733021
SN - 9780769535753
T3 - Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2009
SP - 601
EP - 606
BT - Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2009
T2 - International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2009
Y2 - 16 March 2009 through 19 March 2009
ER -