Upper Limit Analysis of Scalable Parallel Computing on the Premise of Reliability Requirement

  • Huanliang Xiong
  • Guosun Zeng
  • Wei Wang
  • Canghai Wu
  • Yefu Wang

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

The Top500 supercomputer ranking has been published twice a year for more than 20 years according to Linpack performance, which has greatly stimulated the development of high-performance computing. However, it is still not clear how to determine the scale limit of a supercomputer. Building ever larger supercomputers without regard to other factors such as cost, energy, and reliability will undoubtedly waste resources. This paper therefore analyses the scalability and scale limit of parallel computing under a reliability requirement. We use a Markov chain to model the state transition process of a parallel computing system, which allows us to evaluate the probability that parallel tasks run successfully on the machines, that is, the reliability of parallel computing. When parallel computing is scaled under iso-speed efficiency with a specific reliability requirement, we present an approach to calculate the maximum number of processing nodes and the maximum workload of parallel tasks, which reveals the functional relationship between the scale limit and the speed efficiency of parallel computing. Taking “Tianhe-2”, the No. 1 supercomputer at the time of writing, as a case study, we apply our method to predict its scale limit. Finally, a simulation experiment is conducted to verify our theory.
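The idea of a reliability-constrained scale limit can be illustrated with a deliberately simplified model. The sketch below is not the paper's Markov-chain model. It assumes, hypothetically, that each node is a two-state (up to down) continuous-time Markov chain with a constant failure rate λ and no repair, and that nodes fail independently, so a task needing all N nodes survives a run of length T with probability exp(-NλT). It also assumes speed efficiency is defined as E = W/(N·C·T) for workload W and per-node speed C, and that iso-speed scaling follows the hypothetical rule W = w0·N. All function and parameter names (`task_reliability`, `work_per_node`, and so on) are illustrative only.

```python
import math


def task_reliability(n_nodes, failure_rate, runtime):
    # Each node: two-state (up -> down) CTMC, constant failure rate, no repair,
    # so P(node up at t) = exp(-failure_rate * t). The task needs all nodes up
    # (independent failures, series system): R = exp(-N * failure_rate * T).
    return math.exp(-n_nodes * failure_rate * runtime)


def task_runtime(workload, n_nodes, node_speed, speed_efficiency):
    # Assumed definition of speed efficiency: E = W / (N * C * T),
    # hence T = W / (N * C * E).
    return workload / (n_nodes * node_speed * speed_efficiency)


def max_workload(r_req, failure_rate, node_speed, speed_efficiency):
    # Substituting T into R gives R = exp(-failure_rate * W / (C * E)),
    # so R >= r_req  <=>  W <= -ln(r_req) * C * E / failure_rate.
    return -math.log(r_req) * node_speed * speed_efficiency / failure_rate


def max_nodes(r_req, failure_rate, node_speed, speed_efficiency, work_per_node):
    # Hypothetical iso-speed scaling rule: W grows linearly, W = work_per_node * N,
    # so the largest admissible N is floor(W_max / work_per_node).
    w_max = max_workload(r_req, failure_rate, node_speed, speed_efficiency)
    return math.floor(w_max / work_per_node)


if __name__ == "__main__":
    # Illustrative numbers only (consistent units: time in hours,
    # speed in work units per hour, failure rate per node-hour).
    n = max_nodes(r_req=0.9, failure_rate=1e-5, node_speed=1.0,
                  speed_efficiency=0.5, work_per_node=1.0)
    t = task_runtime(1.0 * n, n, 1.0, 0.5)
    print(n, task_reliability(n, 1e-5, t))
```

With these made-up inputs (R_req = 0.9, λ = 1e-5 per node-hour, C = 1, E = 0.5, w0 = 1), `max_nodes` returns 5268. Under these assumptions reliability depends on W/(C·E) rather than on N directly, so a lower speed efficiency lowers the admissible workload and, through the scaling rule, the node limit.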

Original language: English
Pages (from-to): 573-583
Number of pages: 11
Journal: IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India)
Volume: 33
Issue number: 6
State: Published - 1 Nov 2016
Externally published: Yes

Keywords

  • Markov chain
  • Parallel computing
  • Reliability
  • Scalability
  • Scale limit analysis

