Parallel accessing massive NetCDF data based on MapReduce

  • Hui Zhao*
  • Siyun Ai
  • Zhenhua Lv
  • Bo Li
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

22 Scopus citations

Abstract

NetCDF (Network Common Data Form) is widely used in the terrestrial, marine, and atmospheric sciences. A new parallel storage and access method for large-scale NetCDF scientific data is implemented on Hadoop, with the retrieval method implemented using MapReduce. Argo data is used to demonstrate the method. Performance is compared in a distributed environment built from PCs, using different data scales and different numbers of tasks. The experimental results show that the parallel method can store and access large-scale NetCDF data efficiently.

Original language: English
Title of host publication: Web Information Systems and Mining - International Conference, WISM 2010, Proceedings
Pages: 425-431
Number of pages: 7
Edition: M4D
State: Published - 2010
Event: 2010 International Conference on Web Information Systems and Mining, WISM 2010 - Sanya, China
Duration: 23 Oct 2010 to 24 Oct 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: M4D
Volume: 6318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2010 International Conference on Web Information Systems and Mining, WISM 2010
Country/Territory: China
City: Sanya
Period: 23/10/10 to 24/10/10

Keywords

  • Data intensive
  • MapReduce
  • NetCDF
  • Parallel access
