TY - GEN
T1 - Storage and recreation trade-off for multi-version data management
AU - Zhang, Yin
AU - Liu, Huiping
AU - Jin, Cheqing
AU - Guo, Ye
N1 - Publisher Copyright:
© 2018, Springer International Publishing AG, part of Springer Nature.
PY - 2018
Y1 - 2018
AB - With the tremendous development of data acquisition technology, massive amounts of observation data have been accumulated in scientific disciplines. Because successive observations differ only slightly, it is critical to use multi-version data management technology to compress the data and minimize both storage and recreation costs. However, existing work in this field optimizes only the total storage and recreation costs and ignores the recreation cost of certain special versions. In this paper, we therefore investigate the trade-off among three metrics: total storage cost, total recreation cost, and the maximum recreation cost of any individual version. We formulate two problems: (1) find a storage plan that lowers the total recreation cost and the individual recreation costs when the total storage is limited; (2) find a storage plan that minimizes the total storage when the total recreation cost and the individual recreation costs are restricted. To solve these problems, we model all versions as a directed graph and devise two efficient algorithms based on spanning trees. A series of experiments indicates that our proposals are effective and efficient in dealing with these problems.
KW - Multi-version data management
KW - Scientific data management
KW - Storage and recreation trade-off
UR - https://www.scopus.com/pages/publications/85051131748
U2 - 10.1007/978-3-319-96893-3_30
DO - 10.1007/978-3-319-96893-3_30
M3 - Conference contribution
AN - SCOPUS:85051131748
SN - 9783319968926
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 394
EP - 409
BT - Web and Big Data - Second International Joint Conference, APWeb-WAIM 2018, Proceedings
A2 - Cai, Yi
A2 - Ishikawa, Yoshiharu
A2 - Xu, Jianliang
PB - Springer Verlag
T2 - 2nd Asia Pacific Web and Web-Age Information Management Joint Conference on Web and Big Data, APWeb-WAIM 2018
Y2 - 23 July 2018 through 25 July 2018
ER -