TY - JOUR
T1 - Distribution-free data density estimation in large-scale networks
AU - Zhou, Minqi
AU - Zhang, Rong
AU - Qian, Weining
AU - Zhou, Aoying
N1 - Publisher Copyright: © 2018, Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2018/12/1
Y1 - 2018/12/1
N2 - Estimating the global data distribution in large-scale networks is an important problem that has yet to be well addressed. Such estimates benefit many applications, especially in the cloud computing era, including load balancing analysis, query processing, and data mining. Inspired by the inversion method for random variate (number) generation, we present a novel model, called distribution-free data density estimation, for large ring-based networks; it achieves high estimation accuracy at low estimation cost regardless of the distribution model of the underlying data. The model generates random samples for any arbitrary distribution by sampling the global cumulative distribution function and is free from sampling bias. With this estimation method, we can estimate data densities over both one-dimensional and multidimensional tuple sets, where the domain of each dimension may be either continuous or discrete. In large-scale networks, the key idea of distribution-free estimation is to sample a small subset of peers to estimate the global data distribution over the data domain. We introduce algorithms for computing and sampling the global cumulative distribution function, from which the global data distribution is estimated, together with a detailed theoretical analysis. Our extensive performance study confirms the effectiveness and efficiency of our methods in large ring-based networks.
AB - Estimating the global data distribution in large-scale networks is an important problem that has yet to be well addressed. Such estimates benefit many applications, especially in the cloud computing era, including load balancing analysis, query processing, and data mining. Inspired by the inversion method for random variate (number) generation, we present a novel model, called distribution-free data density estimation, for large ring-based networks; it achieves high estimation accuracy at low estimation cost regardless of the distribution model of the underlying data. The model generates random samples for any arbitrary distribution by sampling the global cumulative distribution function and is free from sampling bias. With this estimation method, we can estimate data densities over both one-dimensional and multidimensional tuple sets, where the domain of each dimension may be either continuous or discrete. In large-scale networks, the key idea of distribution-free estimation is to sample a small subset of peers to estimate the global data distribution over the data domain. We introduce algorithms for computing and sampling the global cumulative distribution function, from which the global data distribution is estimated, together with a detailed theoretical analysis. Our extensive performance study confirms the effectiveness and efficiency of our methods in large ring-based networks.
KW - data density estimation
KW - distribution-free
KW - random sampling
UR - https://www.scopus.com/pages/publications/85037377420
U2 - 10.1007/s11704-016-6194-y
DO - 10.1007/s11704-016-6194-y
M3 - Article
AN - SCOPUS:85037377420
SN - 2095-2228
VL - 12
SP - 1220
EP - 1240
JO - Frontiers of Computer Science
JF - Frontiers of Computer Science
IS - 6
ER -