TY - JOUR
T1 - Uses of selection strategies in both spectral and sample spaces for classifying hard and soft blueberry using near infrared data
AU - Hu, Menghan
AU - Zhai, Guangtao
AU - Zhao, Yu
AU - Wang, Zhaodi
N1 - Publisher Copyright:
© 2018 The Author(s).
PY - 2018/12/1
Y1 - 2018/12/1
N2 - In the current work, we attempt to leverage the fewer wavelengths and samples to develop a classification model for classifying hard and soft blueberries using near infrared (NIR) data. To do this, random frog selection and active learning approaches are used in the spectral space and the sample queue, respectively. To reduce the spectral number, a random frog spectral selection approach was applied to collect wavelengths informative with hardness. Prediction model based on 22 selected spectra gave slightly better results than that based on the full spectra. In terms of the selection operation in the sample space, the query by committee was validated to be suitable for blueberry hardness classification with the accuracy, precision and recall of 78%, 74% and 98% when taking only 25 sample queries. Its standard deviation curves of performance metrics are also located in regions of low values (around 0.05) and fluctuated steadily in shape, winning over those of the other 4 active learning strategies and random method. In summary, the respective uses of random frog and query by committee in the NIR spectral vector and the sample queue showed the considerable potential for establishing a simple but robust classifier for hard and soft blueberries with very low labeling cost.
AB - In the current work, we attempt to leverage the fewer wavelengths and samples to develop a classification model for classifying hard and soft blueberries using near infrared (NIR) data. To do this, random frog selection and active learning approaches are used in the spectral space and the sample queue, respectively. To reduce the spectral number, a random frog spectral selection approach was applied to collect wavelengths informative with hardness. Prediction model based on 22 selected spectra gave slightly better results than that based on the full spectra. In terms of the selection operation in the sample space, the query by committee was validated to be suitable for blueberry hardness classification with the accuracy, precision and recall of 78%, 74% and 98% when taking only 25 sample queries. Its standard deviation curves of performance metrics are also located in regions of low values (around 0.05) and fluctuated steadily in shape, winning over those of the other 4 active learning strategies and random method. In summary, the respective uses of random frog and query by committee in the NIR spectral vector and the sample queue showed the considerable potential for establishing a simple but robust classifier for hard and soft blueberries with very low labeling cost.
UR - https://www.scopus.com/pages/publications/85046289584
U2 - 10.1038/s41598-018-25055-x
DO - 10.1038/s41598-018-25055-x
M3 - 文章
C2 - 29703949
AN - SCOPUS:85046289584
SN - 2045-2322
VL - 8
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 6671
ER -