TY - GEN
T1 - An SVM-based approach to discover microRNA precursors in plant genomes
AU - Wang, Yi
AU - Jin, Cheqing
AU - Zhou, Minqi
AU - Zhou, Aoying
PY - 2012
Y1 - 2012
N2 - MicroRNAs (miRNAs) are noncoding RNAs of ∼22 nucleotides that play versatile regulatory roles in multicellular organisms. Because cloning methods for miRNA identification are biased towards abundant miRNAs, computational approaches provide a useful complement for identifying miRNAs whose expression is highly tissue- and time-specific. In this paper, we propose a novel Support Vector Machine (SVM) based detector, named MiR-PD, to identify pre-miRNAs in plants. The classifier is built on twelve features of pre-miRNAs, comprising five global features and seven sub-structure features. Trained on 790 plant pre-miRNAs and 7,900 pseudo pre-miRNAs, MiR-PD achieves 96.43% five-fold cross-validation accuracy. Tested on 441 newly identified plant pre-miRNAs and 62,883 pseudo pre-miRNAs, MiR-PD reports an accuracy of 99.71% with 77.55% sensitivity and 99.87% specificity, suggesting that this detector can feasibly be applied genome-wide to identify novel miRNAs (especially species-specific miRNAs) in plants without relying on phylogenetic conservation.
AB - MicroRNAs (miRNAs) are noncoding RNAs of ∼22 nucleotides that play versatile regulatory roles in multicellular organisms. Because cloning methods for miRNA identification are biased towards abundant miRNAs, computational approaches provide a useful complement for identifying miRNAs whose expression is highly tissue- and time-specific. In this paper, we propose a novel Support Vector Machine (SVM) based detector, named MiR-PD, to identify pre-miRNAs in plants. The classifier is built on twelve features of pre-miRNAs, comprising five global features and seven sub-structure features. Trained on 790 plant pre-miRNAs and 7,900 pseudo pre-miRNAs, MiR-PD achieves 96.43% five-fold cross-validation accuracy. Tested on 441 newly identified plant pre-miRNAs and 62,883 pseudo pre-miRNAs, MiR-PD reports an accuracy of 99.71% with 77.55% sensitivity and 99.87% specificity, suggesting that this detector can feasibly be applied genome-wide to identify novel miRNAs (especially species-specific miRNAs) in plants without relying on phylogenetic conservation.
KW - MiR-PD
KW - MicroRNAs
KW - plant
KW - support vector machine
UR - https://www.scopus.com/pages/publications/84863285874
U2 - 10.1007/978-3-642-28320-8_26
DO - 10.1007/978-3-642-28320-8_26
M3 - Conference contribution
AN - SCOPUS:84863285874
SN - 9783642283192
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 304
EP - 315
BT - New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers
T2 - 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011
Y2 - 24 May 2011 through 27 May 2011
ER -