TY - JOUR
T1 - Efficient Mining Multi-Mers in a Variety of Biological Sequences
AU - Zhang, Jingsong
AU - Guo, Jianmei
AU - Zhang, Ming
AU - Yu, Xiangtian
AU - Yu, Xiaoqing
AU - Guo, Weifeng
AU - Zeng, Tao
AU - Chen, Luonan
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2020/5/1
Y1 - 2020/5/1
N2 - Counting the occurrence frequency of each k-mer in a biological sequence is a preliminary yet important step in many bioinformatics applications. However, most k-mer counting algorithms rely on a given k to produce single-length k-mers, which is inefficient for sequence analysis for different k. Moreover, existing k-mer counters focus more on DNA and RNA sequences and less on protein ones. In practice, the analysis of k-mers in protein sequences can provide substantial biological insights in structure, function, and evolution. To this end, an efficient algorithm, called MulMer (Multiple-Mer mining), is proposed to mine k-mers of various lengths termed multi-mers via inverted-index technique, which is orders of magnitude faster than the conventional forward-index methods. Moreover, to the best of our knowledge, MulMer is the first able to mine multi-mers in a variety of sequences, including DNA, RNA, and protein sequences.
AB - Counting the occurrence frequency of each k-mer in a biological sequence is a preliminary yet important step in many bioinformatics applications. However, most k-mer counting algorithms rely on a given k to produce single-length k-mers, which is inefficient for sequence analysis for different k. Moreover, existing k-mer counters focus more on DNA and RNA sequences and less on protein ones. In practice, the analysis of k-mers in protein sequences can provide substantial biological insights in structure, function, and evolution. To this end, an efficient algorithm, called MulMer (Multiple-Mer mining), is proposed to mine k-mers of various lengths termed multi-mers via inverted-index technique, which is orders of magnitude faster than the conventional forward-index methods. Moreover, to the best of our knowledge, MulMer is the first able to mine multi-mers in a variety of sequences, including DNA, RNA, and protein sequences.
KW - Sequential pattern mining
KW - biological sequence analysis
KW - inverted index
KW - k-mer counting
KW - k-mers of various lengths
UR - https://www.scopus.com/pages/publications/85045746718
U2 - 10.1109/TCBB.2018.2828313
DO - 10.1109/TCBB.2018.2828313
M3 - Article
C2 - 29993642
AN - SCOPUS:85045746718
SN - 1545-5963
VL - 17
SP - 949
EP - 958
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 3
M1 - 8341507
ER -