TY - GEN
T1 - Distinguish coding and noncoding sequences in a complete genome using Fourier transform
AU - Zhou, Yu
AU - Zhou, Li Qian
AU - Yu, Zu Guo
AU - Anh, Vo
PY - 2007
Y1 - 2007
N2 - A Fourier transform method is proposed to distinguish coding and non-coding sequences in a complete genome based on a number sequence representation of the DNA sequence proposed in our previous paper (Zhou et al., J. Theor. Biol. 2005) and the imperfect periodicity of 3 in protein coding sequences. The three parameters Px(s̄) (1), Px(s̄) (1/3) and Px(s̄) (1/36) in the Fourier transform of the number sequence representation of DNA sequences are selected to form a three-dimensional parameter space. Each DNA sequence is then represented by a point in this space. The points corresponding to coding and non-coding sequences in the complete genome of prokaryotes are seen to be divided into different regions. If the point (Px(s̄) (1), Px(s̄) (1/3), Px(s̄) (1/36)) for a DNA sequence is situated in the region corresponding to coding sequences, the sequence is distinguished as a coding sequence; otherwise, the sequence is classified as a noncoding one. Fisher's discriminant algorithm is used to study the discriminant accuracy. The average discriminant accuracies pc, pnc, qc and qnc of all 51 prokaryotes obtained by the present method reach 81.02%, 92.27%, 80.77% and 92.24% respectively.
AB - A Fourier transform method is proposed to distinguish coding and non-coding sequences in a complete genome based on a number sequence representation of the DNA sequence proposed in our previous paper (Zhou et al., J. Theor. Biol. 2005) and the imperfect periodicity of 3 in protein coding sequences. The three parameters Px(s̄) (1), Px(s̄) (1/3) and Px(s̄) (1/36) in the Fourier transform of the number sequence representation of DNA sequences are selected to form a three-dimensional parameter space. Each DNA sequence is then represented by a point in this space. The points corresponding to coding and non-coding sequences in the complete genome of prokaryotes are seen to be divided into different regions. If the point (Px(s̄) (1), Px(s̄) (1/3), Px(s̄) (1/36)) for a DNA sequence is situated in the region corresponding to coding sequences, the sequence is distinguished as a coding sequence; otherwise, the sequence is classified as a noncoding one. Fisher's discriminant algorithm is used to study the discriminant accuracy. The average discriminant accuracies pc, pnc, qc and qnc of all 51 prokaryotes obtained by the present method reach 81.02%, 92.27%, 80.77% and 92.24% respectively.
UR - https://www.scopus.com/pages/publications/38049091214
U2 - 10.1109/ICNC.2007.333
DO - 10.1109/ICNC.2007.333
M3 - Conference contribution
AN - SCOPUS:38049091214
SN - 0769528759
SN - 9780769528755
T3 - Proceedings - Third International Conference on Natural Computation, ICNC 2007
SP - 295
EP - 299
BT - Proceedings - Third International Conference on Natural Computation, ICNC 2007
T2 - 3rd International Conference on Natural Computation, ICNC 2007
Y2 - 24 August 2007 through 27 August 2007
ER -