Numerical sequence representation of DNA sequences and methods to distinguish coding and non-coding sequences in a complete genome

  • Zu Guo Yu*
  • Vo Anh
  • Yu Zhou
  • Li Qian Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

In this presentation we introduce two methods to distinguish coding and non-coding sequences in a complete genome. We first introduce a numerical sequence representation of DNA sequences; there is a one-to-one correspondence between a DNA sequence and its numerical sequence representation. In the first method, three exponents from a multifractal analysis are selected to construct the parameter space. In the second method, which is based on a Fourier transform approach, three parameters from the power spectrum of the numerical sequence representation are selected to construct the parameter space. Each DNA sequence can then be represented by a point in each of these three-dimensional spaces. We found that the points corresponding to coding and non-coding sequences in the complete genomes of prokaryotes fall into different regions in both parameter spaces. If the point for a DNA sequence lies in the region corresponding to coding sequences, the sequence is classified as coding; otherwise, it is classified as non-coding. The average accuracies using Fisher's discriminant algorithm for coding and non-coding sequences are satisfactory.
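The abstract does not give the numerical encoding, the three spectral parameters, or the details of the discriminant step, so the following is only a minimal sketch of the general pipeline under stated assumptions. The mapping in `ENCODING` is a hypothetical invertible choice, not the authors' representation; `power_spectrum` is a plain discrete Fourier transform of the mean-centred numerical sequence; and `fisher_discriminant` is the textbook two-class Fisher linear discriminant applied to points in a three-dimensional parameter space. All function names are illustrative.

```python
import cmath
import math

# Hypothetical injective nucleotide-to-number map (an assumption; the
# abstract does not state the encoding actually used).
ENCODING = {"A": -2, "C": -1, "G": 1, "T": 2}

def to_numeric(seq):
    """Map a DNA string to a list of numbers."""
    return [ENCODING[base] for base in seq.upper()]

def from_numeric(values):
    """Invert to_numeric; possible because ENCODING is one-to-one."""
    inverse = {v: k for k, v in ENCODING.items()}
    return "".join(inverse[v] for v in values)

def power_spectrum(values):
    """Return |X(k)|^2 / N for k = 0..N-1 using a direct O(N^2) DFT
    of the mean-centred sequence."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(n):
        x_k = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centred))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum

def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def scatter(points, mean):
    """Within-class scatter matrix: sum of outer products of deviations."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fisher_discriminant(class_a, class_b):
    """Return (w, threshold) with w = Sw^{-1} (mean_a - mean_b) and the
    threshold placed at the projected midpoint of the two class means."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    d = len(ma)
    sw = [[sa[i][j] + sb[i][j] for j in range(d)] for i in range(d)]
    w = solve(sw, [ma[i] - mb[i] for i in range(d)])
    threshold = sum(w[i] * (ma[i] + mb[i]) / 2 for i in range(d))
    return w, threshold

def classify(point, w, threshold):
    """True if the point projects onto the class_a side of the threshold."""
    return sum(wi * pi for wi, pi in zip(w, point)) > threshold
```

In use, each sequence in a labelled training set would be reduced to three numbers (for example, three quantities read off its power spectrum), the coding and non-coding points would be passed to `fisher_discriminant` as `class_a` and `class_b`, and a new sequence would be classified by calling `classify` on its point. Which three quantities to extract is left open here because the abstract does not name them.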

Original language: English
Title of host publication: WMSCI 2007 - The 11th World Multi-Conference on Systemics, Cybernetics and Informatics, Jointly with the 13th International Conference on Information Systems Analysis and Synthesis, ISAS 2007 - Proc.
Pages: 171-176
Number of pages: 6
State: Published - 2007
Externally published: Yes
Event: 11th World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI 2007, Jointly with the 13th International Conference on Information Systems Analysis and Synthesis, ISAS 2007 - Orlando, FL, United States
Duration: 8 Jul 2007 → 11 Jul 2007

Publication series

Name: WMSCI 2007 - The 11th World Multi-Conference on Systemics, Cybernetics and Informatics, Jointly with the 13th International Conference on Information Systems Analysis and Synthesis, ISAS 2007 - Proc.
Volume: 1

Conference

Conference: 11th World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI 2007, Jointly with the 13th International Conference on Information Systems Analysis and Synthesis, ISAS 2007
Country/Territory: United States
City: Orlando, FL
Period: 8/07/07 → 11/07/07

Keywords

  • Coding/non-coding sequences
  • Complete genome
  • Fourier transform
  • Fractal analysis
