DNA sequence comparison by a novel probabilistic method

Chenglong Yu, Mo Deng, Stephen S.T. Yau

Research output: Contribution to journal, Article, peer-review

41 Citations (Scopus)

Abstract

This paper proposes a novel method for comparing DNA sequences. Using a graphical representation, we construct a probability distribution for each DNA sequence. These distributions can then be compared using the symmetrised Kullback-Leibler divergence to study sequence similarity. After presenting our method, we test it on six DNA sequences taken from the threonine operons of Escherichia coli K-12 and Shigella flexneri. We then apply our approach to mitochondrial DNA data to study the evolution of primates and reconstruct a phylogenetic tree for primate evolution. In addition, we use our technique to analyze the classification and phylogeny of the Tomato Yellow Leaf Curl Virus (TYLCV) based on its whole genome sequences. These examples show that our approach handles large volumes of DNA sequences more easily and more quickly than existing multiple alignment methods. Moreover, unlike other approaches, our method can be applied automatically and so does not require human intervention.
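The comparison step named in the abstract, the symmetrised Kullback-Leibler divergence between two probability distributions, can be illustrated with a minimal sketch. The abstract does not say how the graphical representation produces the distributions, so this sketch substitutes simple smoothed nucleotide frequencies as a stand-in; that substitution, the function names, the pseudocount, and the choice to symmetrise by summing the two directed divergences (rather than averaging them) are all assumptions for illustration, not the authors' construction.

```python
import math
from collections import Counter

BASES = "ACGT"

def nucleotide_distribution(seq, pseudocount=1.0):
    """Smoothed frequencies of A, C, G, T in seq.

    Stand-in for the paper's graphical-representation distribution
    (assumption). The pseudocount keeps every probability positive,
    so the divergence below is always finite.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES) + len(BASES) * pseudocount
    return [(counts[b] + pseudocount) / total for b in BASES]

def kl_divergence(p, q):
    """Directed divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetrised_kl(p, q):
    """Symmetrised form, taken here as D(p || q) + D(q || p) (assumption)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Usage: pairwise distance between two short, made-up sequences.
p = nucleotide_distribution("AACGT")
q = nucleotide_distribution("GGGTC")
distance = symmetrised_kl(p, q)
```

A matrix of such pairwise values over a set of sequences is the kind of input a distance-based tree-building method would take, which is one plausible route from the divergence to a phylogenetic tree.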

Original language: English
Pages (from-to): 1484-1492
Number of pages: 9
Journal: Information Sciences
Volume: 181
Issue number: 8
Publication status: Published or Issued - 15 Apr 2011

Keywords

  • DNA
  • Graphical representation
  • Kullback-Leibler divergence
  • Probability distribution
  • Sequence comparison

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
