A New Heuristic Approach for DNA Sequences Alignment

Full Text (PDF, 814KB), PP.18-23

Views: 0 Downloads: 0

Author(s)

M.I. Khalil 1,*

1. Princess Nora Bint Abdurrahman University, Faculty of Computer and Information Sciences, Networking and Communication Dept., Riyadh

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2015.12.03

Received: 22 May 2015 / Revised: 7 Aug. 2015 / Accepted: 6 Oct. 2015 / Published: 8 Nov. 2015

Index Terms

DNA similarity algorithms, DNA sequence comparison, DNA analysis, pattern recognition, Longest Common Substring, Longest Common Subsequence, DNA sequences alignment

Abstract

The problem of comparing DNA sequences is one of the most significant tasks in the field of computational biology. It helps locating the similarities and differences between pairs of DNA sequences. This task can be achieved by finding the longest common substrings between DNA sequences and consequently aligning them. The complexity of this task is due to the high computational power and huge space consuming. Comparing DNA sequences leads to infer the cause of a certain disease beside many significant biological applications. This paper introduces a new Heuristic Approach for DNA Sequences Alignment between two DNA sequences. The new approach is based on three processing phases: the first phase finds the multiple common substrings in the two sequences, the second one sorts the obtained common substrings descending according to their lengths, and the last phase generates the optimal two aligned sequences. The modules of the new approach have been implemented and tested in C# language under Windows platform. The obtained results manifest a reduction in both time of processing and memory requirements. 

Cite This Paper

M. I. Khalil,"A New Heuristic Approach for DNA Sequences Alignment", IJIGSP, vol.7, no.12, pp.18-23, 2015. DOI: 10.5815/ijigsp.2015.12.03

Reference

[1]Http://www.astrochem.org/sci/Nucleobases.php, retrieved on 8th, june 2015.

[2]Harvey Lodish, Arnold Berk, S Lawrence Zipursky, Paul Matsudaira, David Baltimore, and James Darnell., Molecular Cell Biology, 4th edition, New York: W. H. Freeman; 2000., ISBN-10: 0-7167-3136-3.

[3]S.Rajesh, S.Prathema and.L.S.S.Reddy, "Unusual Pattern Detection in DNA Database Using KMP Algorithm", International Journal of Computer Applications (2010) (0975 - 8887) Vol. 1 – No. 22.

[4]Kyohei Yamaguchi and Satoshi Mizuta, "A New Graphical Representation of DNA Sequences Using Symmetrical Vector Assignment", Review of Bioinformatics and Biometrics (RBB) Volume 3, 2014.

[5]O. Gotoh, "An Improved Algorithm for Matching Biological Sequences," Journal of Molecular Biology, 162, pp: 705~708, 1982.J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp.68-73.

[6]S. Grier, "A tool that detects plagiarism in Pascal programs", ACM SIGCSE Bulletin, vol. 13, no. 1, (1981), pp. 15-20.

[7]J. A. W. Faidhi and S. K. Robinson, "An empirical approach for detecting program similarity within a university programming environment", Computers & Education, vol. 11, no. 1, (1987), pp. 11-19.

[8]U. Manber, "Finding similar files in a large file system[C/OL]", In: Proceedings of the Winter USENIX Conference, (1994), pp. 1-10.

[9]BLAST, http://blast.ncbi.nlm.nih.gov/Blast.cgi, (2011) September.

[10]C. Yu, S.-Y. Cheng, R. L. He and S. S. -T. Yau, "Protein map: An alignment-free sequence comparison method based on various properties of amino acids", Gene, vol. 486, (2011), pp. 110-118.

[11]Y. Guo and T. -m. Wang, "A new method to analyze the similarity of the DNA sequences", Journal of Molecular Structure: THEOCHEM, vol. 853, (2008), pp. 62–67. 

[12]Izzat Alsmadi and Maryam Nuser, "String Matching Evaluation Methods for DNA Comparison", International Journal of Advanced Science and Technology, Vol. 47, October, 2012.

[13]BLAST, http://blast.ncbi.nlm.nih.gov/Blast.cgi, (2011) September.

[14]M.I.Khalil, Finding Longest Common Substrings in DNA Sequences, IJITCS, 2015.

[15]J. K. Me, M. R. Panigrahi, G. N. Dash and P. K. Meher, Wavelet Based Lossless DNA Sequence Compression for Faster Detection of Eukaryotic Protein Coding Regions, IJIGSP Vol.4, No.7, July 2012.

[16]Mohammed Abo-Zahhad, Sabah M. Ahmed and Shimaa A. Abd-Elrahman. A Novel Circular Mapping Technique for Spectral Classification of Exons and Introns in Human DNA Sequences, IJITCS Vol. 6, No. 4, March 2014, PP.19-29.

[17]G. Sethuraman,Kavitha Joseph, Star Coloring Problem: The DNA Solution, IJITCS Vol. 4, No. 3, April 2012, PP.31-37.

[18]M.I.Khalil, M.A.Hadi, Finding Longest Common Substrings in Documents, IJIGSP Vol. 7, No. 9, 2015, 9, 27-33.