Phylogenetic Method for High-Throughput Ortholog Detection

Pages: 51-59


Author(s)

Shaifu Gupta (1,*), Manpreet Singh (1)

1. Guru Nanak Dev Engineering College, Ludhiana, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2015.02.07

Received: 6 Dec. 2014 / Revised: 2 Jan. 2015 / Accepted: 10 Feb. 2015 / Published: 8 Mar. 2015

Index Terms

Orthologs, Comparative genomics, Phylogenetics, Species overlap, Patristic distance

Abstract

Accurate detection of orthologous proteins is a key aspect of comparative genomics. Orthologs in different species can be used to predict the function of uncharacterized genes from their counterparts in model organisms, because orthologs tend to retain the same biological function over the course of evolution. Orthologs can be inferred using phylogenetic, pairwise-similarity, or synteny-based methods. This study describes a computational method that detects the orthologs of a protein using a phylogenetic tree-based approach. A combination of the species-overlap algorithm and patristic distances is used to detect the orthologs of a protein from a set of FASTA sequences, and the patristic distances narrow the orthology predictions for any protein down to its closest orthologs. The approach gives good accuracy, with high specificity and precision. A distance threshold controls the stringency of the predictions, so that the required closeness between the protein of interest and its orthologs can be adjusted.
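The combination of species overlap and patristic distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a rooted gene tree with branch lengths has already been inferred from the FASTA sequences, and that each leaf's species is known. The names `Node`, `is_duplication`, `distances_below`, `orthologs` and `max_distance`, along with the toy tree, are all hypothetical. The species-overlap rule labels an internal node a duplication when its child subtrees share a species, and a speciation otherwise. Leaves that split from the query at a speciation node are treated as its orthologs. Each ortholog's patristic distance is the summed branch length on the path to the query, and it is kept only if that distance falls within the threshold.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Gene-tree node (hypothetical); leaves carry a protein name and species."""
    name: str = ""
    species: str = ""
    length: float = 0.0  # branch length to the parent node
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` and return self, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return self

    def leaves(self) -> list:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def is_duplication(node: Node) -> bool:
    """Species-overlap rule: duplication if child subtrees share a species,
    otherwise speciation."""
    seen: set = set()
    for child in node.children:
        species = {leaf.species for leaf in child.leaves()}
        if seen & species:
            return True
        seen |= species
    return False


def distances_below(node: Node) -> dict:
    """Patristic (summed branch-length) distance from `node` to each leaf below it."""
    if not node.children:
        return {node.name: 0.0}
    out = {}
    for c in node.children:
        for name, d in distances_below(c).items():
            out[name] = c.length + d
    return out


def orthologs(query: Node, max_distance: float = float("inf")) -> dict:
    """Orthologs of leaf `query` with their patristic distances to it,
    keeping only those at distance <= max_distance (the stringency threshold)."""
    result = {}
    child, up = query, 0.0  # `up` accumulates the query-to-ancestor distance
    while child.parent is not None:
        up += child.length
        ancestor = child.parent
        if not is_duplication(ancestor):  # speciation: other subtrees hold orthologs
            for other in ancestor.children:
                if other is child:
                    continue
                for name, d in distances_below(other).items():
                    total = up + other.length + d
                    if total <= max_distance:
                        result[name] = total
        child = ancestor
    return result


def leaf(name: str, species: str, length: float) -> Node:
    return Node(name=name, species=species, length=length)


# Toy rooted tree (made-up data): (((HUMAN_a,MOUSE_a),(HUMAN_b,MOUSE_b)),FLY_a)
human_a = leaf("HUMAN_a", "HUMAN", 0.25)
n1 = Node(length=0.25).add(human_a).add(leaf("MOUSE_a", "MOUSE", 0.5))
n2 = Node(length=0.25).add(leaf("HUMAN_b", "HUMAN", 0.5)).add(leaf("MOUSE_b", "MOUSE", 0.25))
dup = Node(length=0.5).add(n1).add(n2)  # HUMAN and MOUSE on both sides -> duplication
root = Node().add(dup).add(leaf("FLY_a", "FLY", 1.0))

print(orthologs(human_a))                    # {'MOUSE_a': 0.75, 'FLY_a': 2.0}
print(orthologs(human_a, max_distance=1.0))  # {'MOUSE_a': 0.75}
```

HUMAN_b and MOUSE_b are never reported for HUMAN_a because they branch off at a duplication node, which makes them paralogs. Tightening `max_distance` removes the more distant fly ortholog and keeps the mouse one. This mirrors how a distance threshold adjusts the stringency of the predictions.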

Cite This Paper

Shaifu Gupta, Manpreet Singh, "Phylogenetic Method for High-Throughput Ortholog Detection", International Journal of Information Engineering and Electronic Business (IJIEEB), vol.7, no.2, pp.51-59, 2015. DOI:10.5815/ijieeb.2015.02.07
