Prediction of Drought Resistance Gene with Clustered Amino Acid Features

Full Text (PDF, 270KB), PP.62-67

Author(s)

Xia Jingbo 1,2 Shi Feng 1,2,* Hu Xuehai 1,2 Li Zhi 1,2 Song Chaohong 1,2 Xiong Huijuan 1,2

1. College of Science, Huazhong Agricultural University, Wuhan, P.R. China

2. Institute of Applied Mathematics, Huazhong Agricultural University, Wuhan, P.R. China

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2012.11.07

Received: 13 Feb. 2012 / Revised: 4 Jun. 2012 / Accepted: 20 Aug. 2012 / Published: 8 Oct. 2012

Index Terms

Support Vector Machine, Classifier, Amino Acid Composition, K-Means

Abstract

Drought resistance genes play an important role in molecular breeding, yet little is known about their genetic mechanism. By extracting clustered amino acid features, numerical features that are crucial to the resistance property of a given gene are obtained. A support vector machine is used to test the reliability of the feature extraction method. After careful parameter selection, the predictor achieves an accuracy of 79.36% in the jackknife test and a Matthews correlation coefficient of 0.5636.
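
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes a clustered amino acid composition for a protein sequence, together with accuracy and the Matthews correlation coefficient from confusion-matrix counts. The grouping in `EXAMPLE_CLUSTERS` and all function names are assumptions made for this example only; the abstract does not list the clusters actually used (the index terms suggest they were derived with K-Means), and the support vector machine training and jackknife loop are not shown.

```python
from math import sqrt

# Hypothetical grouping of the 20 standard amino acids into 6 clusters.
# This is NOT the paper's clustering, which the abstract does not specify.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def clustered_composition(sequence, clusters=EXAMPLE_CLUSTERS):
    """Return the fraction of residues of `sequence` that fall in each cluster."""
    lookup = {aa: i for i, group in enumerate(clusters) for aa in group}
    counts = [0] * len(clusters)
    total = 0
    for residue in sequence.upper():
        idx = lookup.get(residue)
        if idx is not None:  # skip non-standard symbols such as X
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(clusters)
    return [c / total for c in counts]


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a jackknife (leave-one-out) test, each sequence is predicted in turn by a classifier trained on all the others, and the resulting counts of true and false positives and negatives are passed to `accuracy` and `matthews_cc`.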

Cite This Paper

Xia Jingbo, Shi Feng, Hu Xuehai, Li Zhi, Song Chaohong, Xiong Huijuan, "Prediction of Drought Resistance Gene with Clustered Amino Acid Features", International Journal of Intelligent Systems and Applications (IJISA), vol. 4, no. 11, pp. 62-67, 2012. DOI: 10.5815/ijisa.2012.11.07
