Proximity Measurement Technique for Gene Expression Data

Full Text (PDF, 429KB), PP.40-48



Karuna Ghai 1,*, Sanjay K. Malik 1

1. Dept. of CSE, Hindu College of Engineering, Sonepat, Haryana-131001, India

* Corresponding author.


Received: 13 Jul. 2015 / Revised: 6 Aug. 2015 / Accepted: 2 Sep. 2015 / Published: 8 Oct. 2015

Index Terms

Data mining, microarray, gene expression data, hierarchical clustering


Data mining is an analytical process for exploring data in search of consistent patterns. Owing to its widespread applications in the biomedical industry and the public availability of genomic data, data mining has become an emerging topic in the analysis of gene expression data. Clustering is the first step in understanding complicated biological systems. Its objective is to organize samples into intrinsic clusters such that samples with high similarity belong to the same cluster. The significance of clustering gene profiles is twofold: first, it assists in diagnosing the disease condition; second, it reveals the effect of a given treatment on genes. In this paper, we propose a new method for clustering gene expression data that is based solely on hierarchical clustering, with a different method for computing the similarity between datasets and merging the pairs. Experimental results on two microarray datasets show the correctness and competence of the proposed technique.
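
For readers unfamiliar with the general setting, the following is a minimal sketch of conventional agglomerative (bottom-up) hierarchical clustering applied to expression profiles. It is not the proximity measure or merge rule proposed in this paper, which the abstract does not specify. As stand-in assumptions it uses Pearson correlation distance and average linkage, and the function names and toy profiles are hypothetical, chosen only for illustration.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # assumption: a flat profile is treated as uncorrelated
    return 1.0 - cov / (norm_x * norm_y)


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        pearson_distance(profiles[i], profiles[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clustering(profiles, n_clusters):
    """Repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy expression profiles (made-up values, one row per sample).
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [2.0, 4.1, 6.0, 8.2],  # rising, larger scale
    [4.0, 3.0, 2.0, 1.0],  # falling
    [8.0, 6.1, 3.9, 2.0],  # falling, larger scale
]
result = agglomerative_clustering(toy_profiles, n_clusters=2)
print(result)
```

Because correlation distance ignores scale, the two rising profiles end up in one cluster and the two falling profiles in the other. A different proximity measure or linkage rule, such as the one the paper proposes, would replace `pearson_distance` and `average_linkage` while leaving the merge loop unchanged.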

Cite This Paper

Karuna Ghai, Sanjay K. Malik, "Proximity Measurement Technique for Gene Expression Data", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.10, pp.40-48, 2015. DOI:10.5815/ijmecs.2015.10.06

