High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework

pp. 75-84

Author(s)

Guru Prasad M S (1, *), Nagesh H R (2), Swathi Prabhu (3)

1. SDMIT/CSE, Ujire, 577240, India

2. MITE/CSE, Moodbidri, 574227, India

3. SMVITM/CSE, Udupi, 576115, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.01.08

Received: 6 Feb. 2016 / Revised: 1 Jun. 2016 / Accepted: 15 Aug. 2016 / Published: 8 Jan. 2017

Index Terms

Big Data, Hadoop, MapReduce, Hadoop Distributed File System (HDFS), Apriori MapReduce, FP-growth MapReduce

Abstract

A huge amount of Big Data is constantly arriving with the rapid development of business organizations, and these organizations are interested in extracting useful knowledge from the collected data. Frequent item set mining of Big Data supports business decisions and helps provide high-quality service. Running traditional frequent item set mining algorithms on Big Data is ineffective because it leads to high computation time. Apache Hadoop MapReduce is the most popular data-intensive distributed computing framework for large-scale data applications such as data mining. In this paper, the authors identify the factors affecting the performance of frequent item set mining algorithms based on Hadoop MapReduce technology and propose an approach for optimizing the performance of large-scale frequent item set mining. The experimental results show the potential of the proposed approach: performance is significantly improved for large-scale data mining with the MapReduce technique. The authors believe this is a valuable contribution to the high-performance computing of Big Data.
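To make the MapReduce setting concrete, here is a minimal single-process Python sketch of how a support-counting job for frequent 1-itemsets (the first pass of an Apriori-style algorithm) breaks down into map, combine, shuffle and reduce steps. It is an illustration under stated assumptions, not the authors' optimized approach and not Hadoop's API: the function names, the toy transactions and the `min_support` threshold are hypothetical, and the combiner is included only as a commonly used MapReduce technique for reducing shuffle traffic.

```python
# Hypothetical single-process simulation of one MapReduce job that counts
# item support; it is not Hadoop's API and not the paper's method.
from collections import Counter, defaultdict


def mapper(transaction):
    """Emit (item, 1) once for each distinct item in a transaction."""
    for item in set(transaction):
        yield item, 1


def combiner(pairs):
    """Pre-aggregate counts within one split to shrink shuffle traffic."""
    local = Counter()
    for item, count in pairs:
        local[item] += count
    return list(local.items())


def shuffle(map_outputs):
    """Group values by key across all map tasks, like the shuffle/sort phase."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for item, count in pairs:
            groups[item].append(count)
    return groups


def reducer(item, counts, min_support):
    """Sum an item's partial counts; return it only if it is frequent."""
    total = sum(counts)
    return (item, total) if total >= min_support else None


def frequent_items(splits, min_support):
    """Run the simulated job: one map task (plus combiner) per input split."""
    map_outputs = []
    for split in splits:
        pairs = (kv for transaction in split for kv in mapper(transaction))
        map_outputs.append(combiner(pairs))
    result = {}
    for item, counts in shuffle(map_outputs).items():
        kept = reducer(item, counts, min_support)
        if kept is not None:
            result[kept[0]] = kept[1]
    return result


# Hypothetical toy data: three input splits holding five transactions.
splits = [
    [["bread", "milk"], ["bread", "diaper", "beer", "eggs"]],
    [["milk", "diaper", "beer", "cola"], ["bread", "milk", "diaper", "beer"]],
    [["bread", "milk", "diaper", "cola"]],
]
print(frequent_items(splits, min_support=3))
```

In a real Hadoop deployment each split would correspond to an HDFS input split handled by its own map task, and the framework, not a Python loop, would perform the shuffle. Later Apriori passes would repeat a similar job for candidate k-itemsets built from the frequent (k-1)-itemsets.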

Cite This Paper

Guru Prasad M S, Nagesh H R, Swathi Prabhu, "High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.1, pp.75-84, 2017. DOI:10.5815/ijisa.2017.01.08