Performance Analysis of Schedulers to Handle Multi Jobs in Hadoop Cluster

Pages: 51-56


Author(s)

Guru Prasad M S 1,*, Nagesh H R 2, Swathi Prabhu 3

1. SDMIT/CSE, Ujire, 574240, India

2. MITE/CSE, Moodabidri, 574227, India

3. SMVITM/CSE, Udupi, 576115, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2015.12.07

Received: 15 Sep. 2015 / Revised: 12 Oct. 2015 / Accepted: 3 Nov. 2015 / Published: 8 Dec. 2015

Index Terms

BigData, Apache Hadoop, MapReduce Framework, Hadoop Schedulers, Job Execution Time, Ganglia tool

Abstract

MapReduce is a programming model for processing large data sets. Apache Hadoop, an implementation of MapReduce, was developed to process Big Data. Sharing a Hadoop cluster introduces several challenges, such as job scheduling, data locality, efficient resource usage, fair resource usage, and fault tolerance. Accordingly, we focus on the job scheduling system in Hadoop in order to achieve efficiency. Schedulers are responsible for task assignment. When a user submits a job, it is placed in a job queue. From the job queue, the job is divided into tasks, which are distributed to different nodes. Proper task assignment reduces job completion time and thereby improves job performance. By default, Hadoop uses the FIFO scheduler. In our experiments, we discuss and compare the job execution times of the FIFO scheduler, the Fair scheduler, and the Capacity scheduler.
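
The effect of task assignment on job completion time can be illustrated with a toy simulation. The sketch below is not Hadoop's scheduler code and does not reproduce the paper's experiment. It assumes a cluster with a fixed number of task slots, jobs that are all submitted at time zero, and tasks that each take exactly one time step. The function names `fifo_schedule` and `fair_schedule` are hypothetical and exist only for this illustration.

```python
def fifo_schedule(job_tasks, slots):
    """Completion time (in time steps) of each job under a FIFO policy.

    job_tasks: number of unit-length tasks per job, in submission order.
    slots: number of tasks the cluster can run per time step (assumed fixed).
    """
    remaining = list(job_tasks)
    finish = [0] * len(remaining)
    t = 0
    while any(r > 0 for r in remaining):
        t += 1
        free = slots
        # Earlier jobs take as many slots as they can use before later jobs.
        for j, r in enumerate(remaining):
            if r == 0 or free == 0:
                continue
            run = min(r, free)
            remaining[j] -= run
            free -= run
            if remaining[j] == 0:
                finish[j] = t
    return finish


def fair_schedule(job_tasks, slots):
    """Completion time of each job when slots are shared round-robin."""
    remaining = list(job_tasks)
    finish = [0] * len(remaining)
    t = 0
    while any(r > 0 for r in remaining):
        t += 1
        free = slots
        pending = list(remaining)
        # Hand out one slot at a time to every job that still has tasks.
        while free > 0 and any(p > 0 for p in pending):
            for j in range(len(pending)):
                if free == 0:
                    break
                if pending[j] > 0:
                    pending[j] -= 1
                    free -= 1
        for j in range(len(remaining)):
            if remaining[j] > 0 and pending[j] == 0:
                finish[j] = t
        remaining = pending
    return finish


if __name__ == "__main__":
    jobs = [8, 2, 2]   # one large job followed by two small jobs
    print("FIFO:", fifo_schedule(jobs, slots=4))   # [2, 3, 3]
    print("Fair:", fair_schedule(jobs, slots=4))   # [3, 2, 2]
```

In this toy workload the small jobs wait behind the large one under FIFO, while round-robin sharing lets them finish earlier at the cost of delaying the large job. Real Hadoop schedulers also account for data locality, pools or queues, and task durations that vary, none of which is modelled here.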

Cite This Paper

Guru Prasad M S, Nagesh H R, Swathi Prabhu, "Performance Analysis of Schedulers to Handle Multi Jobs in Hadoop Cluster", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.12, pp.51-56, 2015. DOI:10.5815/ijmecs.2015.12.07
