Clustered Webbase: A Repository of Web Pages Based on Top Level Domain

pp. 59-65

Author(s)

Geeta Rani 1,*, Nidhi Tyagi 1

1. Shobhit University, Meerut, 250110, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2015.06.08

Received: 23 Aug. 2014 / Revised: 10 Jan. 2015 / Accepted: 26 Feb. 2015 / Published: 8 May 2015

Index Terms

URL, URI, TLD

Abstract

The World Wide Web is a huge source of hyperlinked information, and the number of web documents on it grows every moment. It has therefore become an enormous challenge to manage the local repository (the storage module of a search engine) so that it handles web documents efficiently, which leads to lower access times for web documents and proper utilization of the available resources. This research paper proposes a search engine architecture with a clustered repository that is organized to make it easy for users to retrieve web pages in a reasonable amount of time. The research focuses on the coordinator module, which not only indexes the documents but also uses a compression technique to increase the storage capacity of the repository.
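The abstract does not specify how pages are assigned to clusters or which compression scheme the coordinator uses. As a rough illustration only, assuming one cluster per top-level domain (taken naively from the last label of the URL's host name) and Python's standard zlib module as a stand-in compressor, a coordinator that indexes and compresses pages might look like the sketch below. The names `top_level_domain` and `ClusteredRepository` are hypothetical and are not the authors' implementation.

```python
import zlib
from collections import defaultdict
from typing import Dict, Optional
from urllib.parse import urlparse


def top_level_domain(url: str) -> str:
    """Return the last label of the URL's host name, e.g. 'edu' for www.example.edu.

    Assumption: multi-label suffixes such as 'co.uk' are not handled specially.
    """
    host = urlparse(url).hostname or ""  # urlparse already lowercases hostname
    return host.rsplit(".", 1)[-1]


class ClusteredRepository:
    """Hypothetical TLD-clustered store in which a coordinator indexes and compresses pages."""

    def __init__(self) -> None:
        # One cluster per TLD, each mapping url -> zlib-compressed page bytes.
        self.clusters: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        # Index mapping url -> TLD, so a lookup touches only one cluster.
        self.index: Dict[str, str] = {}

    def store(self, url: str, html: str) -> None:
        """Coordinator step: pick the cluster, compress the page, record it in the index."""
        tld = top_level_domain(url)
        self.clusters[tld][url] = zlib.compress(html.encode("utf-8"))
        self.index[url] = tld

    def fetch(self, url: str) -> Optional[str]:
        """Return the decompressed page, or None if the URL was never stored."""
        tld = self.index.get(url)
        if tld is None:
            return None
        return zlib.decompress(self.clusters[tld][url]).decode("utf-8")


if __name__ == "__main__":
    repo = ClusteredRepository()
    repo.store("http://www.example.edu/index.html", "<html>edu page</html>")
    repo.store("https://www.example.org/about.html", "<html>org page</html>")
    print(sorted(repo.clusters))  # ['edu', 'org']
    print(repo.fetch("http://www.example.edu/index.html"))  # <html>edu page</html>
```

Keeping a separate URL-to-cluster index means a lookup decompresses only the one page it needs from a single cluster. zlib is only a placeholder here; for very small pages its output can be larger than the input, so a real repository would likely compress larger batches of documents.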

Cite This Paper

Geeta Rani, Nidhi Tyagi, "Clustered Webbase: A Repository of Web Pages Based on Top Level Domain", International Journal of Information Technology and Computer Science (IJITCS), vol. 7, no. 6, pp. 59-65, 2015. DOI: 10.5815/ijitcs.2015.06.08
