A Novel Web Page Change Detection Approach using Sql Server

Full Text (PDF, 583 KB), pp. 36-43

Author(s)

Md. Abu Kausar 1,*, V. S. Dhaka 1, Sanjeev Kumar Singh 2

1. Dept. of Computer & System Sciences, Jaipur National University, Jaipur, India

2. Dept. of Mathematics, Galgotias University, Gr. Noida, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2015.09.05

Received: 14 Jun. 2015 / Revised: 21 Jul. 2015 / Accepted: 10 Aug. 2015 / Published: 8 Sep. 2015

Index Terms

Web Page change detection, Hash Value, Checksum, HTML

Abstract

The World Wide Web is highly dynamic and is used mainly for the exchange of information. The content of web pages is added, changed, or deleted continuously, which makes it challenging for Web crawlers to keep their stored copies of pages up to date with the current version. Because web pages change their content frequently, an effective system is needed that can detect such changes efficiently and with the lowest possible browsing time. In this paper we compare the hash value of the old web page with the hash value of the new web page; if the hash value has changed, the web page content has changed. Changes in a web page can thus be detected by calculating its hash value, which is taken to be unique to the page content.
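The comparison described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not specify the hash function or the storage schema, so SHA-256 from Python's standard library stands in for the SQL Server checksum, and the function names `page_hash` and `has_changed` are hypothetical.

```python
import hashlib


def page_hash(html: str) -> str:
    """Return a hex digest of the page content.

    Assumption: SHA-256 is used here as a stand-in; the paper's
    SQL Server based hash or checksum may differ.
    """
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(stored_hash: str, new_html: str) -> bool:
    """Compare the stored hash of the old page with the hash of the new page."""
    return page_hash(new_html) != stored_hash


# Hypothetical page snapshots taken at two crawl times.
old_page = "<html><body><p>Price: 10</p></body></html>"
new_page = "<html><body><p>Price: 12</p></body></html>"

# Only the hash of the old version needs to be kept between crawls.
stored = page_hash(old_page)

print(has_changed(stored, old_page))  # identical content
print(has_changed(stored, new_page))  # modified content
```

Storing only the hash, rather than the full previous copy of the page, keeps the per-page state small; the trade-off is that the comparison reports that a page changed but not which part of it changed.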

Cite This Paper

Md. Abu Kausar, V. S. Dhaka, Sanjeev Kumar Singh, "A Novel Web Page Change Detection Approach using Sql Server", International Journal of Modern Education and Computer Science (IJMECS), vol. 7, no. 9, pp. 36-43, 2015. DOI: 10.5815/ijmecs.2015.09.05
