Semi-Supervised Personal Name Disambiguation Technique for the Web

Full Text (PDF, 291KB), PP.28-36


Author(s)

P.Selvaperumal 1,* A.Suruliandi 1

1. Manonmaniam Sundaranar University, Department of Computer Science and Engineering, Tirunelveli, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2016.03.04

Received: 20 Nov. 2015 / Revised: 28 Dec. 2015 / Accepted: 23 Jan. 2016 / Published: 8 Mar. 2016

Index Terms

Personal name disambiguation, entity name disambiguation, web page clustering

Abstract

Personal name ambiguity on the web arises when more than one person shares the same name. Personal name disambiguation resolves this by clustering a collection of web pages so that each cluster represents one person bearing the ambiguous name. This paper proposes a personal name disambiguation technique that uses a rich set of features: nouns, noun phrases, and frequent keywords. The proposed method consists of two phases: clustering seed pages, and then clustering the actual web page collection. In the first phase, seed pages representing different namesakes are clustered; in the second phase, each web page in the collection is grouped with the seed page cluster it is most similar to. The use of seed pages increases the accuracy of the clustering process. Since it is difficult to predict beforehand how many clusters need to be formed, the proposed technique uses the Elbow method to calculate the number of clusters. The efficiency of the proposed technique is tested on both synthetic and organic datasets. Experimental results show that the proposed method achieves robust results across different datasets and outperforms many existing methods.
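
The two ideas named in the abstract, assigning pages to seed page clusters and picking the number of clusters with the Elbow method, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the similarity measure or the exact elbow criterion, so the whitespace bag-of-words features (standing in for nouns, noun phrases, and frequent keywords), the cosine similarity, the largest-second-difference elbow rule, and all function names and sample texts are assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; an assumed stand-in for the paper's features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of term counts; scaling does not affect cosine similarity."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def assign_to_seed_clusters(seed_clusters, pages):
    """Second phase (assumed form): put each page in the most similar seed cluster."""
    centroids = {
        label: centroid([vectorize(t) for t in texts])
        for label, texts in seed_clusters.items()
    }
    assignment = {}
    for page in pages:
        vec = vectorize(page)
        assignment[page] = max(centroids, key=lambda lb: cosine(vec, centroids[lb]))
    return assignment


def elbow_k(wcss):
    """Pick k at the sharpest bend of the within-cluster sum of squares curve.

    wcss[i] is the value for k = i + 1. The bend is measured by the
    second difference, which is one common heuristic (an assumption here).
    """
    best_k, best_bend = 1, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical seed pages for two namesakes and two pages to disambiguate.
seeds = {
    "cricketer": ["cricket batsman india test match"],
    "professor": ["professor university research computer science"],
}
pages = ["india cricket match score", "research paper university lecture"]
result = assign_to_seed_clusters(seeds, pages)
chosen_k = elbow_k([100.0, 40.0, 35.0, 33.0])
```

In this toy run each page shares vocabulary with only one seed cluster, so the assignment is unambiguous; real web pages would need proper noun-phrase extraction and a fallback for pages that match no seed cluster.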

Cite This Paper

P.Selvaperumal, A.Suruliandi, "Semi-Supervised Personal Name Disambiguation Technique for the Web", International Journal of Modern Education and Computer Science (IJMECS), Vol.8, No.3, pp.28-36, 2016. DOI: 10.5815/ijmecs.2016.03.04
