Automatically Extracting Name Alias of User from Email

Full Text (PDF, 258 KB), pp. 14-24


Author(s)

Meijuan Yin 1, Xiao Li 1, Junyong Luo 1, Xiaonan Liu 1, Yongxing Tan 1

1. Information Science and Technology Institute, Zhengzhou 450002, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijem.2011.06.03

Received: 20 Aug. 2011 / Revised: 26 Sep. 2011 / Accepted: 28 Oct. 2011 / Published: 5 Dec. 2011

Index Terms

Emails, Alias Extraction, Entity Resolution, Salutation and Signature Blocks, Name Boundary Word Template

Abstract

Mining user identity information from emails is an important research topic in email mining. Most approaches extract an email user's name only from the email header, but the email body often contains additional name information that is usually better suited to representing the sender's or recipient's identity. This paper focuses on extracting email users' name aliases from the body of plain-text emails. After locating and extracting the salutation and signature blocks in email bodies, we use named entity recognition (NER) tools to identify potential aliases in the salutation and signature lines; these aliases can be linked directly to the email addresses in the email headers. To verify and amend the potential aliases identified by the NER tools, we propose a novel approach that extracts aliases from the salutation and signature lines using a name boundary word template built on the characteristics of the words neighboring an alias. Results on the public subset of the Enron corpus indicate that the approaches presented in this paper can efficiently extract users' aliases from email bodies.
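
The boundary-word idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' template: the boundary word lists (`SALUTATION_WORDS`, `CLOSING_WORDS`), the `NAME` pattern, and the function `extract_aliases` are all hypothetical choices made for illustration, and the sketch assumes that a name is one or two capitalized words sitting next to a greeting or closing word. It also skips the block-location and NER steps that the paper describes.

```python
import re

# Hypothetical boundary words; the paper's actual template is not given here.
SALUTATION_WORDS = ("dear", "hi", "hello", "hey")
CLOSING_WORDS = ("thanks", "thank you", "regards", "best regards",
                 "sincerely", "cheers")

# Assumption: an alias is one or two capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)?"

# Boundary words are matched case-insensitively; the name itself is not.
SALUTATION_RE = re.compile(
    r"^(?i:" + "|".join(SALUTATION_WORDS) + r")\s+(" + NAME + r")\s*[,:!-]?$"
)
CLOSING_RE = re.compile(
    r"^(?i:" + "|".join(CLOSING_WORDS) + r")\s*[,.!]?\s*(" + NAME + r")?$"
)
NAME_ONLY_RE = re.compile(r"^" + NAME + r"$")


def extract_aliases(body):
    """Return (recipient_aliases, sender_aliases) from a plain-text body."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    recipients, senders = [], []
    for i, line in enumerate(lines):
        match = SALUTATION_RE.match(line)
        if match:
            # A name right after a greeting word is taken as a recipient alias.
            recipients.append(match.group(1))
            continue
        match = CLOSING_RE.match(line)
        if match:
            # The sender's name may follow the closing word on the same line
            # or stand alone on the next non-empty line.
            name = match.group(1)
            if name is None and i + 1 < len(lines):
                if NAME_ONLY_RE.match(lines[i + 1]):
                    name = lines[i + 1]
            if name:
                senders.append(name)
    return recipients, senders


if __name__ == "__main__":
    sample = "Hi John,\n\nPlease review the attached report.\n\nThanks,\nMary Smith\n"
    print(extract_aliases(sample))
```

In a fuller pipeline, the recipient aliases would be attached to the addresses in the To header and the sender aliases to the From address. A pattern this simple will also accept capitalized non-names such as "All", which is one reason a separate recognition or verification step is useful.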

Cite This Paper

Meijuan Yin, Xiao Li, Junyong Luo, Xiaonan Liu, Yongxing Tan, "Automatically Extracting Name Alias of User from Email", IJEM, vol. 1, no. 6, pp. 14-24, 2011. DOI: 10.5815/ijem.2011.06.03
