An Optimization Model and DPSO-EDA for Document Summarization

Author(s)

Rasim M. Alguliev 1,*, Ramiz M. Aliguliyev 1, Chingiz A. Mehdiyev 1

1. Institute of Information Technology of Azerbaijan National Academy of Sciences

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2011.05.08

Received: 10 Jan. 2011 / Revised: 6 Apr. 2011 / Accepted: 27 Jun. 2011 / Published: 8 Nov. 2011

Index Terms

Generic summarization, optimization model, balancing coverage and diversity, Heronian mean, discrete particle swarm optimization, estimation of distribution algorithm

Abstract

We model document summarization as a nonlinear 0-1 programming problem whose objective function is defined as the Heronian mean of two objective functions, one enforcing coverage and the other enforcing diversity. The proposed model was implemented on a multi-document summarization task. Experiments on the DUC2001 and DUC2002 datasets showed that the proposed model outperforms the other summarization methods evaluated.
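To make the shape of this objective concrete, the following is a minimal sketch, not the authors' formulation. The only part taken as fact is the Heronian mean of two non-negative numbers, (a + sqrt(a*b) + b) / 3. The `coverage` and `diversity` functions, the toy term vectors, the sentence lengths, and the length limit `max_len` are all hypothetical stand-ins, since the abstract does not define them. Exhaustive search over 0-1 vectors is swapped in for the DPSO-EDA solver named in the title, and is only practical for a handful of sentences.

```python
import math
from itertools import product


def heronian_mean(a: float, b: float) -> float:
    """Heronian mean of two non-negative numbers: (a + sqrt(a*b) + b) / 3."""
    if a < 0 or b < 0:
        raise ValueError("Heronian mean needs non-negative inputs")
    return (a + math.sqrt(a * b) + b) / 3.0


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coverage(x, vectors, centroid) -> float:
    """ASSUMED coverage: share of total sentence-to-centroid similarity selected."""
    sims = [cosine(v, centroid) for v in vectors]
    total = sum(sims)
    return sum(s for xi, s in zip(x, sims) if xi) / total if total else 0.0


def diversity(x, vectors) -> float:
    """ASSUMED diversity: 1 minus mean pairwise similarity of selected sentences."""
    chosen = [v for xi, v in zip(x, vectors) if xi]
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    if not pairs:
        return 1.0  # zero or one sentence: nothing redundant
    mean_sim = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    return max(0.0, 1.0 - mean_sim)  # clamp against floating-point overshoot


def best_summary(vectors, lengths, max_len):
    """Brute-force the 0-1 selection maximizing the Heronian-mean objective.

    This replaces DPSO-EDA with exhaustive search for illustration only.
    """
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    best_x, best_score = None, -1.0
    for x in product((0, 1), repeat=n):
        if sum(l for xi, l in zip(x, lengths) if xi) > max_len:
            continue  # violates the summary length constraint
        score = heronian_mean(coverage(x, vectors, centroid), diversity(x, vectors))
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score


# Hypothetical toy data: four sentences as term-frequency vectors.
vectors = [[2, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [1, 0, 1, 0]]
lengths = [12, 10, 11, 8]   # sentence lengths in words (made up)
max_len = 25                # summary length limit (made up)

selection, score = best_summary(vectors, lengths, max_len)
print("selected:", selection, "objective:", round(score, 4))
```

Because the Heronian mean contains the term sqrt(a*b), a selection that scores zero on either coverage or diversity is penalized more than it would be under a plain arithmetic mean, which is one way to read the "balancing" in the index terms.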

Cite This Paper

Rasim M. Alguliev, Ramiz M. Aliguliyev, Chingiz A. Mehdiyev, "An Optimization Model and DPSO-EDA for Document Summarization", International Journal of Information Technology and Computer Science (IJITCS), vol.3, no.5, pp.59-68, 2011. DOI: 10.5815/ijitcs.2011.05.08
