Textual Coherence Improvement of Extractive Document Summarization Using Greedy Approach and Word Vectors

Author(s)

Mohamad Abdolahi 1,* Morteza Zahedi 1

1. Kharazmi International Campus, Shahrood University, Shahrood, Iran

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2019.04.03

Received: 4 Jan. 2019 / Revised: 29 Jan. 2019 / Accepted: 26 Feb. 2019 / Published: 8 Apr. 2019

Index Terms

Natural language processing, Extractive summarization, Text coherence, Word vector, Language models

Abstract

Document summarization is receiving growing attention because of its importance in many NLP tasks. The main challenges for compact summaries are full coverage of the source information, coherence of the output sentences, and the absence of similar sentences (non-redundancy). Although some research has been carried out on compact summaries, there have been few empirical investigations into the coherence of output sentences. The aim of this paper is to explore a comprehensive and useful methodology for generating coherent summaries. The approach taken in this study is a mixed method that uses most likely n-grams and the word2vec algorithm to convert individual sentences into normalized numeric matrices, from which statistical properties are then extracted. Using a greedy approach, the sentences most relevant to the main subject of the document are selected and placed in the output summary. The proposed greedy method is the backbone of our approach: it is an iterative algorithm that maximizes two features of the summary, conceptual coherence and subject-matter diversity. The results are compared with those of a similar model, Q_Network, and show the superiority of the proposed algorithm on long text documents.
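
The greedy selection idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic relevance-minus-redundancy greedy loop (in the spirit of maximal marginal relevance), and it assumes each sentence has already been reduced to a single fixed-length vector, for example by averaging word vectors. The function names, the `trade_off` parameter and the use of the document centroid as the "main subject" are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def greedy_summary(sentence_vectors, k, trade_off=0.7):
    """Greedily pick up to k sentence indices.

    Assumption (not from the paper): a candidate's score is
    trade_off * similarity to the document centroid
    minus (1 - trade_off) * its highest similarity to any sentence
    already selected, so redundant sentences are penalized.
    """
    centroid = mean_vector(sentence_vectors)
    selected = []
    candidates = set(range(len(sentence_vectors)))

    def score(i):
        relevance = cosine(sentence_vectors[i], centroid)
        redundancy = max(
            (cosine(sentence_vectors[i], sentence_vectors[j]) for j in selected),
            default=0.0,
        )
        return trade_off * relevance - (1.0 - trade_off) * redundancy

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    # Return indices in document order so the summary keeps the source flow.
    return sorted(selected)


if __name__ == "__main__":
    # Toy 2-D "sentence vectors": the first two are near-duplicates.
    toy = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(greedy_summary(toy, k=2, trade_off=0.5))
```

A lower `trade_off` puts more weight on diversity, so near-duplicate sentences are less likely to be selected together; a value of 1.0 reduces the loop to plain relevance ranking.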

Cite This Paper

Mohamad Abdolahi, Morteza Zahedi, "Textual Coherence Improvement of Extractive Document Summarization Using Greedy Approach and Word Vectors", International Journal of Modern Education and Computer Science (IJMECS), Vol.11, No.4, pp. 23-31, 2019. DOI: 10.5815/ijmecs.2019.04.03
