A Trend Analysis of Machine Learning Research with Topic Models and Mann-Kendall Test

pp. 70-82


Author(s)

Deepak Sharma (1, 2, *), Bijendra Kumar (2), Satish Chand (2)

1. Department of Computer Engineering, Netaji Subhas Institute of Technology, Sector-3, Dwarka, New Delhi, 110078, India

2. School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi, 110067, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.02.08

Received: 6 Mar. 2018 / Revised: 18 Apr. 2018 / Accepted: 20 May 2018 / Published: 8 Feb. 2019

Index Terms

Latent Semantic Analysis, Latent Dirichlet Allocation, Coherence Model, Text Mining, Data Mining, Machine Learning, Trend Analysis

Abstract

This paper systematically examines the machine learning literature for the period 1968-2017 to identify and analyze research trends. Articles from a list of machine learning journals published by well-established publishers (ScienceDirect, Springer, JMLR, and IEEE), approximately 23,365 journal articles in total, are used to prepare a content collection. To the best of our knowledge, this is the first effort to analyze trends in machine learning research with topic models: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and LDA with Coherence Model (LDA_CM). Among the topic models considered, LDA_CM gives the highest topic coherence. This study provides a scientific basis that helps overcome the subjectivity of collective opinion. The Mann-Kendall test is used to determine the trend of each topic. Our findings are indicative of paradigmatic shifts in research methodology, of significant patterns of topical prominence, and of evolving research areas, and they highlight how research topics in machine learning have evolved from earlier to recent trends. Understanding this intellectual structure and its future trends will help researchers follow the divergent developments of the field in one place. This paper analyzes the overall trends of machine learning research since 1968 and, based on the latent topics identified for the period 2007-2017, recommends areas that may help researchers direct their work and publish their research articles.
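The abstract names the Mann-Kendall test as the tool for deciding whether a topic is rising or falling over time. As a minimal sketch, assuming each topic is summarized as one value per year (for example, its mean document proportion), a two-sided version of the test can be written in plain Python. The function name `mann_kendall`, the 0.05 significance level, the tie-corrected variance, and the continuity correction follow the usual textbook formulation and are assumptions, not details confirmed by the paper; the yearly values in the usage lines are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float], alpha: float = 0.05) -> Tuple[str, int, float, float]:
    """Two-sided Mann-Kendall trend test (textbook form, assumed here).

    Returns (trend, S, Z, p_value); trend is "increasing", "decreasing",
    or "no trend" at significance level alpha.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S statistic: sum of the signs of all forward pairwise differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the no-trend hypothesis, corrected for tied groups.
    tie_sizes = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0

    # Continuity-corrected standard normal statistic (S == 0 gives Z == 0,
    # which also covers the all-tied case where var_s is 0).
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal CDF, via math.erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

    if p < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return trend, s, z, p


# Hypothetical yearly share of one topic over 11 years.
yearly_share = [0.020, 0.025, 0.030, 0.028, 0.035, 0.040,
                0.045, 0.050, 0.052, 0.060, 0.065]
trend, s, z, p = mann_kendall(yearly_share)
print(trend, s)  # prints: increasing 53
```

Because the test uses only the signs of pairwise differences, it needs no assumption about the distribution of topic proportions, which is one reason it is common for short yearly series.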
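The abstract compares the topic models by topic coherence without defining the measure. As an illustration only, the sketch below computes the UMass coherence of one topic from document co-occurrence counts of its top words. This is a swapped-in, simple measure and not necessarily the one behind LDA_CM; libraries such as gensim's `CoherenceModel` offer several measures (including `u_mass` and `c_v`) and differ in smoothing and averaging. The function name `umass_coherence`, the +1 smoothing, and the toy corpus are assumptions. Higher (less negative) scores indicate a more coherent topic; averaging over all topics gives one number per model for comparison.

```python
import math
from typing import Iterable, List, Sequence, Set


def umass_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """UMass coherence of one topic.

    top_words must be ordered by decreasing probability in the topic;
    documents is a reference corpus of tokenized documents.
    """
    doc_sets: List[Set[str]] = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words absent from the reference corpus
            # log((D(w_i, w_j) + 1) / D(w_j)), summed over word pairs.
            score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


docs = [
    ["learning", "neural", "network"],
    ["learning", "network"],
    ["kernel", "svm"],
    ["kernel", "svm", "learning"],
]
print(umass_coherence(["learning", "network", "neural"], docs))  # about -0.405
print(umass_coherence(["network", "svm", "neural"], docs))       # about -1.386
```

The first topic scores higher because its top words tend to appear in the same documents, while "svm" never co-occurs with "network" or "neural" in this toy corpus.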

Cite This Paper

Deepak Sharma, Bijendra Kumar, Satish Chand, "A Trend Analysis of Machine Learning Research with Topic Models and Mann-Kendall Test", International Journal of Intelligent Systems and Applications (IJISA), Vol. 11, No. 2, pp. 70-82, 2019. DOI: 10.5815/ijisa.2019.02.08
