Work place: Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine
Research Interests: Computer systems and computational processes, Computer Vision, Pattern Recognition, Computer Architecture and Organization, Data Mining
Yuriy Ushenko was born on December 23, 1980 in Chisinau, Moldova. M.Sc. in Telecommunications (2003). PhD in Optics and Laser Physics (2006). D.Sc. in Optics and Laser Physics, Taras Shevchenko National University of Kyiv (2015). Current position: Professor, Head of Computer Science Department, Chernivtsi National University, Ukraine. Research Interests: Data Mining and Analysis, Computer Vision and Pattern Recognition, Optics & Photonics, Biophysics.
DOI: https://doi.org/10.5815/ijmecs.2024.01.03, Pub. Date: 8 Feb. 2024
The article develops a technology for generating continuations of song lyrics using large language models, in particular the T5 model, in order to speed up, supplement, and add flexibility to lyric writing, with or without taking the style of a particular author into account. To create the dataset, 10 different artists were selected and their lyrics collected, yielding 626 unique songs. After splitting each song into several pairs of input-output sequences, 1874 training instances and 465 test instances were obtained. Two language models, NSA and SA, were fine-tuned for the task of generating song lyrics. For both models, t5-base, which contains 223 million parameters, was chosen as the base model. The analysis of the original data showed that the NSA model has less degraded results, and that for the SA model it is necessary to balance the amount of text for each author. Several text metrics, namely BLEU, RougeL, and RougeN, were calculated to compare the models and generation strategies quantitatively. BLEU varies the most and depends significantly on the strategy, while the Rouge metrics show less variability and a narrower range of values. In total, 8 decoding methods for text generation supported by the transformers library were compared: Greedy search, Beam search, Diverse beam search, Multinomial sampling, Beam-search multinomial sampling, Top-k sampling, Top-p sampling, and Contrastive search. All the comparisons show that the best method for generating lyrics is beam search and its variations, including beam-search sampling. Contrastive search usually outperformed the plain greedy approach. The top-p and top-k methods do not have a clear advantage over each other and produced different results in different situations.[...] Read more.
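To make the difference between some of the decoding strategies named above more concrete, here is a minimal sketch of greedy selection, top-k filtering, and top-p (nucleus) filtering applied to a single next-token distribution. It is an illustration only and is not taken from the article: the toy vocabulary, the probabilities, and all function names are assumptions, and a real system would apply these steps to model logits at every generation step.

```python
import random

def greedy(probs):
    """Greedy search step: pick the single most probable token."""
    return max(probs, key=probs.get)

def top_k_filter(probs, k):
    """Top-k: keep the k most probable tokens and renormalise."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def top_p_filter(probs, p):
    """Top-p: keep the smallest set of most probable tokens whose
    cumulative probability reaches p, then renormalise."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

def sample(probs, rng):
    """Multinomial sampling from a (possibly filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution, invented for illustration.
next_token_probs = {"love": 0.5, "night": 0.3, "road": 0.15, "zebra": 0.05}

rng = random.Random(0)
print("greedy:", greedy(next_token_probs))
print("top-k (k=3):", sample(top_k_filter(next_token_probs, 3), rng))
print("top-p (p=0.75):", sample(top_p_filter(next_token_probs, 0.75), rng))
```

Greedy search always returns the same token, whereas the two sampling variants differ only in how they truncate the tail of the distribution before drawing from it, which is consistent with the observation that neither has a clear advantage over the other.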
DOI: https://doi.org/10.5815/ijigsp.2023.06.04, Pub. Date: 8 Dec. 2023
A new local-topological approach to describing the spatial and angular distributions of polarization parameters of laser fields multiply scattered by optically anisotropic biological layers is considered. A new analytical parameter that describes the local polarization structure of a set of points of coherent object fields, the degree of local depolarization (DLD), is introduced for the first time. The experimental scheme and the technique of measuring coordinate distributions (maps) of DLD are presented. The new method of local polarimetry was experimentally tested on histological specimens of biopsy sections of operatively extracted breast tumors. The measured DLD maps were processed using statistical, autocorrelation and scale-sampling approaches. Markers for differential diagnosis of benign (fibroadenoma) and malignant (sarcoma) breast tumors were defined.[...] Read more.
DOI: https://doi.org/10.5815/ijmecs.2023.06.03, Pub. Date: 8 Dec. 2023
Software for clustering students according to their educational achievements using fuzzy logic was developed in Python using the Google Colab cloud service. Analyzing educational data involves solving Data Mining problems, since only some characteristics of the educational process are obtained from a large sample of data. Data clustering was performed using the classic K-Means method, which is characterized by simplicity and high speed. Cluster analysis was performed in the space of two features using the machine learning library scikit-learn (Python). The obtained clusters are described by fuzzy triangular membership functions, which made it possible to determine correctly the membership of each student in a certain cluster. The fuzzy membership functions are created using the scikit-fuzzy library. Developing fuzzy functions that describe the membership of objects in clusters is also useful for educational purposes, as it allows a better understanding of the principles of using fuzzy logic. Processing test educational data with the developed software produced correct results. It is shown that the use of fuzzy membership functions makes it possible to determine correctly which clusters students belong to, even if the clusters are not clearly separated. This makes it possible to determine more accurately the recommended level of task difficulty for each student, depending on their previous grades.[...] Read more.
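The idea of describing clusters with triangular membership functions can be sketched in a few lines of plain Python. This is an illustrative assumption, not the authors' code: the cluster centres (40, 65, 90 on a hypothetical 0-100 score scale), the cluster labels, and the choice of placing each triangle's feet at the neighbouring centres are all invented for the example. The scikit-fuzzy library offers a comparable `trimf` function for array inputs.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at or outside the feet a and c,
    1 at the peak b (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical cluster centres, e.g. as K-Means might return them
# for a single score feature (values invented for illustration).
clusters = {
    "low": (15, 40, 65),      # (left foot, peak, right foot)
    "medium": (40, 65, 90),
    "high": (65, 90, 115),
}

def memberships(score):
    """Degree of membership of one score in every cluster."""
    return {name: triangular(score, *abc) for name, abc in clusters.items()}

# A student between two cluster centres belongs partly to both.
print(memberships(55))
```

A score of 55 lies between the "low" and "medium" centres, so it receives non-zero membership in both clusters instead of a hard assignment, which is the property the abstract relies on for clusters that are not clearly separated.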
DOI: https://doi.org/10.5815/ijigsp.2023.05.06, Pub. Date: 8 Oct. 2023
At present, all developed polarization methods utilize "single-point" statistical analysis algorithms for laser fields. A relevant task is to generalize traditional techniques by incorporating new correlation-based "two-point" algorithms for the analysis of polarization images. Theoretical foundations of the mutual and autocorrelation processing of phase maps of polarization-structural images of samples of dehydrated serum films are given. The maps of new polarization-correlation parameters, namely the complex degree of coherence (CDC) and the complex degree of mutual polarization (CDMP) of the boundary field of a soft matter layer, are investigated using dehydrated serum film samples as an example. Two groups of representative samples, uterine myoma patients (control group 1) and patients with external genital endometriosis (study group 2), were considered. We applied a complex algorithm of analytical data processing to the maps of polarization-correlation parameters: statistical (1st and 4th central statistical moments), correlation (Gram-Charlier expansion coefficients of autocorrelation functions) and fractal (fractal dimensions) parameters. Objective markers for diagnosing extragenital endometriosis were found.[...] Read more.
DOI: https://doi.org/10.5815/ijmecs.2023.04.06, Pub. Date: 8 Aug. 2023
A generalized model of population migration is proposed. On its basis, models of the set of directions of population flows and of the duration of migration, which is determined by its nature in time, type and form of migration, are developed. A model of indicators of actual migration (resettlement) is developed and the indicators are divided into groups. The results of population migration are described and characterized by a number of absolute and relative indicators for the purpose of regression analysis of data. To obtain the results of migration, the author takes into account the power of migration flows, which depends on the population of the territories between which the exchange takes place and on their location, on the basis of the coefficients of the effectiveness and the intensity of migration ties. The types of migration intensity coefficients, depending on their properties, are formed. The LightGBM algorithm for predicting population migration is implemented in the intelligent geographic information system. The migration forecasting system is also capable of predicting international migration or migration between different countries. The significance of conducting this survey lies in the increasing need for accurate and reliable migration forecasts. With globalization and the connectivity of nations, understanding and predicting migration patterns have become crucial for various domains, including social planning, resource allocation, and economic development. Through extensive experimentation and evaluation, the developed migration forecasting system has demonstrated results of forecasting human migration based on machine learning algorithms. Performance metrics of migration flow forecasting models are investigated, which made it possible to present the results obtained from the evaluation of these models using various performance indicators, including the mean square error (MSE), root mean square error (RMSE) and R-squared (R2).
The MSE measures the mean squared difference between predicted and actual values and the RMSE is its square root, while the R2 represents the proportion of variance explained by the model.[...] Read more.
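For reference, the three metrics mentioned above can be computed as in the following minimal sketch. The migration figures are invented for illustration and are not results from the article; the function names are likewise assumptions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE,
    expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted migration flows (invented numbers).
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]

print("MSE: ", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2:  ", r2(actual, predicted))
```

Because the RMSE is in the same units as the forecast quantity, it is usually the easier of the two error measures to interpret, while R2 is unitless and compares the model against always predicting the mean.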
DOI: https://doi.org/10.5815/ijmecs.2023.03.06, Pub. Date: 8 Jun. 2023
The article develops a technology for finding tweet trends based on clustering, which forms a data stream in the form of short representations of clusters and their popularity for further research of public opinion. The accuracy of the result is affected by the natural-language features of the information flow of tweets. An effective approach to tweet collection, filtering, cleaning and pre-processing based on a comparative analysis of the Bag of Words, TF-IDF and BERT algorithms is described. The impact of stemming and lemmatization on the quality of the obtained clusters was determined. Stemming and lemmatization reduce the input vocabulary of Ukrainian words significantly, by 40.21% and 32.52% respectively. Optimal combinations of clustering methods (K-Means, Agglomerative Hierarchical Clustering and HDBSCAN) and tweet vectorization methods were found based on the analysis of 27 clusterings of one data sample. A method of presenting clusters of tweets in a short format is selected. Algorithms using the Levenshtein distance, i.e. fuzz sort, fuzz set and Levenshtein, showed the best results. These algorithms perform checks quickly and show a greater difference in similarities, so the similarity threshold can be determined more accurately. According to the results of the clustering, the optimal solutions are to use the HDBSCAN clustering algorithm with BERT vectorization to achieve the most accurate results, and to use K-Means together with TF-IDF to achieve the best speed with an acceptable result. Stemming can be used to reduce execution time. In this study, the optimal options for comparing cluster fingerprints were found experimentally among the following similarity search methods: Fuzz Sort, Fuzz Set, Levenshtein, Jaro Winkler, Jaccard, Sorensen, Cosine, Sift4. For some algorithms, the average fingerprint similarity exceeds 70%.
Three effective tools for comparing fingerprint similarity were found, as they show a sufficient difference (> 20%) between comparisons of similar and different clusters.
The experimental testing was conducted based on the analysis of 90,000 tweets over 7 days for 5 different weekly topics: President Volodymyr Zelenskyi, Leopard tanks, Boris Johnson, Europe, and the bright memory of the deceased. The research was carried out using the following combinations of clustering and vectorization methods: K-Means with TF-IDF, Agglomerative Hierarchical Clustering with TF-IDF, and HDBSCAN with BERT. Additionally, fuzz sort was implemented for comparing cluster fingerprints with a similarity threshold of 55%. For comparing fingerprints, the best methods were fuzz sort, fuzz set, and Levenshtein. In terms of execution speed, the best result among them was achieved with the Levenshtein method. The other two methods were three times slower, but they are still nearly 13 times faster than Sift4. The fastest method overall is Jaro Winkler, but it has only a 19.51% difference in similarities. The method with the best difference in similarities is fuzz set (60.29%). Fuzz sort (32.28%) and Levenshtein (28.43%) took second and third place respectively. These methods rely on the Levenshtein distance, indicating that such an approach works well for comparing sets of keywords. Other algorithms fail to show significant differences between different fingerprints, suggesting that they are not adapted to this type of task.
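The fingerprint comparison described above can be illustrated with a small pure-Python sketch of the Levenshtein distance and a token-sorted similarity check against the 55% threshold. This is a simplified stand-in, not the implementation used in the study: the normalisation by the longer string's length is an assumption and differs from the ratio formulas of fuzzy-matching libraries, and the function names and example fingerprints are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Similarity in [0, 1]; assumed normalisation by the longer length."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def token_sort_similarity(a, b):
    """Sort the keywords first so that word order does not matter."""
    def normalise(s):
        return " ".join(sorted(s.lower().split()))
    return similarity(normalise(a), normalise(b))

def same_topic(fp_a, fp_b, threshold=0.55):
    """Treat two cluster fingerprints as the same topic at or above
    the threshold (55% as stated in the abstract)."""
    return token_sort_similarity(fp_a, fp_b) >= threshold

# Hypothetical cluster fingerprints (keyword strings).
print(same_topic("leopard tanks ukraine", "ukraine tanks leopard"))
print(same_topic("leopard tanks ukraine", "boris johnson visit"))
```

Sorting the tokens before measuring edit distance makes the comparison insensitive to keyword order, which is a plausible reason why Levenshtein-based methods suit fingerprints that are essentially sets of keywords.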
DOI: https://doi.org/10.5815/ijmecs.2023.02.06, Pub. Date: 8 Apr. 2023
A method of choosing swarm optimization algorithms and using swarm intelligence for solving a certain class of optimization tasks in industry-specific geographic information systems was developed considering the stationarity characteristic of such systems. The method consists of 8 stages. Classes of swarm algorithms were studied. It is shown which classes of swarm algorithms should be used depending on the stationarity, quasi-stationarity or dynamics of the task solved by an industry geographic information system. An information model of geodata that consists in a formalized combination of their spatial and attributive components, which allows considering the relational, semantic and frame models of knowledge representation of the attributive component, was developed. A method of choosing optimization methods designed to work as part of a decision support system within an industry-specific geographic information system was developed. It includes conceptual information modeling, optimization criteria selection, and objective function analysis and modeling. This method allows choosing the most suitable swarm optimization method (or a set of methods).[...] Read more.