Suha Ibrahim Alsharekh

Work place: Master's Student, Department of Computer Science, College of Computer, Qassim University, KSA

E-mail: soha.e.sh@gmail.com

Research Interests: Applied Computer Science, Computer Systems and Computational Processes, Theoretical Computer Science

Biography

Suha Ibrahim Abdullah Alsharekh received her B.Sc. in Computer Science from the College of Computer, Qassim University, Saudi Arabia, in 2015. She is currently pursuing an M.Sc. in Computer Science at Qassim University.

Author Articles
A hybrid Technique for Cleaning Missing and Misspelling Arabic Data in Data Warehouse

By Mohammed Abdullah Al-Hagery, Latifah Abdullah Alreshoodi, Maram Abdullah Almutairi, Suha Ibrahim Alsharekh, Emtenan Saad Alkhowaiter

DOI: https://doi.org/10.5815/ijitcs.2019.07.03, Pub. Date: 8 Jul. 2019

Real-world datasets accumulated over many years tend to be incomplete and inconsistent and to contain noisy data, which in turn causes inconsistency in data warehouses. Data owners hold hundreds of millions to billions of records written in different languages, which continuously increases the need for comprehensive, efficient techniques to maintain data consistency and improve data quality. Data cleaning is known to be a complex and difficult task, especially for data written in Arabic, a complex language in which various types of unclean data can occur: for example, missing values, dummy values, redundant values, inconsistent values, misspellings, and noisy data. The ultimate goal of this paper is to improve data quality by cleaning the contents of Arabic datasets of various types of errors, producing data suitable for better analysis and highly accurate results. This, in turn, supports the discovery of correct knowledge patterns and more accurate decision-making. The approach is built by merging different algorithms, which ensures that reliable methods are used for data cleansing. It cleans Arabic datasets through multi-level cleaning using the Arabic Misspelling Detection and Correction Model (AMDCM) and Decision Tree Induction (DTI). The approach addresses Arabic misspellings, cryptic values, dummy values, and the unification of naming styles. A sample of the data before and after cleaning is presented.
