Anosh Fatima

Work place: National University of Computer and Emerging Sciences, Faisalabad, 38000, Pakistan

E-mail: f159005@nu.edu.pk


Research Interests: Artificial Intelligence, Data Mining, Data Structures and Algorithms

Biography

Anosh Fatima was born in Faisalabad, Punjab, Pakistan, in 1993. She received the BS degree in Computer Science from the National University of Computer and Emerging Sciences (FAST-NU), Lahore, Pakistan, in 2015. She is currently pursuing the MS degree in Computer Science at the National University of Computer and Emerging Sciences (FAST-NU), Faisalabad, Pakistan, which she expects to complete in 2017.

In 2012, she joined the internship program at the National University of Computer and Emerging Sciences (FAST-NU), Faisalabad Campus, as a Front Desk Officer and Human Resource Manager. In 2013, she joined the summer program at Career Institute, Faisalabad, as a Course Instructor. In 2014, she joined the National University of Computer and Emerging Sciences (FAST-NU), Lahore, as a Teaching Assistant for the Human Computer Interaction course. In 2015, she joined the National University of Computer and Emerging Sciences (FAST-NU), Faisalabad, as a Teaching Assistant for the Human Computer Interaction, Computer Architecture, and Artificial Intelligence courses. She is currently working as a Visiting Lecturer at Government College University Faisalabad, Pakistan. Her main research interests are Data Mining, Data Warehousing, Data Science, and Artificial Intelligence.

Author Articles
Data Cleaning In Data Warehouse: A Survey of Data Pre-processing Techniques and Tools

By Anosh Fatima, Nosheen Nazir, Muhammad Gufran Khan

DOI: https://doi.org/10.5815/ijitcs.2017.03.06, Pub. Date: 8 Mar. 2017

A data warehouse is a computer system designed for storing and analyzing an organization's historical data from day-to-day operations in Online Transaction Processing (OLTP) systems. Usually, an organization summarizes and copies information from its operational systems to the data warehouse on a regular schedule, and management performs complex queries and analysis on that information without slowing down the operational systems. Data need to be pre-processed to improve their quality before being stored in the data warehouse. This survey paper presents data cleaning problems and the approaches currently in use for pre-processing. The main goal of the paper is to determine which pre-processing technique is best in which scenario for improving the performance of a data warehouse. Many data cleansing techniques have been analyzed using certain evaluation attributes and tested on different kinds of data sets. Data quality tools such as YALE, ALTERYX, and WEKA have been used to obtain conclusive results, to ready the data for the data warehouse, and to ensure that only cleaned data populates the warehouse, thus enhancing its usability. The results of the paper can be useful in many future activities such as cleansing, standardizing, correction, matching, and transformation. This research can help in data auditing and pattern detection in the data.

Other Articles