A Systematic Study of Data Wrangling

Full Text (PDF, 502KB), pp. 32-39


Author(s)

Malini M. Patil 1,* Basavaraj N. Hiremath 2

1. Dept. of Information Science and Engineering, J.S.S Academy of Technical Education, Bengaluru, Karnataka

2. Dept. of Computer Science and Engineering, JSSATE Research Centre, J.S.S Academy of Technical Education, Bengaluru, Karnataka

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2018.01.04

Received: 26 Sep. 2017 / Revised: 24 Oct. 2017 / Accepted: 7 Nov. 2017 / Published: 8 Jan. 2018

Index Terms

Business Intelligence, wrangler, prescriptive analytics, data integration, predictive transformation

Abstract

The paper presents the theory, design, and usage aspects of the data wrangling process used in data warehousing and business intelligence. Data wrangling is defined as the art of data transformation or data preparation. It is a method adopted for basic data management, in which data is processed, shaped, and made available for convenient consumption by potential future users. Large volumes of historical data are either aggregated or stored as facts or dimensions in data warehouses to accommodate large ad hoc queries. Data wrangling enables fast processing of business queries and delivers the right solutions to both analysts and end users. The wrangler provides an interactive language and recommends predictive transformation scripts, which gives the user insight into how manual, iterative processes can be reduced. Decision support systems are the best examples here. The methodologies for preparing data to mine insights are strongly influenced by big data concepts, from the data source layer through to self-service analytics and visualization tools.
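As a rough illustration of what "processed, shaped, and made available" can mean in practice, the following minimal sketch normalizes a few raw records and aggregates a measure by a dimension. It is not taken from the paper: the sample rows, the field names `region` and `amount`, and the helper functions `wrangle` and `aggregate_by_region` are all hypothetical, and dropping rows with a missing measure is only one possible cleaning policy.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    {"region": " north ", "amount": "1,200.50"},
    {"region": "North", "amount": "300"},
    {"region": "SOUTH", "amount": ""},  # missing measure
    {"region": "south", "amount": "99.5"},
]


def wrangle(rows):
    """Normalize the dimension value and parse the measure; skip unusable rows."""
    cleaned = []
    for row in rows:
        # Trim whitespace and unify letter case so " north " and "North" match.
        region = row["region"].strip().title()
        # Remove thousands separators before parsing the number.
        amount_text = row["amount"].replace(",", "").strip()
        if not amount_text:
            continue  # assumption: drop rows with no measure rather than guess
        cleaned.append({"region": region, "amount": float(amount_text)})
    return cleaned


def aggregate_by_region(cleaned):
    """Sum the measure per dimension value, similar to a warehouse aggregate."""
    totals = defaultdict(float)
    for row in cleaned:
        totals[row["region"]] += row["amount"]
    return dict(totals)


totals = aggregate_by_region(wrangle(RAW_ROWS))
print(totals)
```

An interactive wrangler of the kind the abstract describes would suggest transformation steps like these to the user instead of requiring them to be written by hand.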

Cite This Paper

Malini M. Patil, Basavaraj N. Hiremath, "A Systematic Study of Data Wrangling", International Journal of Information Technology and Computer Science (IJITCS), Vol.10, No.1, pp.32-39, 2018. DOI:10.5815/ijitcs.2018.01.04
