Effective XML Compressor: XMill with LZMA Data Compression

Author(s)

Suchit A. Sapate 1,*

1. Persistent Systems Ltd., Nagpur, 440022, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2019.04.01

Received: 6 Feb. 2019 / Revised: 21 Mar. 2019 / Accepted: 25 Apr. 2019 / Published: 8 Jul. 2019

Index Terms

XML, XMill, LZ77, LZMA, 7Zip, gZip.

Abstract

XMill is an efficient XML compression tool that exploits knowledge of XML's structure. It compresses data according to three principles: separate the XML structure from the data, group related data items, and apply semantic compressors. XMill uses the gZip library to compress the XML string data. We propose a method to increase XMill's compression ratio by adding the 7Zip library to the tool; the 7Zip library compresses data with the LZMA algorithm. LZMA is an enhanced version of the LZ77 algorithm used in the gZip library. LZMA has the following advantages over LZ77:

• Uses a dictionary of up to 4GB instead of 32KB for removing duplicate data.
• Uses a lookahead approach instead of a greedy approach.
• Uses optimal parsing and shorter codes for recently repeated matches.
• Uses context modeling when encoding the data.

Because of these features, the proposed approach achieves a higher compression ratio than the original gZip-based XMill at a comparable compression speed.
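
The ideas summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes Python's standard `zlib` module (DEFLATE, the LZ77-based method behind gZip) and `lzma` module (liblzma, not the 7Zip library) as stand-ins, and the helper names `split_into_containers`, `compressed_sizes`, and `distant_repeat` are hypothetical. The first helper mimics only the "separate structure from data" and "group related data" principles; the last one builds input whose only redundancy lies farther back than a 32KB window can reach.

```python
import lzma
import random
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_into_containers(xml_text):
    """Separate the tag structure from the text data and group values by path.

    Simplified: attributes and tail text are ignored, and no semantic
    compressors are applied.
    """
    structure = []                  # element paths in document order
    containers = defaultdict(list)  # element path -> text values

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        structure.append(path)
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return structure, dict(containers)


def compressed_sizes(data):
    """Return compressed sizes in bytes for DEFLATE (zlib) and LZMA."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


def distant_repeat(block_size=64 * 1024, seed=0):
    """Build incompressible bytes followed by an exact copy of themselves.

    The copy starts block_size bytes back, beyond DEFLATE's 32 KB window.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block + block


if __name__ == "__main__":
    # Made-up sample document; real XML corpora will behave differently.
    books = "".join(
        f"<book><title>Title {i}</title><year>{1990 + i % 30}</year></book>"
        for i in range(2000)
    )
    _, containers = split_into_containers(f"<library>{books}</library>")
    for path, values in containers.items():
        payload = "\n".join(values).encode("utf-8")
        print(path, len(payload), compressed_sizes(payload))
    print("distant repeat", compressed_sizes(distant_repeat()))
```

In the distant-repeat case, DEFLATE cannot reference the first copy because it lies outside its window, whereas LZMA's larger dictionary can, so the LZMA output is roughly half the size. On small, real containers the difference may be smaller or even reversed, since the container format produced by `lzma.compress` adds fixed overhead.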

Cite This Paper

Suchit A. Sapate, "Effective XML Compressor: XMill with LZMA Data Compression", International Journal of Education and Management Engineering (IJEME), Vol.9, No.4, pp.1-10, 2019. DOI: 10.5815/ijeme.2019.04.01
