PTSLGA: A Provenance Tracking System for Linked Data Generating Application

Full Text (PDF, 259KB), PP.87-93

Author(s)

Kumar Sharma 1,*, Ujjal Marjit 2, Utpal Biswas 1

1. Department of Computer Science & Engineering, University of Kalyani, Kalyani, West Bengal, India

2. Centre for Information Resource Management, University of Kalyani, Kalyani, West Bengal, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2015.04.10

Received: 6 Sep. 2014 / Revised: 3 Dec. 2014 / Accepted: 20 Jan. 2015 / Published: 8 Mar. 2015

Index Terms

Provenance, Semantic Web, Linked Data, LOD

Abstract

Tracking the provenance of RDF resources is an important task in Linked Data generating applications. Provenance plays a central role in information gathering as well as in workflows. Various Linked Data generating applications have evolved for converting legacy data to RDF resources. These data span the bibliographic, geographic, government, publication, and cross-domain categories. However, most of these applications do not support tracking data and workflow provenance for individual RDF resources. Such applications therefore need to track, store, and disseminate provenance information describing their source data and the operations involved. In this article, we introduce an approach for tracking the provenance of RDF resources. Provenance information is tracked during the conversion process and stored in the triple store. Thereafter, this information is disseminated using provenance URIs. The proposed framework has been analyzed using the Harvard Library Bibliographic Datasets. The evaluation was carried out by converting legacy data from these datasets into RDF and Linked Data with provenance. The outcome has been quite promising in the sense that it enables data publishers to generate relevant provenance information with less time and effort.
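
The track, store, and disseminate cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the PTSLGA implementation: the `example.org` namespace, the record layout, the function names, and the in-memory list standing in for a triple store are all assumptions made for illustration. The property names come from the W3C PROV vocabulary, with `prov:has_provenance` taken from PROV-AQ.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"   # W3C PROV namespace
DCT = "http://purl.org/dc/terms/"     # Dublin Core terms
BASE = "http://example.org/"          # hypothetical publisher namespace


def convert_with_provenance(record, source_uri, store):
    """Convert one legacy record to triples and log provenance alongside.

    `store` is a plain list used as a stand-in for a triple store.
    """
    rid = record["id"]
    resource = f"{BASE}resource/{rid}"
    provenance = f"{BASE}provenance/{rid}"      # provenance URI (assumed pattern)
    activity = f"{BASE}activity/convert-{rid}"  # the conversion step itself
    started = datetime.now(timezone.utc).isoformat()

    # Data triple produced by the conversion.
    store.append((resource, DCT + "title", record["title"]))

    ended = datetime.now(timezone.utc).isoformat()

    # Provenance triples recorded during the same conversion step.
    store.extend([
        (resource, PROV + "has_provenance", provenance),
        (resource, PROV + "wasDerivedFrom", source_uri),
        (resource, PROV + "wasGeneratedBy", activity),
        (activity, PROV + "used", source_uri),
        (activity, PROV + "startedAtTime", started),
        (activity, PROV + "endedAtTime", ended),
    ])
    return resource, provenance


def lookup_provenance(provenance_uri, store):
    """Return the PROV triples a client would receive for a provenance URI."""
    resources = {s for s, p, o in store
                 if p == PROV + "has_provenance" and o == provenance_uri}
    activities = {o for s, p, o in store
                  if s in resources and p == PROV + "wasGeneratedBy"}
    return [(s, p, o) for s, p, o in store
            if p.startswith(PROV) and (s in resources or s in activities)]


if __name__ == "__main__":
    store = []
    _, prov_uri = convert_with_provenance(
        {"id": "rec-1", "title": "An example title"},
        "http://example.org/source/legacy-file",
        store,
    )
    for triple in lookup_provenance(prov_uri, store):
        print(triple)
```

Recording the provenance triples inside the conversion function, rather than in a separate pass, mirrors the abstract's point that provenance is captured while the conversion runs. A real deployment would presumably replace the list with a triple store and serve `lookup_provenance` results over HTTP at the provenance URI.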

Cite This Paper

Kumar Sharma, Ujjal Marjit, Utpal Biswas, "PTSLGA: A Provenance Tracking System for Linked Data Generating Application", International Journal of Information Technology and Computer Science (IJITCS), vol. 7, no. 4, pp. 87-93, 2015. DOI: 10.5815/ijitcs.2015.04.10
