Nait Bahloul Safia

Work place: LITIO laboratory, University of Oran, BP 1524, El-M'Naouer, 31000, Oran, Algeria

E-mail: nait-bahloul.safia@univ-oran.dz


Research Interests: World Wide Web

Biography

Nait Bahloul Safia is an associate professor in the Department of Computer Science, University of Oran Es Senia, Algeria. Since 2011, she has led a team working on data engineering and Web technology. Her research covers advanced aspects of databases, Web technology, and unsupervised classification.

Author Articles
An Approach for Indexing Web Data Sources

By Saidi Imene, Nait Bahloul Safia

DOI: https://doi.org/10.5815/ijitcs.2014.09.07, Pub. Date: 8 Aug. 2014

Web information sources such as forums, blogs, and news articles are becoming increasingly large and diverse. Although advances in technology are improving techniques for handling the large amounts of generated data, these sources are heterogeneous in structure (semi-structured or unstructured) and in nature (text or images). Software solutions are therefore needed to prepare the data and to access these sources in a homogeneous way. In this paper we present an approach for indexing heterogeneous data sources. Our objective is to offer techniques for efficient indexing of web sources by storing only the necessary information. We propose automatic indexing for semi-structured or unstructured sources (e.g., XML files, HTML files) and annotation for other sources (e.g., images and videos embedded in a page). We present our indexing algorithms and propose using the MapReduce model to build a scalable inverted index. Experiments on a real-world corpus show that our approach achieves good performance.
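
The abstract does not give the authors' algorithms, so the following is only a minimal, single-process sketch of the general idea of building an inverted index in the MapReduce style. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`), the tokenizer, and the sample documents are all illustrative assumptions, not taken from the paper.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per token in a document."""
    # Assumed tokenizer: lowercase alphanumeric runs only.
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted doc ids by term, as a MapReduce framework would."""
    groups = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce step: collapse doc ids into a sorted, duplicate-free posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical two-document corpus.
docs = {"d1": "Web forums and blogs", "d2": "Indexing web sources"}
index = build_inverted_index(docs)
```

In a real deployment the map and reduce steps would run in parallel across many machines, with the framework performing the grouping; here the three steps run in sequence only to show the data flow from documents to posting lists.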
