Fast Execution of RDF Queries using Apache Hadoop

Somnath Mazumdar, Alberto Scionti

Publication: Contribution to book/anthology/report > Contribution to book/anthology > Research > peer-reviewed

Abstract

Map-Reduce (MR) is a distributed programming framework that has become very popular since its introduction because of its ability to process massive data sets. MR provides a robust and straightforward mechanism for implementing distributed applications without much concern for the management aspects of parallel programming (e.g., instantiating jobs, data distribution, job synchronization). The Resource Description Framework (RDF), with its simplicity and flexibility, can represent semistructured and unstructured data, which are very important for representing web semantics. SPARQL is a query language for retrieving and manipulating data stored in RDF format, and it also supports “Big Data” applications. In this book chapter, we present a framework designed to evaluate complex SPARQL queries quickly. To improve the execution of SPARQL queries, we implemented the query engine on the Hadoop framework. The engine can handle large and complex queries involving multiple join variables while running on large RDF data sets. Further execution speedup is obtained by preprocessing the input data with parallel Bloom filters. The query engine has been tested on the SP2 benchmark, and the results demonstrate the benefits of the design: the minimum query improvement is 5%, while the maximum improvement achieved is 82%.
Original language: English
Title: Advances in Computers
Editors: Ali R. Hurson
Number of pages: 33
Volume: 119
Place of publication: Cambridge, MA
Publisher: Academic Press
Publication date: 2020
Pages: 1-33
Chapter: 1
ISBN (Print): 9780128203255
ISBN (Electronic): 9780128203262
DOI
Status: Published - 2020
Published externally: Yes

Keywords

  • Bloom filter
  • Hadoop
  • RDF
  • SPARQL
  • Query processing
