Querying Graphs with Data
- 20 March 2016
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 63 (2), 1-53
- https://doi.org/10.1145/2850413
Abstract
Graph databases have received much attention as of late due to numerous applications in which data is naturally viewed as a graph; these include social networks, RDF and the Semantic Web, biological databases, and many others. There are many proposals for query languages for graph databases that mainly fall into two categories. One views graphs as a particular kind of relational data and uses traditional relational mechanisms for querying. The other concentrates on querying the topology of the graph. These approaches, however, lack the ability to combine data and topology, which would allow queries asking how data changes along paths and patterns enveloping it. In this article, we present a comprehensive study of languages that enable such combination of data and topology querying. These languages come in two flavors. The first follows the standard approach of path queries, which specify how labels of edges change along a path, but now we extend them with ways of specifying how both labels and data change. From the complexity point of view, the right type of formalisms are subclasses of register automata. These, however, are not well suited for querying. To overcome this, we develop several types of extended regular expressions to specify paths with data and study their querying power and complexity. The second approach adopts the popular XML language XPath and extends it from XML documents to graphs. Depending on the exact set of allowed features, we have a family of languages, and our study shows that it includes efficient and highly expressive formalisms for querying both the structure of the data and the data itself.Keywords
Funding Information
- DFG (MA 4938/2-1)
- Millennium Nucleus Center for Semantic Web Research (NC120004)
- EPSRC (G049165, J015377 and M025268)
This publication has 30 references indexed in Scilit:
- XML with incomplete informationJournal of the ACM, 2010
- Automata for Data Words and Data TreesElectronic Proceedings in Theoretical Computer Science, 2010
- Two-variable logic on data trees and XML reasoningJournal of the ACM, 2009
- LTL with the freeze quantifier and register automataACM Transactions on Computational Logic, 2009
- Survey of graph database modelsACM Computing Surveys, 2008
- Representing and querying XML with incomplete informationACM Transactions on Database Systems, 2006
- Efficient algorithms for processing XPath queriesACM Transactions on Database Systems, 2005
- Finite state machines for strings over infinite alphabetsACM Transactions on Computational Logic, 2004
- Reachability logic: an efficient fragment of transitive closure logicLogic Journal of the IGPL, 2000
- A linear-time model-checking algorithm for the alternation-free modal mu-calculusFormal Methods in System Design, 1993