One of the tools in our research on the expressive power of (fragments of) Tarski's relation algebra is a our *bruteforce* tool. This tool can compute every possible query result one can obtain via queries expressed in a given query language and evaluated on a given graph. Using these computations, we can determine results on the expressive power for Boolean queries and path queries.

The first such tool was developed by Jan Van den Bussche, and was used to help with obtaining the results in

- ICDT 2011.
Relative expressive power of navigational querying on graphs

. . (2011). In: Proceedings of the 14th International Conference on Database Theory, 197-207. ACM. DOI: 10.1145/1938551.1938578. - IS 2015.
Relative expressive power of navigational querying on graphs

. . (2015). In: Information Sciences, 298, 390-406. Elsevier. DOI: 10.1016/j.ins.2014.11.031. - LJIGPL 2015.
Relative expressive power of navigational querying on graphs using transitive closure

. . (2015). In: Logic Journal of the IGPL, 23(5), 759-788. Oxford University Press. DOI: 10.1093/jigpal/jzv028.

The version I used for my PhD thesis and for several papers started as a reimplementation from scratch with the aim of making a high-performance *bruteforce* tool while, at the same time, playing around with the new features in C++11 (and later C++14).

I have developed two versions of the *bruteforce* tool, which are available for public use under the BSD-license (license details included in the source files).

- The
*clean bruteforce*tool. source code (.zip).A full implementation of a clean well-performing*bruteforce*tool which was used during research. - The
*third bruteforce*tool. source code (.zip).This versions uses MMX/SSE/AVX instructions to implement part of the bruteforce routines as high-performance Boolean matrix multiplication operations. This tool has not been used for research purposes, but was an interesting prototype. The Boolean matrix multiplication version outperforms the traditional relational version by an order of magnitude.

There is a set of notes that describe the basic working of the bruteforce technique used in these *bruteforce* tools and how these techniques can be used to prove results on the expressive power of query languages. The note also describes some of the implementation details in the above two *bruteforce* tools. These notes where originally planned to be part of another new version of the *bruteforce* tool that improved on the tools above. Consequently, not all details in the note match the implementations above, but especially the part on Boolean matrix multiplication techniques can prove to be useful.

The bruteforce results in my thesis can be obtained by feeding the *clean bruteforce* tool the following script.

- JLAMP 2022
*The power of Tarski's relation algebra on trees*. . (2022). In: Journal of Logical and Algebraic Methods in Programming, 126, Elsevier. DOI: 10.1016/j.jlamp.2022.100748. See also FoIKS 2018a, PhD thesis. author copy.## Abstract

Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. In this work, we perform such a systematic study. Our approach is to start from a basic fragment which only allows composition and union. We then study how the expressive power of the query language changes if we add diversity, converse, projections, coprojections, intersection, and/or difference, both for path queries and Boolean queries. For path queries on labeled trees, we found that adding intersection and difference yields more expressive power for some fragments, while adding one of the other operators always yields more expressive power. For Boolean queries on labeled trees, we obtain a similar picture for the relative expressive power, except for a few fragments where adding converse or projection yields no more expressive power. Additionally, we also studied querying unlabeled trees, for which we have found several redundancies. One challenging problem remains open, however, for both path and Boolean queries: does adding difference yield more expressive power to fragments containing at least diversity, coprojections, and intersection?

- CJ 2020
*From relation algebra to semi-join algebra: An approach to graph query optimization*. . (2020). In: The Computer Journal, 64(5), 789-811, Oxford University Press. DOI: 10.1093/comjnl/bxaa031. See also DBPL 2017, PhD thesis. author copy.## Abstract

Many graph query languages rely on composition to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries using semi-joins instead, resulting in a significant reduction of the query evaluation cost. We study techniques to recognize and apply such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, which heavily rely on composition, and the semi-join algebras, which replace composition in favor of semi-joins. Our main result is that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries evaluating to sets of nodes. For practical relevance, we exhibit constructive rules for rewriting relation algebra queries to semi-join algebra queries, and prove that they lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries. In addition, on sibling-ordered trees, we establish new relationships among the expressive power of

*Regular XPath*,*Conditional XPath*, FO-logic, and the semi-join algebra augmented with restricted fixpoint operators. - IS 2020
*Comparing the expressiveness of downward fragments of the relation algebra with transitive closure on trees*. . (2020). In: Information Systems, 89, Elsevier. DOI: 10.1016/j.is.2019.101467. See also DBPL 2015, PhD thesis. author copy, technical report.## Abstract

Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, projections, coprojections, intersection, and difference; this for boolean queries and path queries on labeled and unlabeled structures. In all cases, we present the complete Hasse diagram. In particular, we establish, for each query language fragment that we study on trees, whether it is closed under difference and intersection.

- FoIKS 2018a
*The power of tarski's relation algebra on trees*. . (2018). In: Foundations of Information and Knowledge Systems, 244-264, Springer. DOI: 10.1007/978-3-319-90050-6_14. See also JLAMP 2022, PhD thesis. author copy, slides.## Abstract

Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries,

*XPath*, and*SPARQL*. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. Our approach is to start from a basic fragment which only allows composition and union. We then study how the expressive power of the query language changes if we add diversity, converse, projections, coprojections, intersections, and/or difference, both for path queries and Boolean queries. For path queries, we found that adding intersection and difference yields more expressive power for some fragments, while adding one of the other operators always yields more expressive power. For Boolean queries, we obtain a similar picture for the relative expressive power, except for a few fragments where adding converse or projection yields no more expressive power. One challenging problem remains open, however, for both path and Boolean queries: does adding difference yields more expressive power to fragments containing at least diversity, coprojections, and intersection? - DBPL 2017
*From relation algebra to semi-join algebra: An approach for graph query optimization*. . (2017). In: Proceedings of The 16th International Symposium on Database Programming Languages, 5:1-5:10, ACM. DOI: 10.1145/3122831.3122833. See also CJ 2020, PhD thesis. author copy, slides.## Abstract

Many graph query languages rely on the composition operator to navigate graphs and select nodes of interests, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduced.

We study techniques to recognize and apply such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, that heavily rely on composition, and the semi-join algebras, that replace the composition operator in favor of the semi-join operators.

As our main result, we show that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries that evaluate to sets of nodes. For practical relevance, we exhibit constructive steps for rewriting relation algebra queries to semi-join algebra queries, and prove that these steps lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries.

In addition, on node-labeled graphs that are sibling-ordered trees, we establish new relationships among the expressive power of

*Regular XPath*,*Conditional XPath*, FO-logic, and the semi-join algebra augmented with restricted fixpoint operators. - DBPL 2015
*Relative expressive power of downward fragments of navigational query languages on trees and chains*. . (2015). In: Proceedings of the 15th Symposium on Database Programming Languages, 59-68, ACM. DOI: 10.1145/2815072.2815081. See also IS 2020, PhD thesis. author copy, slides.## Abstract

Motivated by the continuing interest in the tree data model, we study the expressive power of downward fragments of navigational query languages on trees. The basic navigational query language we consider expresses queries by building binary relations from the edge relations and the identity relation, using composition and union. We study the effects on the expressive power when we add transitive closure, projections, coprojections, intersection, and difference. We study expressiveness at the level of boolean queries and path queries, on labeled and unlabeled trees, and on labeled and unlabeled chains. In all these cases, we are able to present the complete Hasse diagram of relative expressiveness. In particular, we were able to decide, for each fragment of the navigational query languages that we study, whether it is closed under difference and intersection when applied on trees.

- PhD thesis
*On tarski's relation algebra: Querying trees and chains and the semi-join algebra*. . (2018). Hasselt University and transnational University of Limburg. Adviser: Marc Gyssens. See also CJ 2020, IS 2020, FoIKS 2018a, DBPL 2017, DBDBD 2016, DBPL 2015. author copy, thesis, slides.## Abstract

Many practical query languages for graph data are based on fragments of Tarski's relation algebra which, optionally, is augmented with the Kleene-star operator. Examples include

*XPath*,*SPARQL*, the*RPQs*, and*GXPath*. Because of this central role of (fragments of) the relation algebra, we study two aspects in more detail. Combined, these two studies give a detailed picture of the expressive power of the fragments of the relation algebra. Moreover, our results provide several opportunities for the development of new techniques for the efficient evaluation of graph queries.