Bisimulation is an equivalence relating 'equivalent behaving' nodes. Bisimulation partitioning uses this equivalence relation to group these 'equivalent behaving' nodes. One example of the usage of bisimulation partitioning is the grouping of equivalent pieces of information in XML documents and in documents on the web. Thereby bisimulation partitioning provided a way to calculate a structure index of these documents.
For my master thesis I studied how this problem can be solved for very large XML documents; and more generally for very large hierarchical data sources. Thereby we have designed algorithms that have a small internal memory footprint and perform as little IO operations as possible. The total expected IO cost of our approach is O(Sort(|N|) + Sort(|E|)); thus the IO cost for bisimulation partitioning is in the same order of magnitude as IO efficient sorting of the nodes and edges in the source graph. Further study of the subject of bisimulation partitioning has resulted in a more efficient approach with a worst case IO cost of O(Sort(|N| + |E|)).
The master thesis also briefly looks at partition maintenance. Thereby we have provided lower bounds on the complexity of bisimulation partition maintenance (in terms of observable changes to the partitions). We also give some sketches (using the results of previous chapters) on how bisimulation partitions can be updated. Thereby we see that edge changes are especially challenging to support efficiently.
This project is related to the following publications and presentations
On the bisimulation partitioning algorithm presented in my master thesis we have performed several practical experiments. We provide free access (under a BSD-like license) to the implemented algorithms.
The source code can be found here.
I have used Microsoft Visual Studio 2010 (Professional Edition) to develop the software; but there should be no reason why the (free) Express edition would not work. Furthermore the amount of platform dependent code is kept to a minimum; it thus should not be too much work to get the source working under any modern compiler on any modern platform.
The following libraries have been used:
See the 3rdparty directory in the source package for additional details on how these libraries are setup to work properly.