The SkipJoin temporal join algorithm is a variant of the standard high-performance internal-memory forward-scan algorithm for joining sorted arrays of events (intervals with a start and end time). SkipJoin improves on forward-scan algorithms by providing efficient support for windowed joins and for skewed data. To support SkipJoin, we also provide the Stab Forest temporal index structure. This index structure supports highly-efficient stab-queries and multi-stab-queries (which retrieve events active at given points in time) and efficiently supports append-only operations.
The implementation of SkipJoin and of the Stab Forest is in C++14 and is tested using two modern C++ compiler, namely Microsoft C/C++ Compiler Version 19.13.26132 for x64 (part of Visual Studio 2017) and GNU g++ 8.1.0. The code should work with any sufficiently standard-compliant C++ compiler. Some of the supporting scripts are implemented using Java (any modern java compiler should suffice). The implementation is available for public use under the BSD-license (license details included in the source files):
In our experiments, we used the Airline On-Time Performance Data (AOTPD) dataset and the Civil Unrest Event Data (CUED) datasets. The raw datasets used have a total size of approximately 2.5GB. Hence, we do not provide copies of these datasets here. Instructions on how to build the datasets is provided with the source code. Furthermore, we have offline backups of the datasets that can be provided upon request.