BisPy: Bisimulation in Python

The notion of bisimulation, and in particular of maximum bisimulation (namely the bisimulation that contains all the other bisimulations on the graph), has applications in modal logic, formal verification, and concurrency theory (Kanellakis & Smolka, 1990), and is also used for graph reduction (Gentilini et al., 2003). Since graphs can model a wide range of complex systems, bisimulation is a useful tool in many different settings. For this reason several algorithms for the computation of maximum bisimulation have been studied over the years, and it is now known that the problem has an O(|E| log |V|) algorithmic solution (Paige & Tarjan, 1987), where V is the set of nodes of the graph and E is its set of edges. BisPy is a Python package for the computation of maximum bisimulation.

Statement of need
To the best of our knowledge, BisPy is the first Python project to address the problem presented above, and to meet the objectives of healthy open source software, namely extensive testing, documentation, and intuitive code commenting.
We think that our project may be a useful tool for students approaching the field who want to study practical cases (the notion of bisimulation may be somewhat counterintuitive at first glance), as well as for established researchers, who may use BisPy to study improvements on particular types of graphs and to compare new algorithms with the state of the art.
The package BisPy, briefly presented below, implements more than one algorithm for the computation of maximum bisimulation, and each algorithm uses its own strategy to obtain the result. For this reason, we think that our package may be useful to assess the performance of different approaches on a particular problem.

BisPy
Our package contains the implementation of the following algorithms:
• Paige-Tarjan (1987), which employs an insight from the famous algorithm for the minimization of finite-state automata (Hopcroft, 1971);
• Dovier-Piazza-Policriti (2001), which uses the notion of rank to reduce the overhead of splitting the initial partition; the rank can be computed before the execution of the algorithm using an O(|V| + |E|) procedure (Sharir, 1981; Tarjan, 1972);
• Saha (2007), which can be used to update the maximum bisimulation of a graph after the addition of a new edge, and is more efficient than the computation from scratch in some cases (the computational complexity depends on how much the maximum bisimulation changes due to the modification).
Our implementations have been thoroughly tested and documented. Moreover, we split the algorithms into small functions rather than writing monolithic blocks of code, in order to improve readability and testability. This modularity allows us to reuse functions across multiple algorithms, since several procedures are shared (e.g., split is used in all three of the algorithms mentioned above, while the computation of rank is carried out only in the last two). For the same reason, we think that adding new functionality would be straightforward, since a significant set of common functions is already implemented.
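To make the role of the splitting step concrete, the following is a minimal sketch of a naive partition refinement built around a `split` helper. It is not BisPy's implementation: the function names and signatures are hypothetical, the graph is given as plain node and edge collections rather than a NetworkX graph, and the loop is far from the O(|E| log |V|) bound of Paige-Tarjan. It starts from the trivial partition and splits blocks until the partition is stable.

```python
from collections import defaultdict


def split(block, splitter, successors):
    """Split `block` into the nodes with an edge into `splitter` and the rest."""
    inside = {v for v in block if successors[v] & splitter}
    return inside, block - inside


def naive_maximum_bisimulation(nodes, edges):
    """Refine the trivial partition until no block can be split any further."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    partition = [frozenset(nodes)]
    changed = True
    while changed:
        changed = False
        for splitter in partition:
            refined = []
            for block in partition:
                inside, outside = split(block, splitter, successors)
                if inside and outside:
                    # The block is not stable with respect to the splitter.
                    refined += [frozenset(inside), frozenset(outside)]
                    changed = True
                else:
                    refined.append(block)
            if changed:
                partition = refined
                break  # restart the scan on the refined partition
    return partition


# Balanced binary tree of height 3: node i has children 2i + 1 and 2i + 2.
tree_edges = [(i, 2 * i + c) for i in range(7) for c in (1, 2)]
blocks = naive_maximum_bisimulation(range(15), tree_edges)
print(sorted(sorted(block) for block in blocks))
# prints [[0], [1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
```

On this tree the blocks coincide with the levels of the tree, because nodes at the same depth cannot be told apart by their outgoing edges.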

Example
We present the code that we used to generate the example shown in Figure 1. First of all we import the modules needed to generate the graph (BisPy takes NetworkX directed graphs as input) and to compute the maximum bisimulation.

Performance
We briefly examine some performance results on two different kinds of graphs:
• Balanced trees (Cormen et al., 2009) with variable branching factor r and height h, for which we use the notation BT(r, h);
• Erdős-Rényi graphs (2009), also called binomial graphs, whose set E of edges is generated randomly (the cardinality |E| is roughly p|V|^2).
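For readers who want to reproduce the two families of graphs without extra dependencies, here is a small pure-Python sketch. The function names, the breadth-first node numbering, and the directed reading of the binomial model (each ordered pair of distinct nodes becomes an edge independently with probability p) are assumptions made for illustration; the experiments themselves relied on NetworkX generators.

```python
import random


def balanced_tree_edges(r, h):
    """Edges of BT(r, h) for r >= 2, nodes numbered breadth-first from root 0."""
    internal = (r**h - 1) // (r - 1)  # number of nodes that have children
    return [(v, r * v + c) for v in range(internal) for c in range(1, r + 1)]


def binomial_digraph_edges(n, p, rng):
    """Directed binomial graph on n nodes with edge probability p."""
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < p]


tree = balanced_tree_edges(3, 2)
print(len(tree))  # 12 edges, hence 13 nodes

rng = random.Random(0)  # fixed seed so that the run is repeatable
print(len(binomial_digraph_edges(200, 0.05, rng)))
```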
The first experiment involves balanced trees, and consists of the computation of the maximum bisimulation of trees of varying size. The labeling set is the trivial partition of the set V. The results are shown on the left side of Figure 2. The quantity that varies along the x-axis is |E| log |V|, since this allows the presentation of data in a more natural way.
The performance is consistent with the expected complexity O(|E| log |V|): for instance, our implementation of Dovier-Piazza-Policriti takes about 1.425 seconds to compute the maximum bisimulation on BT(3, 10), and 12.596 seconds on BT(3, 12). The value of the ratio $\frac{|E_{BT(3,12)}| \log |V_{BT(3,12)}|}{|E_{BT(3,10)}| \log |V_{BT(3,10)}|}$ is approximately 10.7, while the measured time grows by a factor of about 8.8; therefore the growth of the time function approximately respects the predicted behavior.
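The arithmetic behind this comparison can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and it relies on the standard closed form |V| = (r^(h+1) - 1)/(r - 1) and |E| = |V| - 1 for a balanced tree. The base of the logarithm does not affect the ratio.

```python
import math


def balanced_tree_size(r, h):
    """Return (|V|, |E|) for a balanced tree with branching factor r, height h."""
    nodes = (r ** (h + 1) - 1) // (r - 1)
    return nodes, nodes - 1


def cost(r, h):
    """The quantity |E| log |V| for BT(r, h)."""
    nodes, edges = balanced_tree_size(r, h)
    return edges * math.log(nodes)


predicted_ratio = cost(3, 12) / cost(3, 10)
measured_ratio = 12.596 / 1.425  # timings reported in the text, in seconds
print(f"predicted: {predicted_ratio:.2f}, measured: {measured_ratio:.2f}")
```

The predicted ratio comes out at roughly 10.7 and the measured one at roughly 8.8, so the observed growth stays within the bound suggested by the complexity.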
Concerning binomial graphs, we fixed p = 0.0005 in order to obtain a graph of some practical interest (as p → 1 the graph becomes complete, as p → 0 also |E| → 0). This time we also consider Saha's incremental algorithm, and we conduct the experiment as follows:
1. Generate a binomial graph with the aforementioned features;
2. Compute the maximum bisimulation using Paige-Tarjan's algorithm;
3. Add a random edge to the graph;
4. Compute the updated maximum bisimulation three times, once with each of the three algorithms under consideration, and measure the time taken by each one.
Since the experiment is not deterministic (the graph and the new edge are generated randomly) we evaluate and visualize the mean time taken by step 4 on a sample of 1000 iterations of steps 1-4.
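The structure of steps 1-4 can be sketched as follows. This is an illustrative harness under stated assumptions, not the script used for the experiments: `signature_refinement` is a simple stand-in for a real algorithm, `mean_update_times` and its parameters are hypothetical names, the graph is generated in pure Python rather than with NetworkX, and the random new edge is not checked for duplicates or self-loops. Each algorithm is passed the old maximum bisimulation, which only an incremental algorithm would actually use.

```python
import random
from timeit import default_timer


def signature_refinement(n, edges):
    """Stand-in algorithm: refine node classes by successor classes until stable."""
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)
    classes = [0] * n
    while True:
        signatures = [(classes[u], frozenset(classes[v] for v in successors[u]))
                      for u in range(n)]
        ids = {}
        refined = [ids.setdefault(sig, len(ids)) for sig in signatures]
        if len(ids) == len(set(classes)):
            return refined  # no class was split: the partition is stable
        classes = refined


def mean_update_times(algorithms, n, p, iterations, seed=0):
    """Mean time of step 4 for each algorithm over repeated runs of steps 1-4."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in algorithms}
    for _ in range(iterations):
        # Step 1: generate a (directed) binomial graph.
        edges = [(u, v) for u in range(n) for v in range(n)
                 if u != v and rng.random() < p]
        # Step 2: maximum bisimulation of the original graph.
        old = signature_refinement(n, edges)
        # Step 3: add a random edge.
        edges.append((rng.randrange(n), rng.randrange(n)))
        # Step 4: time each algorithm on the updated graph.
        for name, algorithm in algorithms.items():
            start = default_timer()
            algorithm(n, edges, old)
            totals[name] += default_timer() - start
    return {name: total / iterations for name, total in totals.items()}


algorithms = {"from scratch": lambda n, edges, old: signature_refinement(n, edges)}
print(mean_update_times(algorithms, n=50, p=0.05, iterations=10))
```

Averaging over many iterations, as described above, smooths out the variability introduced by the random graph and the random edge.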
Knowledge of the old maximum bisimulation is of no use to non-incremental algorithms. However, Saha's algorithm uses this input to reduce the number of steps: the goal of the second experiment is in fact to illustrate this improvement. The results are shown on the right side of Figure 2.
We ran the experiments on a workstation with operating system CentOS Linux (x86_64), processor Intel(R) Core(TM) i7-4790 CPU (4 cores, 3.60GHz), and 16 GB RAM. Graphs have been generated using functions from the Python package NetworkX (Hagberg et al., 2008). We measured time using the Python module timeit (Van Rossum & Drake, 2009).