cosasi: Graph Diffusion Source Inference in Python

cosasi (COntagion

• perform and evaluate source localization using standard techniques from literature,
• contribute innovative algorithms to a growing core library, and
• benchmark new techniques against a battery of comparable schemes.
The software is currently used within the Logistics Management Institute. Development continues, and we welcome contributions from the broader academic and industrial communities.

Statement of Need
Because spreading phenomena (including viral epidemics, rumors, and malware) often proceed as a function of pairwise interactions, it is practical to model their propagation as diffusion processes on networks. The source inference/localization problem is that of estimating the inverse of this cascade, aimed at identifying the "patient(s) zero" from partial observations. This problem has captured the attention of epidemiologists, security researchers, social scientists, and more, dating back to Shah and Zaman's seminal work on rumor centrality (Shah & Zaman, 2011).
Since then, source inference algorithms have been developed across subject areas, with practitioners often contributing new techniques in domain-specific venues. Additionally, algorithms tend to be problem-specific, with different solutions preferable for different diffusion processes and network topologies. Finally, researchers interested in novel source localization algorithms may not have time to implement a robust battery of alternatives against which to compare new schemes.
cosasi provides a standard framework for researchers and practitioners alike to perform graph diffusion source inference. The package implements a number of prominent techniques from the literature and provides utilities for estimating the number of sources, partitioning infection subgraphs, and more. Where possible, source identification methods are extended as ranking algorithms for hypothesis comparison. cosasi also offers a benchmark suite, which automatically runs a battery of comparable localization methods applicable to the graph diffusion use case at hand, enabling users to easily evaluate novel techniques against appropriate baselines. Standardization is emphasized; for instance, all source inference methods return a SourceResult object, which provides resources for analyzing, ranking, comparing, and learning more about hypothesized sources and the techniques used.

Background
Given an undirected graph $G = (V, E)$ with vertex set $V$ and edge set $E$, a diffusion process begins with a source set $S \subseteq V$ and spreads along the edges according to some (usually stochastic) propagation function. It is common for diffusion processes to invoke formalizations from epidemiology, such as the Susceptible-Infected (SI) model, which can represent information spread, or the Susceptible-Infected-Recovered (SIR) model, which can represent dynamics more evocative of viral epidemics. Even when describing metaphorical contagion, such as rumors, it is standard to refer to vertices affected by the spreading process as "infected." The infection subgraph $G_t$ is the subgraph of $G$ induced by the infected vertices at time $t$. In the single-source SI model, $G_t$ is guaranteed to be connected. A common setting for source localization is to infer $S$ from some $G_t$. More recently, some techniques have incorporated information from a small set of observers, who record the time at which they become infected (Zhu et al., 2016).
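To make the setting concrete, the following is a minimal, standard-library sketch of a discrete-time SI process on a small grid graph, followed by extraction of the infection subgraph. It is not cosasi's API: the function names (simulate_si, induced_subgraph, is_connected, grid_graph), the infection_rate parameter, and the synchronous update rule are assumptions chosen for illustration.

```python
import random
from collections import deque


def grid_graph(rows, cols):
    """Adjacency dict of an undirected rows x cols grid (illustrative helper)."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((1, 0), (0, 1)):
            nbr = (r + dr, c + dc)
            if nbr in adj:
                adj[(r, c)].append(nbr)
                adj[nbr].append((r, c))
    return adj


def simulate_si(adj, sources, infection_rate, steps, seed=0):
    """Assumed update rule: at each step, every infected vertex infects each
    susceptible neighbor independently with probability infection_rate."""
    rng = random.Random(seed)
    infected = set(sources)
    history = [set(infected)]  # history[t] is the infected set at time t
    for _ in range(steps):
        newly_infected = set()
        for u in sorted(infected):
            for v in adj[u]:
                if v not in infected and rng.random() < infection_rate:
                    newly_infected.add(v)
        infected |= newly_infected
        history.append(set(infected))
    return history


def induced_subgraph(adj, vertices):
    """Subgraph induced by the given vertex set."""
    return {u: [v for v in adj[u] if v in vertices] for u in vertices}


def is_connected(adj):
    """Breadth-first search connectivity check."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


G = grid_graph(5, 5)
history = simulate_si(G, sources=[(2, 2)], infection_rate=0.3, steps=6, seed=42)
G_t = induced_subgraph(G, history[-1])  # infection subgraph at the final step
print(len(G_t), is_connected(G_t))
```

Because a vertex can only be infected by an already-infected neighbor, the single-source infection subgraph produced here is connected regardless of the random seed.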
Broadly speaking, source estimators fall into one of two categories: message-passing algorithms, such as Short-Fat Tree (Zhu & Ying, 2014), or spectral algorithms, such as NETSLEUTH (Prakash et al., 2012). An extensive overview of source localization techniques is provided by Ying & Zhu (2018).
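As one concrete instance of a source estimator, the sketch below computes rumor centrality (Shah & Zaman, 2011) on a tree-shaped infection subgraph. For a tree with $n$ vertices, the rumor centrality of $v$ is $n!$ divided by the product of all subtree sizes when the tree is rooted at $v$, which counts the infection orderings that could have started at $v$. The function names and the example tree are hypothetical, and the code does not reflect how cosasi implements or exposes this method.

```python
from math import factorial


def subtree_sizes(tree, root):
    """Size of the subtree below each vertex when `tree` is rooted at `root`."""
    parent = {root: None}
    order = [root]
    for u in order:  # appending while iterating yields a BFS order
        for v in tree[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
    size = {u: 1 for u in order}
    for u in reversed(order):  # children are handled before their parents
        if parent[u] is not None:
            size[parent[u]] += size[u]
    return size


def rumor_centrality(tree, v):
    """n! divided by the product of subtree sizes with the tree rooted at v."""
    denominator = 1
    for s in subtree_sizes(tree, v).values():
        denominator *= s
    return factorial(len(tree)) // denominator


def rank_sources(tree):
    """Rank every vertex as a candidate source, highest centrality first."""
    scores = {v: rumor_centrality(tree, v) for v in tree}
    ranking = sorted(tree, key=lambda v: -scores[v])
    return ranking, scores


# Hypothetical infected tree: vertex 0 is a hub, vertex 4 hangs off vertex 1.
infected_tree = {
    0: [1, 2, 3],
    1: [0, 4],
    2: [0],
    3: [0],
    4: [1],
}

ranking, scores = rank_sources(infected_tree)
print(ranking[0], scores)
```

Returning the full ranking rather than only the top vertex mirrors the idea of treating source identification as hypothesis comparison.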

Availability and Documentation
cosasi is available under the MIT License. The package may be cloned from the GitHub repository or installed from PyPI: pip install cosasi.
Documentation is provided via Read the Docs, including a tutorial introducing major functionality and a detailed API reference. Extensive unit testing is employed throughout the library, with approximately 97% code coverage.

Similar Software
To the author's knowledge, the only comparable and active source localization software is RPaSDT (Frąszczak, 2022). Here, we enumerate a handful of differences between RPaSDT and cosasi, which we believe make cosasi preferable for user accessibility, scalability, and community contribution:
• Presentation: RPaSDT is a GUI toolkit. cosasi is an importable package, with extensive documentation and unit testing.
• Benchmarking: RPaSDT does not provide automatic benchmarking, whereas this is a core feature of cosasi.
• Multi-Source Capabilities: Multi-source inference in RPaSDT is generally performed by partitioning the infection subgraph and applying single-source algorithms to each partition. cosasi implements this strategy as well, but also supports "natural" multi-source inference that does not require repurposing single-source techniques.
• Estimator Utilities: When extending single-source algorithms to the multi-source regime (as described above), it is generally necessary to specify the number of clusters into which we partition the infection subgraph, that is, the hypothesized number of infection sources. cosasi provides a handful of relevant techniques for estimating this quantity, including the Eigengap heuristic (Von Luxburg, 2007) and Minimum Description Length (Prakash et al., 2012).
• Multiple Information Types: Some source inference algorithms require information other than an infection subgraph. For instance, Earliest Infection First relies on a collection of observers, who report the time at which they become infected (Zhu et al., 2016). cosasi provides multiple methods for providing state information to the source inference modules, enabling a wider array of potential localization algorithms.
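The partition-then-localize strategy mentioned in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the method used by RPaSDT or cosasi: the partitioning step (farthest-first seeds followed by a multi-source breadth-first assignment) and the single-source estimator (the Jordan center, i.e. the vertex of minimum eccentricity) are stand-ins chosen for brevity, and all function names are hypothetical. The number of sources k is supplied by the caller; the infection subgraph is assumed connected with at least k vertices.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distances from `start` to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def partition(adj, k):
    """Split a connected graph into k connected cells (illustrative stand-in)."""
    # Farthest-first traversal: each new seed is the vertex farthest from all
    # seeds chosen so far.
    seeds = [min(adj)]
    nearest = bfs_distances(adj, seeds[0])
    while len(seeds) < k:
        new_seed = max(sorted(adj), key=lambda u: nearest[u])
        seeds.append(new_seed)
        d = bfs_distances(adj, new_seed)
        nearest = {u: min(nearest[u], d[u]) for u in adj}
    # Multi-source BFS: each vertex joins the cell of the neighbor that
    # reaches it first, so every cell is connected.
    label = {s: i for i, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in label:
                label[v] = label[u]
                queue.append(v)
    cells = [set() for _ in seeds]
    for u, i in label.items():
        cells[i].add(u)
    return cells


def jordan_center(adj, cell):
    """Single-source stand-in: vertex of minimum eccentricity within `cell`."""
    sub = {u: [v for v in adj[u] if v in cell] for u in cell}
    eccentricity = {u: max(bfs_distances(sub, u).values()) for u in sub}
    return min(sorted(sub), key=lambda u: eccentricity[u])


def multi_source_estimate(adj, k):
    """Apply the single-source estimator to each cell of the partition."""
    return [jordan_center(adj, cell) for cell in partition(adj, k)]


# Hypothetical infection subgraph: a path on ten vertices, two assumed sources.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
cells = partition(path, 2)
estimates = multi_source_estimate(path, 2)
print(estimates)  # [2, 7]
```

Any single-source estimator could replace jordan_center here; the point of the sketch is only that the quality of the result depends on both the chosen k and the partition, which is why utilities for estimating the number of sources matter.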
Whisper was an earlier, thematically similar web application. The project has been inactive since 2016, the web interface is no longer online, and the underlying library is less feature-rich than cosasi or RPaSDT.
A recent graph autoencoder-based approach by Ling and colleagues performs maximum a posteriori source estimation using a generative prior over diffusion sources (Ling et al., 2022). The corresponding GitHub repository implements their SL-VAE method, but is not a general-purpose diffusion source localization framework.