spaghetti: spatial network analysis in PySAL

Summary
The role that spatial networks, such as streets, play in the human experience cannot be overstated. All of our daily activities fall along, or in close proximity to, roads, bike paths, and subway systems, to name a few. Therefore, in many cases, performing spatial analysis in network space, as opposed to Euclidean space, allows for a more precise representation of daily human action and movement patterns. For example, people generally cannot get to work by driving in a straight line directly from their home, but instead move along paths within networks. To this end, spaghetti (spatial graphs: networks, topology, & inference), a sub-module embedded in the wider PySAL ecosystem, was developed to address network-centric research questions with a strong focus on spatial analysis (Gaboardi et al., 2018; Rey et al., 2015; Rey & Anselin, 2007).
Through spaghetti, first, network objects can be created and analysed from collections of line data by various means, including reading in a shapefile or passing in a geopandas.GeoDataFrame, at which time the line data are assigned network topology. Second, spaghetti provides computational tools to support statistical analysis of so-called network-based events along many different types of previously loaded networks. Network-based events, or near-network observations, are events that happen along spatial networks in our daily lives, e.g., locations of trees along footpaths, biking accidents along roads, or locations of coffee shops along streets. As with spaghetti.Network objects, spaghetti.PointPattern objects can be created from shapefiles, geopandas.GeoDataFrame objects, or single libpysal.cg.Point objects. Near-network observations can then be snapped to the nearest network segments, enabling the calculation of observation distance matrices. Third, these observation distance matrices can be used within spaghetti to perform clustering analysis, can serve as input for other network-centric problems (e.g., optimal routing), or can be used within the wider PySAL ecosystem to perform exploratory spatial analysis with esda. Finally, spaghetti's network elements (vertices and arcs) can also be extracted as geopandas.GeoDataFrame objects for visualization and integrated into further spatial statistical analysis within PySAL (e.g., esda).
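The snapping step described above can be illustrated with a minimal, library-free sketch. It is not spaghetti's implementation: the function names `snap_to_segment` and `snap_to_network`, the tuple-based segment representation, and the toy coordinates are all assumptions made for illustration. The idea is to project an observation onto every segment, clamp the projection to the segment's extent, and keep the closest result.

```python
import math


def snap_to_segment(point, seg_start, seg_end):
    """Project `point` onto one segment; return (snapped_xy, distance)."""
    px, py = point
    (x1, y1), (x2, y2) = seg_start, seg_end
    dx, dy = x2 - x1, y2 - y1
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment (both endpoints coincide): snap to that point.
        t = 0.0
    else:
        # Relative position of the projection along the segment,
        # clamped to [0, 1] so the snapped point stays on the segment.
        t = ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    sx, sy = x1 + t * dx, y1 + t * dy
    return (sx, sy), math.hypot(px - sx, py - sy)


def snap_to_network(point, segments):
    """Return (segment_index, snapped_xy, distance) for the nearest segment."""
    best = None
    for idx, (start, end) in enumerate(segments):
        snapped, dist = snap_to_segment(point, start, end)
        if best is None or dist < best[2]:
            best = (idx, snapped, dist)
    return best


# Hypothetical toy network: two segments forming an "L" shape.
segments = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
observation = (4.0, 3.0)
seg_idx, snapped_xy, snap_dist = snap_to_network(observation, segments)
```

Once every observation has a snapped location and a host segment, network distances between observations can be assembled from the along-segment offsets and the shortest paths between segment endpoints, which is the kind of information an observation distance matrix records.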

Related Work & Statement of Need
The most well-known network analysis package within the Python scientific stack is NetworkX (Hagberg et al., 2008), which can be used for modelling any type of complex network (e.g., social, spatial, etc.). OSMnx (Boeing, 2017) is built on top of NetworkX and queries OpenStreetMap for modelling street networks, with resultant network objects returned within a geopandas.GeoDataFrame (Jordahl et al., 2021). Another package, pandana (Foti et al., 2012), is built on top of pandas (McKinney, 2010; Reback et al., 2021) with a focus on shortest path calculation and accessibility measures. Within the realm of Python, the functionality provided by snkit (Russell & Koks, 2019) is most comparable to spaghetti, though its main purpose is the processing of raw line data into clean network objects. Outside of Python, SANET (Okabe et al., 2006) is the project most closely related to spaghetti; however, it provides a GUI plugin for GIS software such as QGIS rather than a programmatic library, and it is not fully open source. While all the libraries above are important for network-based research, spaghetti was created and has evolved in line with the Python Spatial Analysis Library ecosystem for the specific purpose of utilizing the functionality of spatial weights in libpysal for generating network segment contiguity objects.
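To make the notion of network segment contiguity concrete, the following is a minimal sketch, assuming that arcs are represented as pairs of vertex identifiers and that two arcs are contiguous when they share a vertex. The helper name `arc_contiguity` and the toy arcs are hypothetical, and this is not how spaghetti itself builds its contiguity objects.

```python
from collections import defaultdict


def arc_contiguity(arcs):
    """Map each arc to the sorted list of other arcs that share a vertex."""
    # Index the arcs incident to every vertex.
    arcs_at_vertex = defaultdict(set)
    for arc in arcs:
        for vertex in arc:
            arcs_at_vertex[vertex].add(arc)

    neighbors = {}
    for arc in arcs:
        touching = set()
        for vertex in arc:
            touching |= arcs_at_vertex[vertex]
        touching.discard(arc)  # an arc is not its own neighbor
        neighbors[arc] = sorted(touching)
    return neighbors


# Hypothetical toy network: a "T" junction at vertex 1 plus a tail arc (3, 4).
arcs = [(0, 1), (1, 2), (1, 3), (3, 4)]
neighbors = arc_contiguity(arcs)
```

A neighbor dictionary of this shape (each unit keyed to a list of its neighbors) is the kind of structure from which a libpysal spatial weights object can be constructed, which is what lets network segments take part in the same weights-based analyses as other spatial units.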

Planned Enhancements
As with any software project, there are always plans for further improvements and additional functionality. Four major enhancements are described here. The first addition will likely be network partitioning through the use of Voronoi diagrams generated in network space. Network-constrained Voronoi diagrams can be utilized as tools for analysis in and of themselves and can also serve as input for further analysis, such as the Voronoi extension of the Network K function (Okabe & Sugihara, 2012). Second, the current algorithm for allocating observations to a network within spaghetti snaps each point to a single location along the nearest network segment. While this is ideal for concrete observations, such as individual crime incidents, multiple network connections may be more appropriate for abstract network events, such as census tract centroids. Third, the core functionality of spaghetti is written almost entirely with pure Python data structures, which are excellent for code readability and initial development but generally suffer in terms of performance. Several functions can currently be utilized with an optional geopandas installation; however, further integration with the pandas stack has the potential to greatly improve performance. Finally, spaghetti developers will assess together with PySAL developers how best to support visualization and visual analysis targeted towards spaghetti network objects, implemented within visualization packages like splot or mapclassify and exposed as high-level plotting functionality in spaghetti (Lumnitz et al., 2020).
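As a rough illustration of what partitioning in network space means, the sketch below assigns every vertex of a small weighted graph to its nearest generator by shortest-path distance, using a multi-source variant of Dijkstra's algorithm. This is only one possible approach under stated assumptions and is not the planned spaghetti implementation: the function name `network_voronoi`, the adjacency-list format, and the toy graph are hypothetical, and a full network Voronoi diagram would also split arcs at the points equidistant from two generators.

```python
import heapq


def network_voronoi(adjacency, generators):
    """Assign each reachable vertex to its nearest generator (network distance).

    adjacency: {vertex: [(neighbor, arc_length), ...]}
    generators: iterable of vertices acting as Voronoi generators
    Returns (owner, dist) dictionaries keyed by vertex.
    """
    dist = {g: 0.0 for g in generators}
    owner = {g: g for g in generators}
    # Seed the queue with every generator at distance zero.
    heap = [(0.0, g, g) for g in generators]
    heapq.heapify(heap)
    while heap:
        d, vertex, gen = heapq.heappop(heap)
        if d > dist.get(vertex, float("inf")):
            continue  # stale entry: a shorter route was already found
        for nbr, length in adjacency[vertex]:
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                owner[nbr] = gen
                heapq.heappush(heap, (nd, nbr, gen))
    return owner, dist


# Hypothetical path-shaped network a-b-c-d-e with unequal arc lengths.
adjacency = {
    "a": [("b", 2.0)],
    "b": [("a", 2.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 4.0)],
    "d": [("c", 4.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
owner, dist = network_voronoi(adjacency, generators=["a", "e"])
```

In this toy case vertex "c" is three units from "a" but five from "e", so it falls in the cell of "a" even though it sits in the middle of the path by vertex count; the partition follows arc lengths rather than hop counts or straight-line distance.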

Concluding Remarks
Network-constrained spatial analysis is an important facet of scientific inquiry, especially within the social and geographic sciences (Marshall et al., 2018). Being able to perform this type of spatial analysis with a well-documented and tested open-source software package further facilitates fully reproducible and open science. With these motivations and core values, the spaghetti developers and wider PySAL team look forward to creating and supporting research into the future.