GTFS Segments: A Fast and Efficient Library to Generate Bus Stop Spacings



Statement of need
The choice of bus stop spacing involves a tradeoff between accessibility and speed: wider spacings mean passengers must travel farther to and from stops, but they allow the bus to move faster (Wu et al., 2022). Many US transit agencies have recently carried out stop consolidation campaigns that systematically remove stops, due partly to the perception that US stop spacings are much narrower than those abroad. However, despite the wide adoption of the General Transit Feed Specification (GTFS) (Voulgaris & Begwani, 2023), there is no reliable data source for current stop spacings, because GTFS does not include spacings directly. Spacings must instead be computed from route shape geometries, stop locations, and stop sequences. A challenge is that stop locations do not lie on the route shapes and must therefore be projected onto the route's LINESTRING. To make spacings available for analysis, gtfs-segments uses k-dimensional spatial trees and k-nearest neighbor heuristics to snap stops to routes and divide routes into segments for the computation of spacings, as described below.
gtfs-segments was designed for researchers, transit planners, students and anyone interested in bus networks. The package has been used in several scholarly articles (Devunuri et al., 2023, 2024; Lehe & Pandey, 2022) and to create databases of spacings for over 550 agencies in the US (Devunuri et al., 2022) and 80 agencies in Canada (Devunuri, 2023). Several transit agencies, such as Regional Transportation District Denver (RTD-Denver), have used the package to visualize the effects of their bus stop consolidation efforts. Filtering functions allow the user to explore datasets, identify errors and compute specialized statistics.

Downloading GTFS feeds
The package permits the user to search and download recent GTFS feeds from the Mobility Database Catalogs (MobilityData, 2023). It allows for keyword and fuzzy search of GTFS feeds using location (e.g., Minneapolis, San Francisco) or provider name (e.g., WMATA, Capital Metro) as input.
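The idea of a fuzzy search over feed metadata can be illustrated with a minimal sketch that uses only Python's standard library. The catalog rows, the field names, the `fuzzy_search` function and the 0.6 cutoff are all hypothetical and chosen for illustration; this is not the package's own search interface, and the real catalog carries many more fields.

```python
from difflib import SequenceMatcher

# Hypothetical catalog rows (illustration only, not real catalog records).
CATALOG = [
    {"provider": "Metro Transit", "location": "Minneapolis"},
    {"provider": "San Francisco Municipal Transportation Agency",
     "location": "San Francisco"},
    {"provider": "WMATA", "location": "Washington"},
    {"provider": "Capital Metro", "location": "Austin"},
]


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(query, field, catalog=CATALOG, cutoff=0.6):
    """Return rows whose `field` resembles `query`, best match first.

    `cutoff` is an assumed threshold; a lower value tolerates more typos.
    """
    scored = [(similarity(query, row[field]), row) for row in catalog]
    hits = [(score, row) for score, row in scored if score >= cutoff]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in hits]


# A misspelled location still finds the intended feed.
matches = fuzzy_search("Mineapolis", "location")
print(matches[0]["provider"])
```

A similarity ratio tolerates small spelling mistakes in the query, at the cost of occasionally returning near-miss names when the cutoff is set too low.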

Computing segments
The fundamental unit of analysis used by gtfs-segments is the segment, which is a piece of a bus network defined by three properties: (i) a start stop, (ii) an end stop and (iii) the path that the bus travels along the route between the two consecutive stops. A segment's spacing is the length of (iii). gtfs-segments produces segments by efficiently and robustly snapping stop locations onto route shapes.
Figure 1 shows examples where a stop is equidistant from multiple route coordinates. Here, projecting the stop onto the route or snapping to the nearest geo-coordinate (lat, lon) may yield stops that are out of order or snapped far from their locations. Also, the time complexity of projection or snapping using brute force is O(nm) for n stops and m geo-coordinates that represent the route shape. gtfs-segments overcomes these challenges by increasing the route resolution (i.e., adding points in between geo-coordinates), using spatial k-d trees, and using more than one nearest neighbor. The increase in resolution allows stops to be snapped to nearby points. Using k-d trees reduces the time complexity to O(n log(m)) and makes it possible to compare among several snapping points without added computation. Figure 2 shows an example where initially snapping to the nearest point produces out-of-order stops (3/4/2) and stop 5 is snapped far away from its location. Increasing the resolution (second panel) fixes stop 5's location problem, but the ordering problem persists. By using k=3 nearest neighbors, we find a proper ordering (last panel). Once every stop has been snapped to a geo-coordinate on the route shape, the shape is segmented between stops and each segment's geometry is stored in a GeoDataFrame.
Other packages, such as gtfs2gps and gtfs_functions (Toso & Oja, 2023), also compute segments. In addition to its snapping algorithm, visualization, download, and statistical functionalities, gtfs-segments is distinguished from those in two ways. First, it computes segments at a faster processing rate, both with and without parallel processing (see Table 1). Second, gtfs-segments is tolerant of deviations from GTFS standards. For example, because the Chicago Transit Authority does not have an agency_id in its routes.txt, gtfs2gps fails to read it even though this field is not needed for obtaining segments.
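The snapping steps described above (densify the route, look up several nearest points per stop, keep the stop order) can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: coordinates are assumed to be planar (already projected), the neighbor lookup is a brute-force search standing in for a k-d tree query to keep the example dependency-free, and the greedy ordering rule and all function names are hypothetical.

```python
import math


def densify(shape, max_gap):
    """Add points along each edge so neighbors are at most max_gap apart."""
    dense = [shape[0]]
    for (x0, y0), (x1, y1) in zip(shape, shape[1:]):
        n = max(1, math.ceil(math.dist((x0, y0), (x1, y1)) / max_gap))
        for i in range(1, n + 1):
            t = i / n
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return dense


def k_nearest(points, stop, k):
    """Indices of the k route points closest to stop.

    Brute force for clarity; a k-d tree would answer the same query
    without scanning every point (assumption: swapped in for brevity).
    """
    order = sorted(range(len(points)),
                   key=lambda i: (math.dist(points[i], stop), i))
    return order[:k]


def snap_stops(points, stops, k):
    """Greedy sketch: take the nearest candidate that lies after the
    previously snapped point; fall back to the nearest if none does."""
    snapped, prev = [], -1
    for stop in stops:
        candidates = k_nearest(points, stop, k)
        in_order = [i for i in candidates if i > prev]
        choice = in_order[0] if in_order else candidates[0]
        snapped.append(choice)
        prev = choice
    return snapped


def spacings(points, snapped):
    """Path length along the route between consecutive snapped stops."""
    return [sum(math.dist(points[i], points[i + 1]) for i in range(a, b))
            for a, b in zip(snapped, snapped[1:])]


# Out-and-back route: east along y=0, then back west along y=1.
route = densify([(0, 0), (10, 0), (10, 1), (0, 1)], max_gap=1.0)
stops = [(2, -0.2), (8, 0.1), (4, 0.45), (1, 1.2)]  # in stop sequence

print(snap_stops(route, stops, k=1))  # [2, 8, 4, 20]: third stop out of order
print(snap_stops(route, stops, k=3))  # [2, 8, 17, 20]: order restored
```

In this toy route the third stop sits slightly closer to the outbound leg, so nearest-point snapping places it before the second stop. With three candidates per stop, the return-leg point becomes available and the stops stay in sequence; `spacings(route, [2, 8, 17, 20])` then gives path lengths of about 6, 9 and 3 units.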

Visualizing stop spacings
The package can create maps of stops and segments (with a basemap), including interactive maps. See Figure 3a, which colors segments by spacing. The package can also produce histograms of stop spacings (see Figure 3b), which can inform strategic decisions about network design.

Calculating stop spacing summary statistics
Discussions about stop spacings commonly include statistical metrics such as means and medians, which are used to compare spacings between agencies or to track changes within an agency over time. gtfs-segments can produce weighted means, medians, and standard deviations for an agency, using different weighting systems (e.g., weighting each segment by the number of times a bus traverses it or by the number of routes that include it), as outlined by Devunuri et al. (2024).
For each route, gtfs-segments can give metrics such as mean spacing, headways, speeds, number of buses in operation and route lengths.
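To make the effect of weighting concrete, the sketch below computes a weighted mean, median and standard deviation for a handful of made-up segments, weighting each spacing by its number of bus traversals. The function names and data are hypothetical; the median uses one common convention (the smallest value at which cumulative weight reaches half the total) and the standard deviation is the population form, either of which may differ from the definitions used in the package.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value


def weighted_std(values, weights):
    """Population-style weighted standard deviation."""
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2
                   for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(variance)


# Made-up example: segment spacings in meters and daily bus traversals.
spacing_m = [200, 300, 400, 800]
traversals = [10, 30, 50, 10]

print(weighted_mean(spacing_m, traversals))    # 390.0
print(weighted_median(spacing_m, traversals))  # 400
print(round(weighted_std(spacing_m, traversals), 1))
```

The unweighted mean of these spacings is 425 m, while the traversal-weighted mean is 390 m, because the long 800 m segment is served by few buses and so counts for less.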

Figure 1: Example route shapes with stop locations that are equidistant from multiple points along the route.

Figure 2: Improvement in snapping due to an increase in resolution and the use of k-nearest neighbors. Adapted from "Bus Stop Spacings Statistics: Theory and Evidence" (Devunuri et al., 2024).

Figure 3: Other visualization features in the package. The SFMTA GTFS feed was used to generate these.

Table 1: Comparison of average processing rates for gtfs2gps, gtfs_functions and gtfs-segments.