rabpro: global watershed boundaries, river elevation profiles, and catchment statistics



Summary
River and Basin Profiler (rabpro) is a Python package to delineate watersheds, extract river flowlines and elevation profiles, and compute watershed statistics for any location on the Earth's surface. As fundamental hydrologically relevant units of surface area, watersheds are areas of land that drain via aboveground pathways to the same location, or outlet. Delineations of watershed boundaries are typically performed on digital elevation models (DEMs) that represent surface elevations as gridded rasters. Depending on the resolution of the DEM and the size of the watershed, delineation may be very computationally expensive. With this in mind, we designed rabpro to provide user-friendly workflows that manage the complexity and computational expense of watershed calculations given an arbitrary coordinate pair. In addition to basic watershed delineation, rabpro will extract the elevation profile for a watershed's main-channel flowline. This enables the computation of river slope, which is a critical parameter in many hydrologic and geomorphologic models. Finally, rabpro provides a user-friendly wrapper around Google Earth Engine's (GEE) Python API to enable cloud computing of zonal watershed statistics and/or time-varying forcing data from hundreds of available datasets. Altogether, rabpro provides the ability to automate or semi-automate complex watershed analysis workflows across broad spatial extents.
[Figure, caption partially recovered: ... (Didan, 2015), topo slope (Amatulli et al., 2020), precipitation (GPM, 2019), soil moisture (O'Neill et al., 2018), and temperature (Copernicus, 2017). (F, G) Basin-averaged time-series data fetched by rabpro for the temperature and precipitation datasets in (E).]
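The following is a rough illustration of how an elevation profile yields a river slope. The sketch fits a straight line to elevation against along-channel distance and reports the negated gradient. It assumes the profile is already available as two arrays, with distance increasing downstream and both quantities in meters. The function name `channel_slope` is hypothetical and not part of rabpro's API. A least-squares fit is also only one reasonable definition of reach slope; endpoint differences and smoothed local gradients are common alternatives.

```python
import numpy as np


def channel_slope(distance_m, elevation_m):
    """Hypothetical helper: mean channel slope (m/m, positive downhill).

    distance_m: along-channel distance from the upstream end, increasing downstream.
    elevation_m: elevation sampled at each distance.
    The slope is the negated gradient of a least-squares line through the profile.
    """
    distance_m = np.asarray(distance_m, dtype=float)
    elevation_m = np.asarray(elevation_m, dtype=float)
    if distance_m.shape != elevation_m.shape or distance_m.size < 2:
        raise ValueError("need two or more matching distance/elevation samples")
    gradient, _intercept = np.polyfit(distance_m, elevation_m, deg=1)
    return -gradient


# Made-up profile: samples 1 km apart, dropping 2 m per km.
dist = [0, 1000, 2000, 3000, 4000]
elev = [110.0, 108.0, 106.0, 104.0, 102.0]
print(round(channel_slope(dist, elev), 6))  # 0.002
```

A real profile is noisy, so fitting over the whole reach damps single-pixel DEM artifacts. The trade-off is that this approach hides any change in slope along the channel.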

Statement of Need
Watersheds play a central role in many scientific, engineering, and environmental management applications (see Brooks (2003) for a comprehensive overview). While rabpro can benefit any watershed-based research or analysis, it was designed to satisfy the needs of data-driven rainfall-runoff models. These models aim to predict streamflow (runoff) time series as a function of precipitation over the upstream land area (i.e., the watershed). In addition to watershed delineations and precipitation estimates, they typically require data on both time-varying parameters (or forcing data) like temperature, humidity, soil moisture, and vegetation, and static watershed properties like topography, soil type, or land use/land cover (Kratzert et al., 2019, 2021; Nearing et al., 2021). The rabpro API enables users to manage the complete data pipeline necessary to drive such a model, from the initial watershed delineation through the calculation of static and time-varying parameters. Some hydrologic and hydraulic models also require channel slope for routing streamflow (Boyle et al., 2001; Piccolroaz et al., 2016; Wilson et al., 2008), developing rating curves (Colby, 1956; Fenton & Keller, 2001), or modeling local hydraulics (Schwenk et al., 2015, 2017; Schwenk & Foufoula-Georgiou, 2016).
The need for watershed-based data analysis tools is exemplified by the growing collection of published datasets that provide watershed boundaries, forcing data, and/or watershed attributes in precomputed form. These include CAMELS (Addor et al., 2017), CAMELS-CL, -AUS, and -BR (Alvarez-Garreton et al., 2018; Chagas et al., 2020; Fowler et al., 2021), HYSETS (Arsenault et al., 2020), and HydroATLAS (Linke et al., 2019). These datasets provide off-the-shelf options for building streamflow models, but they are inflexible. For example, someone who wants to add a watershed attribute, use a new remotely sensed data product, or extend the forcing time series to the most recently available data must go through the arduous process of sampling those data themselves. rabpro was designed to provide the flexibility to build a watershed dataset from scratch or to append to an existing one.
While we point to streamflow modeling as an example, many other applications exist. rabpro is currently being used to contextualize streamflow trends, build a data-driven model of riverbank erosion, and generate forcing data for a mosquito population dynamics model. Although rabpro's focus is primarily on watersheds, some users may also find its Google Earth Engine wrapper convenient for sampling raster data over any polygon(s). For example, Earth System Models commonly require sampling raster datasets over watersheds or other polygons for parameterization and validation (Chen et al., 2020; Fisher et al., 2019).
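"Sampling raster data over a polygon" reduces, at its core, to averaging the pixels that fall inside it. The same holds for each time step of a forcing time series. rabpro hands this work to Google Earth Engine's servers. The sketch below only shows the underlying operation locally in NumPy and is not rabpro's implementation.

The sketch makes three assumptions:
- The polygon has already been rasterized to a boolean mask on the same grid as the data.
- Missing pixels are marked with NaN or an optional nodata value.
- The name `basin_mean` is hypothetical.

```python
import numpy as np


def basin_mean(values, mask, nodata=None):
    """Hypothetical helper: NaN-aware mean of raster values inside a basin mask.

    values: (rows, cols) for a static layer, or (time, rows, cols) for a
            time-varying layer on the same grid as `mask`.
    mask:   boolean (rows, cols) array, True for pixels inside the polygon.
    Returns a float for 2-D input, or one value per time step for 3-D input;
    a step with no valid pixels inside the basin comes back as NaN.
    """
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if values.shape[-2:] != mask.shape:
        raise ValueError("values and mask must share the same (rows, cols) grid")
    inside = values[..., mask]  # shape (n_pixels,) or (time, n_pixels)
    valid = ~np.isnan(inside)
    if nodata is not None:
        valid &= inside != nodata
    counts = valid.sum(axis=-1)
    totals = np.where(valid, inside, 0.0).sum(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = totals / counts  # 0/0 -> NaN when a step has no valid pixels
    return float(means) if np.ndim(means) == 0 else means


# Static layer: a made-up 3x3 raster with one nodata pixel.
static = np.array([[1.0, 2.0, 3.0],
                   [4.0, -9999.0, 6.0],
                   [7.0, 8.0, 9.0]])
basin = np.zeros((3, 3), dtype=bool)
basin[:, :2] = True  # hypothetical basin: the two left-hand columns
print(basin_mean(static, basin, nodata=-9999.0))  # 4.4

# Time-varying layer: three made-up time steps on a 2x2 grid.
series = np.array([[[1.0, 3.0], [50.0, 50.0]],
                   [[np.nan, 5.0], [50.0, 50.0]],
                   [[np.nan, np.nan], [50.0, 50.0]]])
top_row = np.array([[True, True], [False, False]])
print(basin_mean(series, top_row))  # 2.0, 5.0, then NaN (no valid pixels)
```

Handling the static and time-varying cases with one `values[..., mask]` index mirrors how a single polygon can serve both basin attributes and forcing time series.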

State of the field
The importance of watersheds, the availability of DEMs, and growing computational power have led to the development of many excellent open-source terrain (DEM) analysis packages that provide watershed delineation tools. These include TauDEM (Tarboton, 2005), pysheds (Bartos, 2020), Whitebox Tools (Lindsay, 2016), and SAGA (Conrad et al., 2015), among many others. Computing statistics and forcing data from geospatial rasters also has a rich history of development, in which Google Earth Engine (Gorelick et al., 2017) has played an important role. Google Earth Engine has been available to developers for almost a decade. In turn, the community has developed open-source packages that interface with its Python API in more user-friendly ways, including gee_tools (Principe, 2021), geemap (Wu, 2020), eemont (Montero, 2021), and restee (Markert, 2021), each of which supports sampling zonal statistics and time series from geospatial polygons.
However, to our knowledge, rabpro is the only available package that provides efficient end-to-end delineation and characterization of watershed basins at scale. A combination of the cited terrain analysis packages and GEE toolboxes can achieve rabpro's functionality. rabpro's integration of these capabilities into a single workflow makes the process simpler, less error-prone, and faster.
One unique rabpro innovation is its automation of "hydrologically addressing" input coordinates. DEM watershed delineations require that the outlet pixel be precisely specified. In many rabpro use cases, however, the outlet is simply a (latitude, longitude) coordinate that may not align with the flowlines of the underlying DEM. rabpro will attempt to "snap" the provided coordinate to a nearby flowline while minimizing both the snapping distance and the difference in upstream drainage area (if the user provides one). Another unique rabpro feature is the ability to choose the delineation method according to basin size. Pixel-based delineations (from MERIT-Hydro (Yamazaki et al., 2019)) can be used for more accurate estimates and/or smaller basins. Coarser subbasin-based delineations (from HydroBASINS (Lehner & Grill, 2014)) can be used for rapid estimates of larger basins.
rabpro: global watershed boundaries, river elevation profiles, and catchment statistics. Journal of Open Source Software, 7 (73), 4237. https://doi.org/10.21105/joss.04237.

Functionality
rabpro executes watershed delineation based on either of two data products:
- the MERIT-Hydro dataset, which provides a global, hydrologically processed DEM suite at ~90 m per pixel;
- the HydroBASINS data product, which provides pre-delineated subbasins of approximately 230 km² each.

Conceptually, basin delineation is identical for both. The user-provided coordinate is hydrologically addressed by finding the downstream-most pixel (MERIT-Hydro) or subbasin (HydroBASINS). The watershed is then delineated by finding all upstream pixels or subbasins that drain into the downstream pixel/subbasin and taking the union of these pixels/subbasins to form a single polygon. A user must therefore download either the MERIT-Hydro tiles covering their study watershed or the appropriate HydroBASINS product. rabpro provides tooling to automate these downloads and create its expected data structure (see the Downloading data notebook). rabpro does not currently support custom watershed datasets similar to HydroBASINS. Its workflow depends on attribute fields and a data structure that must be consistent across datasets to generalize.
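The addressing and pixel-based delineation steps described above can be sketched on a toy grid. This is an illustration under stated assumptions, not rabpro's implementation:
- The snapping score (normalized distance plus relative drainage-area mismatch) is a hypothetical weighting chosen for clarity.
- Flow directions use the common ESRI-style D8 encoding (1 = east, 2 = southeast, ..., 128 = northeast).
- `acc` stands in for an upstream-area or flow-accumulation raster.
- `snap_outlet` and `upstream_mask` are hypothetical names.

```python
from collections import deque

import numpy as np

# Assumed ESRI-style D8 encoding: code -> (row, col) offset of the cell it drains into.
D8_OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
              16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def snap_outlet(row, col, acc, target_area=None, radius=2, min_acc=1.0):
    """Hypothetical snapping rule: pick the best flowline pixel near (row, col).

    Candidates lie within `radius` pixels and have accumulation >= min_acc.
    Each candidate is scored by its distance (in units of radius) plus, when a
    target drainage area is given, its relative drainage-area mismatch.
    """
    if radius < 1:
        raise ValueError("radius must be at least one pixel")
    nrows, ncols = acc.shape
    best, best_score = None, np.inf
    for r in range(max(0, row - radius), min(nrows, row + radius + 1)):
        for c in range(max(0, col - radius), min(ncols, col + radius + 1)):
            if acc[r, c] < min_acc:
                continue  # too little upstream area to count as a flowline
            score = np.hypot(r - row, c - col) / radius
            if target_area is not None:
                score += abs(acc[r, c] - target_area) / target_area
            if score < best_score:
                best, best_score = (r, c), score
    if best is None:
        raise ValueError("no flowline pixel within the search radius")
    return best


def upstream_mask(fdir, outlet):
    """Boolean mask of the outlet cell and every cell that drains through it."""
    nrows, ncols = fdir.shape
    mask = np.zeros(fdir.shape, dtype=bool)
    mask[outlet] = True
    queue = deque([outlet])
    while queue:  # breadth-first walk against the flow directions
        r, c = queue.popleft()
        for code, (dr, dc) in D8_OFFSETS.items():
            nr, nc = r - dr, c - dc  # the neighbour that `code` would route into (r, c)
            if (0 <= nr < nrows and 0 <= nc < ncols
                    and not mask[nr, nc] and fdir[nr, nc] == code):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


# Toy 3x3 grid in which every cell eventually drains through (2, 1).
fdir = np.array([[2, 4, 8],
                 [2, 4, 8],
                 [1, 4, 16]])
acc = np.array([[1, 1, 1],
                [1, 4, 1],
                [1, 9, 1]])  # upstream cell counts, including the cell itself
outlet = snap_outlet(1, 0, acc, target_area=4, radius=1, min_acc=2)
print(outlet)                                  # (1, 1)
print(int(upstream_mask(fdir, outlet).sum()))  # 4
```

Passing `target_area=9` instead of 4 makes the same call snap to the larger channel at (2, 1). This shows how a user-supplied drainage area can override a purely nearest-pixel choice. Converting the resulting mask to a single polygon (the union step) is omitted here.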
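For the subbasin-based path, the same upstream search runs on a graph of subbasins rather than a pixel grid. HydroBASINS attribute tables identify each subbasin by HYBAS_ID and record the subbasin it drains into in NEXT_DOWN (0 where there is none). The minimal sketch below assumes those links have already been read into a plain dictionary, and the function name `upstream_subbasins` is hypothetical. Dissolving the selected subbasin polygons (for example with GeoPandas or Shapely) would then give the watershed boundary.

```python
from collections import defaultdict, deque


def upstream_subbasins(next_down, outlet_id):
    """Hypothetical helper: IDs of `outlet_id` and every subbasin draining into it.

    next_down: dict mapping each subbasin ID to the ID of the subbasin it
    drains into (0 marks a subbasin with no downstream neighbour).
    """
    # Invert the downstream links once so the walk can go upstream.
    upstream = defaultdict(list)
    for basin_id, down_id in next_down.items():
        if down_id != 0:
            upstream[down_id].append(basin_id)
    found = {outlet_id}
    queue = deque([outlet_id])
    while queue:
        current = queue.popleft()
        for up_id in upstream[current]:
            if up_id not in found:
                found.add(up_id)
                queue.append(up_id)
    return found


# Made-up network: 1 and 2 join at 3, which drains to 5; 4 also drains to 5.
links = {1: 3, 2: 3, 3: 5, 4: 5, 5: 0}
print(sorted(upstream_subbasins(links, 3)))  # [1, 2, 3]
print(sorted(upstream_subbasins(links, 5)))  # [1, 2, 3, 4, 5]
```

Because a subbasin covers far more area than a ~90 m pixel, this traversal touches orders of magnitude fewer nodes for a large basin. This is consistent with the text's point that subbasin-based delineation suits rapid estimates of larger basins.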
There are three primary operations supported by rabpro: 1) basin delineation, 2) elevation profiling, and 3) subbasin (zonal) statistics. If operating on a single coordinate pair, the cleanest workflow is to instantiate an object of the profiler class and call (in order) the delineate_basins(), elev_profile(), and basin_stats() methods (see the Basic Example notebook). If operating on multiple coordinate pairs, the workflow is to loop through the coordinate pairs, delineating each watershed (and optionally calculating its elevation profile). As the loop runs, the user collects each basin polygon in a list. After the loop, the user concatenates the list into a single GeoDataFrame and calls basin_stats.compute() on it directly (see the Multiple Basins Example notebook). More details on package functionality can be found in the documentation.
[Figure, caption partially recovered: ... (Prior et al., 2022) watersheds in Sri Lanka are delineated and zonal statistics are run for water occurrence, temperature, and precipitation.]

Dependencies
rabpro relies on functionality from the following Python packages: GDAL (GDAL/OGR contributors, 2020), NumPy (Harris et al., 2020), GeoPandas (Jordahl et al., 2020), Shapely (Gillies & others, 2007), pyproj (Snow et al., 2021), scikit-image (Van der Walt et al., 2014), SciPy, and earthengine-api (Gorelick et al., 2017). Use of the watershed statistics methods requires a free Google Earth Engine account. The required MERIT-Hydro and HydroBASINS data are freely available for download from their websites or via rabpro's download scripts. MERIT-Hydro requires users to first register to receive a username and password for access to downloads.