UvA-DARE (Digital Academic Repository)

patRoon 2.0: Improved non-target analysis workflows including automated transformation product screening

Statement of need
The identification of chemicals in NTA remains a grand challenge (Vermeulen et al., 2020); only a small percentage of detected masses can be confidently annotated with spectral libraries (Silva et al., 2015). The unidentified "dark matter" is partly due to TPs, motivating the need for TP screening workflows. Reported approaches to elucidate TPs in the environment include screening of known/predicted TPs, parent/TP classification techniques, isotope labeling experiments and identifying expected (dis)similarities in MS data (shown in Table 1; also reviewed in Li et al. (2021)). However, these approaches are typically designed for a single study or are available only as a stand-alone and/or commercial tool. Furthermore, few TP prediction tools support specific pathways observed in the environment (e.g., microbial degradation), are open-source and can be readily integrated in workflows for automated batch predictions. patRoon 2.0 implements complementary TP screening approaches with selected algorithms from Table 1 and includes other novel functionality to provide comprehensive TP screening workflows. The modular design of patRoon enables integration of more approaches and algorithms in the future.
Since wide chemical coverage is desired with NTA and since TPs can ionize differently from their parent, HRMS analyses are often performed in both positive and negative ionization mode. patRoon 2.0 is now capable of simultaneously processing, integrating and interpreting mixed mode data, a functionality not available in most workflows due to complexity and long processing times.
Further improvements to patRoon include interactive data curation and new prioritization and identification strategies, described further below.

New functionality

Transformation product screening workflow
The patRoon 2.0 TP screening workflow starts with features (data points with unique chromatographic/MS information) obtained from a 'classical' patRoon workflow (Figure 1A). Then, data from one or more of TP screening (B, C), MS similarity (D) and parent/TP feature classification (E) is combined to link parent and TP features into components (F). The resulting data is then prioritized (G), corresponding features are annotated (H), and finally all data is reported (I). All algorithm parameters are configurable, yet simplified via defaults. This enables flexible and customizable workflows for a wide variety of applications.
TP screening uses known/predicted TP structures from parents (Figure 1B) or mass differences of transformations using metabolic logic (Schollée et al., 2015) (C). Parents for (B) are specified from (1) a target list, (2) results of a suspect screening to find parents by mass, or (3) candidates of feature compound annotation (see Helmus, ter Laak, et al. (2021)). Corresponding TP structures are then obtained either in silico with BioTransformer (Djoumbou-Feunang et al., 2019) or through a library search, using PubChem data (Kim, 2021; Krier et al., 2022; Schymanski, Kondić, et al., 2021; Schymanski, Bolton, et al., 2021) or a custom-made library. Metabolic logic (C), which does not depend on parent structural data, uses transformation reactions from Schollée et al. (2015) or a custom-made list. TP suspect screening then matches candidate TPs with detected features by mass.
MS similarity (Figure 1D) is calculated, without a predefined parent list, from spectral match and/or equivalence of spectral annotations. Spectral match compares MS fragment spectra (MS/MS) with a cosine or Jaccard index similarity score (Stein & Scott, 1994). This was largely implemented in C++ to allow efficient comparison of large numbers of spectra (typically thousands). The calculation can be adjusted by (1) pre-treatment of spectra, e.g., with peak count and intensity thresholds, (2) weight assignment to intensity and m/z data, and (3) shifting TP spectra to highlight equal neutral losses (Schollée et al., 2017; Watrous et al., 2012). Furthermore, combining matched mass peaks from shifted and non-shifted spectra was implemented for similarity calculation of equivalent fragments and neutral losses. MS similarity from annotation equivalence compares formulas of annotated MS/MS fragments and neutral losses, based on additional data such as isotopic fit and spectral libraries. This potentially increases accuracy, but requires the presence of annotations for parent/TP features.
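The spectral match described above can be illustrated with a minimal Python sketch. This is not patRoon's C++ implementation: the function names, the greedy peak matching, the tolerance of 0.005 and the example spectra are all assumptions made for illustration, and spectrum pre-treatment and intensity/m/z weighting are left out. Spectra are lists of (m/z, intensity) pairs, and the optional `shift` moves the TP spectrum by the precursor mass difference so that equal neutral losses line up.

```python
import math


def match_peaks(spec_a, spec_b, tol=0.005, shift=0.0):
    """Greedily pair peaks whose m/z agree within `tol` after adding `shift` to spec_b."""
    pairs, used = [], set()
    for mz_a, int_a in spec_a:
        best = None
        for j, (mz_b, int_b) in enumerate(spec_b):
            if j in used:
                continue
            diff = abs(mz_a - (mz_b + shift))
            if diff <= tol and (best is None or diff < best[0]):
                best = (diff, j, int_b)
        if best is not None:
            used.add(best[1])
            pairs.append((int_a, best[2]))
    return pairs


def cosine_similarity(spec_a, spec_b, tol=0.005, shift=0.0):
    """Cosine score: dot product of matched intensities over the product of vector norms."""
    pairs = match_peaks(spec_a, spec_b, tol, shift)
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a * b for a, b in pairs) / (norm_a * norm_b)


def jaccard_similarity(spec_a, spec_b, tol=0.005, shift=0.0):
    """Jaccard index: matched peaks divided by the total number of distinct peaks."""
    matched = len(match_peaks(spec_a, spec_b, tol, shift))
    union = len(spec_a) + len(spec_b) - matched
    return matched / union if union else 0.0


# Hypothetical parent and TP spectra; the TP differs by +15.9949 (one oxygen).
parent = [(100.0, 100.0), (150.0, 50.0)]
tp = [(115.9949, 100.0), (165.9949, 50.0)]

print(cosine_similarity(parent, tp))                  # no peaks match without shifting
print(cosine_similarity(parent, tp, shift=-15.9949))  # all peaks match after shifting
```

In this toy case the non-shifted spectra share no peaks, whereas shifting the TP spectrum by the precursor mass difference aligns every peak, which is the situation where the modification is retained in all fragments.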
Parent/TP feature classification (Figure 1E) is typically performed by statistical analyses with R, facilitated by the patRoon data export functionality. Fold-change calculation and visualization with volcano plots (Cui & Churchill, 2003) were implemented in patRoon 2.0 to simplify the usage of this common classification technique.
During TP componentization (Figure 1F), each parent feature is linked with corresponding TP features and grouped in a TP component. Data prioritization (G) can then be performed with the subsetting functionality of patRoon and several newly implemented filters (Table 2). Existing MetFrag (Ruttkies et al., 2016) annotation functionality was extended to include predicted TP structures (B) to allow in silico MS/MS annotation (H) of TPs absent from commonly used databases. The interactive reporting (I) functionality was extended to simplify inspection of TP screening results (see Figure 2).
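As a rough illustration of the fold-change classification mentioned above, the following Python sketch computes a log2 fold change between two groups of replicate intensities and labels a feature accordingly. It does not reproduce patRoon's implementation: the function names, the `zero_offset` parameter, the threshold of 1.0 and the example intensities are assumptions, and the significance (p-value) axis of a volcano plot is deliberately omitted.

```python
import math
from statistics import mean


def log2_fold_change(before, after, zero_offset=1.0):
    """log2 ratio of mean intensities; `zero_offset` avoids log(0) for absent features."""
    return math.log2((mean(after) + zero_offset) / (mean(before) + zero_offset))


def classify(fold_change, threshold=1.0):
    """Label a feature by the direction of its intensity change."""
    if fold_change >= threshold:
        return "increase"
    if fold_change <= -threshold:
        return "decrease"
    return "insignificant"


# Hypothetical replicate intensities before and after a treatment step.
influent = [1000, 1100, 900]
effluent = [4000, 4200, 3800]

print(classify(log2_fold_change(influent, effluent)))  # candidate TP: intensity rises
print(classify(log2_fold_change(effluent, influent)))  # candidate parent: intensity drops
```

Features that increase across the sample groups are candidate TPs and features that decrease are candidate parents; a real analysis would also test whether the change is statistically significant.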

Sets workflows: combining positive and negative MS ionization data
In a sets workflow, positive and negative data is automatically processed and combined (Figure 3A). Features are obtained for each polarity, and optionally prioritized with polarity-specific conditions (e.g., minimum intensity). Then, the feature m/z values are replaced with neutral masses calculated from adduct information (defined manually or via feature adduct annotations), and subsequently aligned and grouped across polarities (with configurable tolerances). Subsequent steps largely follow the patRoon 1.0 workflow (Helmus, ter Laak, et al., 2021). Algorithms incapable of processing mixed-polarity data are automatically executed with polarity-specific data, and outputs are subsequently combined. Moreover, a consensus for formula/compound annotations can be reached, for instance, to eliminate candidates not found for both polarities.
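The neutralization and cross-polarity grouping step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not patRoon's algorithm: only the singly charged [M+H]+ and [M-H]- adducts are handled, retention times are assumed to be in seconds, and the function names, feature records and tolerances are hypothetical.

```python
PROTON_MASS = 1.007276  # Da; mass of a proton (hydrogen atom minus one electron)

# m/z offset relative to the neutral mass for each supported adduct (assumption:
# singly charged protonated/deprotonated species only).
ADDUCT_SHIFTS = {"[M+H]+": PROTON_MASS, "[M-H]-": -PROTON_MASS}


def neutral_mass(mz, adduct):
    """Convert a feature m/z into a neutral monoisotopic mass using its adduct."""
    return mz - ADDUCT_SHIFTS[adduct]


def group_across_polarities(features, mass_tol=0.005, rt_tol=12.0):
    """Pair positive and negative features that agree in neutral mass and retention time."""
    pos = [f for f in features if f["adduct"].endswith("+")]
    neg = [f for f in features if f["adduct"].endswith("-")]
    groups = []
    for p in pos:
        for n in neg:
            mass_diff = abs(neutral_mass(p["mz"], p["adduct"]) - neutral_mass(n["mz"], n["adduct"]))
            if mass_diff <= mass_tol and abs(p["rt"] - n["rt"]) <= rt_tol:
                groups.append((p["id"], n["id"]))
    return groups


# Hypothetical features: P1 and N1 stem from the same compound, N2 does not.
features = [
    {"id": "P1", "mz": 180.0655, "rt": 300.0, "adduct": "[M+H]+"},
    {"id": "N1", "mz": 178.0510, "rt": 302.0, "adduct": "[M-H]-"},
    {"id": "N2", "mz": 250.0000, "rt": 301.0, "adduct": "[M-H]-"},
]

print(group_across_polarities(features))
```

Because both polarities are expressed as neutral masses before grouping, the same mass tolerance can be applied regardless of which adduct was observed, which is why correct adduct information matters for the alignment.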
Sets workflows follow a generic design, where each set is a group of analyses that demand independent processing of MS-related data (features, mass spectra, etc.). Therefore, sets can also be differentiated by other MS parameters such as MS/MS fragmentation technique or ionization source. Furthermore, the design allows future implementation of workflows with different chromatographic methods, for instance, to simultaneously process data from different instruments.
Figure 3: (A) Sets workflow with simultaneous processing of positive and negative data. Alignment of positive/negative features can be improved with adduct annotations. The workflow continues identically to the patRoon 1.0 workflow, and positive/negative data is automatically processed separately for algorithms without mixed polarity support. (B) Default YAML configuration file used for estimation of suspect identification levels from annotation scores, candidate rankings and other data.

Other new functionality
Other new functionality of patRoon 2.0 includes:
• Improved suspect screening
  - Automatic estimation of identification levels (Schymanski et al., 2014) using a configurable and extensible rule-based approach (see Figure 3B).
  - Combining suspect and non-target screening workflows.
  - Merging results from different screenings.
• Improved adduct annotation
  - Automatic prioritization of features with preferred adducts.
  - Use of adduct annotations with formula/compound annotation.
  - Calculation and prioritization with peak scores derived from aforementioned peak qualities.
• Interactive graphical tools to inspect and curate workflow data and to train and inspect feature classifications with MetaClean.
• Refactoring and updates of newProject() to generate code for the new functionality.
• A delete function to remove unwanted workflow data, e.g., to implement custom filters.
• More approaches to parallelize R code and support high performance computing using the future package (Bengtsson, 2021).
• Bug fixes and improvements from user feedback.
A complete listing of all changes is outlined in the project news file.

Simultaneous processing of positive/negative data
Performing a sets workflow is straightforward and requires only a few additions to a patRoon 1.0 workflow.

TP screening
The code below demonstrates a simple TP screening workflow where (1) parents are screened, (2) corresponding TPs are predicted with BioTransformer, (3) the TPs are screened, (4) TP components are generated and (5) all results are reported.

Figure 1: TP screening workflow in patRoon 2.0. One or more of steps B/C, D and E are used to generate TP components by linking and grouping parent/TP features (F). The TP annotation (H) can be enriched with data from (B).

Figure 2: Example report with TP screening results (bottom) for a selected parent (top).

Table 1: Overview of TP screening approaches relevant to environmental screening. Bold: implemented/interfaced in patRoon.

Table 2: Filters to prioritize TP components.