microViz: an R package for microbiome data visualization and statistics

microViz is an R package for the statistical analysis and visualization of microbiota data. This package extends the functionality of popular microbial ecosystem data analysis R packages, including phyloseq (McMurdie & Holmes, 2013), vegan (Oksanen et al., 2020) and microbiome (Lahti & Shetty, 2012-2019). microViz provides a selection of powerful additions to the toolbox of researchers already familiar with phyloseq and microbiome, as well as assisting researchers with less R programming experience to independently explore and analyse their data and to generate publication-ready figures.


Summary
microViz is an R package for the statistical analysis and visualization of microbiota data. This package extends the functionality of popular microbial ecosystem data analysis R packages, including phyloseq (McMurdie & Holmes, 2013), vegan (Oksanen et al., 2020) and microbiome (Lahti & Shetty, 2012-2019. microViz provides a selection of powerful additions to the toolbox of researchers already familiar with phyloseq and microbiome, as well as assisting researchers with less R programming experience to independently explore and analyse their data and to generate publication-ready figures.
The tools offered by microViz include: • A Shiny app (Chang et al., 2021) for interactive exploration of microbiota data within R, pairing ordination plots with abundance bar charts • Easy to use functions for generating publication-ready ordination plots with ggplot2 (Wickham, 2016), accommodating constrained and partial ordination, bi-plots and triplots, and automatic captioning designed to promote methodological transparency and reproducibility • A novel visualization approach pairing ordination plots with circular bar charts (iris plots) for comprehensive, intuitive and compact visualization of the similarity and composition of hundreds of microbial ecosystems • Correlation and composition heatmaps for microbiome data annotated with plots showing each taxon's prevalence and/or abundance • A compact cladogram visualization approach for intuitive comparison of numerous microbe-metadata associations derived from (multivariable) statistical models (taxonomic association trees)

Statement of need
Modern microbiome research typically involves the use of next-generation-sequencing methods to profile the relative abundance of hundreds of microbial taxa across tens, hundreds or thousands of samples. Alongside increasing sample sizes, the amount of relevant metadata collected is growing, particularly in human cohort studies. These trends all increase the size and complexity of the resulting dataset, which makes its exploration, statistical analysis, and presentation increasingly challenging.
Ordination methods like Principle Coordinates/Components Analyses (PCoA/PCA) are a staple method in microbiome research. The vegan R package implements the majority of distance and ordination calculation methods and microViz makes available two further dissimilarity measures: Generalized UniFrac from the GUniFrac package, and the Aitchison distance. The former provides a balanced intermediate between unweighted and weighted UniFrac (Chen et al., 2012), and the latter is a distance measure designed for use with compositional data, such as sequencing read counts (Gloor et al., 2017).
The phyloseq R package provides an interface for producing ordination plots, with the ggplot2 R package. microViz streamlines the computation and presentation of ordination methods including the constrained analyses: redundancy analysis (RDA), distance-based RDA, partial RDA, and canonical correspondence analysis (CCA). microViz can generate highly customizable ggplot2 bi-plots and tri-plots, showing labelled arrows for microbial loadings and constraint variables when applicable. Furthermore, these figures are captioned automatically, by default. The captions are intended to promote better reporting of ordination methods in published research, where too often insufficient information is given to reproduce the ordination plot. To provide the automated captioning, microViz implements a simple S3 list class, ps_extra, for provenance tracking, by storing distance matrices and ordination objects alongside the phyloseq object they were created from, as well as relevant taxonomic aggregation and transformation information.
Moreover, microViz provides a Shiny app interface (Chang et al., 2021) that allows the user to interactively create and explore ordination plots directly from phyloseq objects. The Shiny app generates code that can be copy-pasted into a script to reproduce the interactively designed ordination plot. The user can click and drag on the interactive ordination plot to select samples and directly examine their taxonomic compositions on a customizable stacked bar chart with a clear colour scheme.
Alternatively, for a comprehensive and intuitive static presentation of both sample variation patterns and underlying microbial composition, microViz provides an easy approach to pair ordination plots with attractive circular bar charts (iris plots) by ordering the bar chart in accordance with the rotational position of samples around the origin point on the ordination plot, e.g. Figure 1. Bar charts do have limitations when visualizing highly diverse samples, such as the adult gut microbiome, at a detailed taxonomic level. This is why microViz also offers an enhanced heatmap visualization approach, pairing an ordered heatmap of (transformed and/or scaled) microbial abundances with compact plots showing each taxon's overall prevalence and/or abundance distribution. The same annotation can easily be added to metadata-tomicrobe correlation heatmaps.
microViz provides a flexible wrapper around methods for the statistical modelling of microbial abundances, including e.g. beta-binomial regression models from the corncob R package (Martin et al., 2020), and compositional linear regression. To visualize metadata-to-microbiome associations derived from more complicated statistical models, microViz offers a visualization approach that combines multiple annotated cladograms to comprehensively and compactly display patterns of microbial associations with multiple covariates from the same multivariable statistical model. These "taxonomic association trees" facilitate direct comparison of the direction, strength and significance of microbial associations between covariates and across multiple taxonomic ranks. This visualisation also provides an intuitive reminder of the balancing act inherent in compositional data analysis: if one clade/branch goes up, others must go down. Other packages in R, such as ggtree (Yu et al., 2017) or metacoder (Foster et al., 2017), can be used to make annotated cladograms similar to the microViz taxonomic association tree visualizations, but the microViz style has a few advantages for the purpose of reporting multivariable model results: Firstly, microViz cladogram generating functions are directly paired with functions to compute the statistical model results for all taxa in a phyloseq. Secondly, the tree layouts are more compact, by default, for displaying multiple trees for easy comparison.
Finally, beyond the main visualization functionality, microViz provides a suite of tools for working easily with phyloseq objects including wrapper functions that bring approaches from the popular dplyr package to phyloseq, to help the researcher easily filter, select, join, mutate and arrange phyloseq sample data. All microViz functions are designed to work with magrittr's pipe operator (%>%), to chain successive functions together and improve code readability (Bache & Wickham, 2020). Lastly, for user convenience, microViz documentation and tutorials are hosted online via a pkgdown (Wickham & Hesselberth, 2020) website on Github Pages, with extensive examples of code and output generated with example datasets.