Generating fragment density plots in R/Bioconductor with VplotR

A multitude of proteins bind to DNA at regulatory regions to control the expression of neighbouring genes. Their binding can be revealed by chromatin accessibility assays such as DNaseseq, MNase-seq or ATAC-seq. The size of the sequencing fragments generated by these assays, as well as their location relative to the center of regulatory regions, can be used to produce two-dimensional fragment density plots called V-plots. Such plots have successfully been used in the past to reveal nucleosome positioning or transcription factor binding sites at regulatory regions in different genomes.


Summary
A multitude of proteins bind to DNA at regulatory regions to control the expression of neighbouring genes. Their binding can be revealed by chromatin accessibility assays such as DNaseseq, MNase-seq or ATAC-seq. The size of the sequencing fragments generated by these assays, as well as their location relative to the center of regulatory regions, can be used to produce two-dimensional fragment density plots called V-plots. Such plots have successfully been used in the past to reveal nucleosome positioning or transcription factor binding sites at regulatory regions in different genomes.
Here, we present VplotR, an R package to easily generate V-plots and one-dimensional footprint profiles over single or aggregated genomic loci of interest. The use of VplotR will improve our understanding of molecular organization at regulatory regions.

Statement of Need
VplotR is an R package facilitating the generation of V-plots, i.e. two-dimensional paired-end sequencing fragment density plots (Henikoff et al., 2011). V-plots have been used in the past to elucidate nucleosome positioning and/or transcription factor binding at regulatory elements (Henikoff et al., 2011;Schep et al., 2015) (e.g. Figure 1). Only a few tools have been developed that can easily generate V-plots (Beati & Chereji, 2020;Schep et al., 2015), and they are provided as scripts to be used with the command-line interface. Thus, they lack the customization and interaction with other datasets typically available in the wealthy R/Bioconductor environment.
VplotR provides functions to import paired-end sequencing bam files from any type of DNA accessibility experiments (e.g. ATAC-seq, DNA-seq, MNase-seq) and can produce V-plots and one-dimensional footprint profiles over single or aggregated genomic loci of interest. The R package is well integrated within the Bioconductor environment and easily fits in standard genomic analysis workflows. Integrating V-plots into existing analytical frameworks has already brought new insights in chromatin organization (Serizay et al., 2020). Figure 1: Interpretation of V-plots. A-Sequenced fragments (for instance obtained from ATACseq) mapping to a locus of interest can originate from either nucleosomal DNA (in pink) or from nucleosome-free DNA (for instance from nucleosome-depleted regions (NDRs), in blue). B-The fragments can be embedded in a two-dimension graph. The horizontal coordinate represents the distance from the center of a fragment to the center of a locus of interest (for instance the NDR). The vertical coordinate represents the length of the fragment. C-When this projection is done over hundreds of loci, it results in a fragment density plot, i.e. a matrix which can be visualized as a heatmap, with the color gradient representing the density of fragments at each set of coordinates.

Availability
VplotR is available as an R package and can be readily installed from Bioconductor. The development version can be found on GitHub. Package dependencies and system requirements are documented at https://js2264.github.io/VplotR/. VplotR has been tested using R (version 3.5 and later) on macOS (versions 10.11 and later), Ubuntu 18.04.2 and Windows machines.
To ensure stability, VplotR includes unit tests for most functions and supports continuous integration using the Travis CI platform. Code contributions, bug reports, fixes and feature requests are most welcome by opening issues and pull requests at https://js2264.github.io/ VplotR/.

Implementation
The main user-level functions of VplotR are plotVmat(), plotProfile() and plotFoot print(). plotVmat() is used to generate V-plots (i.e. paired-end fragment density plots aggregated over a set of loci of interest) while plotProfile() is used to generate paired-end fragment plots over a single genomic locus. plotFootprint() can be used to profile the DNA accessibility footprint measured by a genomic assay over a motif of interest. Additional functions such as importPEBamFiles() and getFragmentsDistribution() are useful to import and investigate sets of paired-end sequencing fragments. Full examples of how to use the main package functions are described in the package vignette available at https: //js2264.github.io/VplotR/articles/VplotR.html.

Importing data into R environment
Importing paired-end sequencing fragments (usually from MNase-seq, DNase-seq or ATAC-seq experiments) can be done with VplotR using the importPEBamFiles() function. If imported fragments were obtained by ATAC-seq, the argument shift_ATAC_fragments should be set to TRUE to shift the ATAC-seq sequencing fragments by +4/-5 bp (Adey et al., 2010;Buenrostro et al., 2013).

Checking fragment size distribution
The distribution of fragment sizes can be obtained with getFragmentsDistribution() and plotted using ggplot2 (Figure 2):

Plotting DNA accessibility footprint
Observations from V-plots can be further investigated by plotting DNA accessibility footprints over these loci. The plotFootprint() function can be leveraged to plot these footprint profiles (Figure 4).

Plotting fragments over a single genomic locus
Finally, VplotR provides a function to plot the distribution of paired-end fragments over an individual genomic window ( Figure 5). The paired-end fragments overlapping a locus of interest (e.g., binding sites, provided in the loci argument) are shown in red while the remaining fragments mapping to the genomic window are displayed in black. Marginal curves are also plotted on the side of the distribution plot; they highlight the smoothed distribution of the position of paired-end fragment midpoints (top) or of the paired-end fragment length (right). Furthermore, genomic features provided in the annots arguments are displayed as horizontal bars on top of the plots. Here, the distribution of yeast MNase paired-end fragments over the bi-directional promoter regulating two divergent genes clearly reveals the ABF1 binding site (in red) located in the nucleosomedepleted region (NDR).

Research using VplotR
VplotR was recently leveraged to provide accurate insights into differential organization of nucleosomes flanking ubiquitous or tissue-specific promoters in C. elegans (Serizay et al., 2020).

Data availability
Yeast MNase-seq data has been obtained from Henikoff et al. (2011) (SRR3193263). ABF1 and REB1 binding motifs in yeast have been annotated in sacCer3 genome using TFBStools (Tan & Lenhard, 2016) and JASPAR2018 (Khan et al., 2018). The motif occurrences with a relScore >= 0.90 were considered to be real binding sites.