---
title: "Miscellaneous epiwraps functions"
author: 
- name: Pierre-Luc Germain
  affiliation:
    - "Lab of Statistical Bioinformatics, University of Zürich; "
    - "D-HEST Institute for Neuroscience, ETH Zürich, Switzerland"
package: epiwraps
output:
  BiocStyle::html_document
abstract: |
  This vignette introduces some more isolated epiwraps functions.
vignette: |
  %\VignetteIndexEntry{misc}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r prelim, include=FALSE}
library(BiocStyle)
```

## Getting read counts in regions of interest

The `peakCountsFromBAM` and `peakCountsFromFrags` functions enable the 
construction of SummarizedExperiment objects containing overlap counts of 
regions across samples. `peakCountsFromFrags` is especially geared towards
single-cell data, splitting the frag file by cell barcodes, 
generating sparse representations and optionally doing pseudo-bulk aggregation
on the fly.

The functions can also compute fragment length information for each region, 
which can for instance be used with the `betterChromVAR` package. For example:

```{r peakCountsFromBAM,  message=FALSE}
suppressPackageStartupMessages(library(epiwraps))
bam <- system.file("extdata", "ex1.bam", package="Rsamtools")
# create regions of interest
peaks <- GRanges(c("seq1","seq1","seq2"), IRanges(c(400,900,500), width=100))
peakCountsFromBAM(bam, peaks, paired=FALSE, getMedianFragLength=TRUE)
```

## Quality control

### Coverage statistics

Coverage statistics give an overview of how the reads are distributed across 
the genome (or more precisely, across a large number of random regions). The 
`getCovStats` will compute such statistics from bam or bigwig files (from 
bigwig files will be considerably faster, but if the files are normalized 
the coverage/density will be relative).

Because our example data spans only part of a chromosome, we'll exclude 
completely empty regions using the `exclude` parameter, which would normally
be used to exclude regions likely to be technical artefacts (e.g. blacklisted
regions). 

```{r covstats, message=FALSE}
# get the path to an example bigwig file:
bwf <- system.file("extdata/example_atac.bw", package="epiwraps")
cs <- getCovStats(bwf, exclude=GRanges("1", IRanges(1, 4300000)))
plotCovStats(cs)
```

Panel A shows the proportion of sampled regions which are above a certain read 
density (relative because this is a normalized bigwig file, would be coverage 
otherwise). This shows us, for example, that as expected only a minority of 
regions have any reads at all (indicating that the reads are not randomly 
distributed).
Panel B is what is sometimes called a fingerprint plot. It similarly shows us 
that the reads are concentrated in very few regions, since the vast majority of
regions have only a very low fraction of the coverage of a few high-density 
regions.
Randomly distributed reads would go along the diagonal, but one normally has a 
curve somewhere between this line and the lower-right corner -- the farther 
away from the diagonal, to more strongly enriched the data is.

This can be done for multiple files simultaneously. If we have several files, 
we can also use the coverage in the random windows to computer their similarity 
(see `?plotCorFromCovStats`).

### Fragment length distributions

Given one or more paired-end bam files, we can extract and plot the fragment 
length distribution using:

```{r fragSize, eval=FALSE}
fragSizesDist(bam, what=100)
```

### TSS enrichment

The TSS enrichment can also be calculated and plotted using the `TSSenrichment`
function.

## Peak calling

A peak calling function can be used, either against an input control or 
against local or global backgrounds:

```{r callPeaks, eval=FALSE}
p <- callPeaks(bam)
```

Note that the peak caller was developed chiefly to facilitate teaching, and we
do not guarantee its performance.

## Region merging

The merging of genomic regions with `reduce` tends to produce large regions
which are undesirable for some downstream analyses. As an alternative, the 
`reduceWithResplit` function splits large merges using local overlap minima.

## Region overlapping

The `GenomicRanges` package offers fast and powerful functions for overlapping
genomic regions. `epiwraps` includes wrappers around those for common tasks, 
such as calculating and visualizing overlaps across multiple sets of regions 
(see `?regionsToUpset`, `?regionOverlaps`, and `?regionCAT`).

<br/><br/>

## Session information {-}

```{r sessionInfo}
sessionInfo()
```