epiwraps includes two ways of calculating normalization
factors: either from the signal files (e.g. bam or bigwig files), which
is the most robust way and enables all options, or from an
EnrichmentSE object (see the
multiRegionPlot vignette for an intro
to such an object) or signal matrices. In both cases, the logic is the
same: we estimate normalization factors (mostly single linear scaling
factors, although some methods involve more complex normalization), and
then apply them to signals that were extracted using
signal2Matrix().
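The estimate-then-apply logic can be illustrated with a minimal base R sketch. The matrices and factors below are made up for illustration and do not come from epiwraps; the sketch only shows what applying single linear scaling factors to extracted signal matrices amounts to.

```r
# Toy signal matrices (regions x positions) for two hypothetical samples
set.seed(1)
mats <- list(
  s1 = matrix(rpois(20, lambda = 5), nrow = 4),
  s2 = matrix(rpois(20, lambda = 10), nrow = 4)
)

# Hypothetical linear scaling factors, one per sample (in practice these
# would be estimated by a normalization method)
nf <- c(s1 = 1, s2 = 0.5)

# Apply each factor to the matrix of the corresponding sample
normalized <- Map(function(m, f) m * f, mats, nf[names(mats)])
```

Keeping estimation and application separate in this way means the same factors can be reused on any signal extracted from the same samples.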
It is also possible to use computed normalization factors directly
when creating bigwig files. By default, the
bam2bw() function scales using
library size, which can be disabled using scaling=FALSE.
Alternatively, the scaling argument can be given a
manual scaling factor, as computed by the functions described here. In
this vignette, however, we will focus on normalizing signal
matrices.
The getNormFactors() function can be used to estimate
normalization factors from either bam or bigwig files. The two file
types cannot be mixed in a single call, however, and
normalization factors calculated on bam files cannot be applied to
data extracted from bigwig files (or vice versa), because bigwig
files are by default already normalized for library size. If needed,
getNormFactors() can nevertheless be used to apply the same
method to both kinds of files.
Simple library size normalization, as done by bam2bw(),
is not always appropriate. The main reasons are 1) that different
samples/experiments can have a different signal-to-noise ratio, with the
result that more sequencing is needed to obtain a similar coverage of
enriched regions; 2) that there might be global differences in the amount
of the signal of interest (e.g. more or less binding, globally, in one
cell type vs another); and 3) that there might be differences in
technical biases, such as GC content. For these reasons, different
normalization methods are needed according to circumstances and what
assumptions seem reasonable. Here is an overview of the normalization
methods currently implemented in epiwraps via the
getNormFactors() function:
The normalization factors can be computed using
getNormFactors():
suppressPackageStartupMessages(library(epiwraps))
# we fetch the path to the example bigwig file:
bwf <- system.file("extdata/example_atac.bw", package="epiwraps")
# we'll just double it to create a fake multi-sample dataset:
bwfiles <- c(atac1=bwf, atac2=bwf)
nf <- getNormFactors(bwfiles, method="background")
## Comparing coverage in random regions...
## atac1 atac2
## 1 1
In this case, since the files are identical, the factors are both 1.
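To give an intuition for what a background-based factor represents, here is a rough base R sketch on simulated data. It is not the epiwraps implementation: the simulated coverage, the trimmed mean as a summary, and the scaling to the average background level are all assumptions made for illustration.

```r
set.seed(42)
n_bins <- 1000

# Simulated coverage in random background bins for two hypothetical
# samples; sample "b" has twice the background coverage of sample "a"
bg <- cbind(a = rpois(n_bins, lambda = 10),
            b = rpois(n_bins, lambda = 20))

# Robust per-sample summary of the background coverage (trimmed mean)
bg_level <- apply(bg, 2, mean, trim = 0.1)

# Linear factors that bring each sample to the average background level
nf <- mean(bg_level) / bg_level
```

With two identical samples, as in the example above, both summaries are equal and both factors are therefore 1.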
Some normalization methods additionally require peaks as input, e.g.:
peaks <- system.file("extdata/example_peaks.bed", package="epiwraps")
nf <- getNormFactors(bwfiles, peaks = peaks, method="MAnorm")
## Comparing coverage in peaks...
## calcNormFactors has been renamed to normLibSizes
## calcNormFactors has been renamed to normLibSizes
(Note that MAnorm would normally require a list of peaks for each sample/experiment.)
Once computed, the normalization factors can be applied to an
EnrichmentSE object:
## Reading /tmp/Rtmp8aIpdv/Rinst1c2195e9ae1/epiwraps/extdata/example_atac.bw
## Reading /tmp/Rtmp8aIpdv/Rinst1c2195e9ae1/epiwraps/extdata/example_atac.bw
## class: EnrichmentSE
## 2 tracks across 150 regions
## assays(2): normalized input
## rownames(150): 1:195054101-195054250 1:133522798-133523047 ...
## 1:22224734-22224983 1:90375438-90375787
## rowData names(0):
## colnames(2): atac1 atac2
## colData names(0):
## metadata(0):
The object now has a new assay, called normalized, which
has been put in front and will therefore be used by most downstream
functions unless the user specifies otherwise. For any downstream
function, it is possible to specify which assay to use via the
assay argument.
It is also possible to normalize the signal matrices using factors
derived from the matrices themselves, using the
renormalizeSignalMatrices function. Note that this is
provided as a ‘quick-and-dirty’ approach that does not have the
robustness of proper estimation methods. Specifically, beyond accepting
manual scaling factors (e.g. computed using
getNormFactors), the function includes two methods:
method="border" works on the assumption that the
left/right borders of the matrices represent background signal which
should be equal across samples. As such, it can be seen as an
approximation of the aforementioned background normalization. However,
it will work only if 1) the left/right borders of the matrices are
sufficiently far from the signal (e.g. peaks) to be chiefly noise, and
(as with the main background normalization method itself) 2) the
signal-to-noise ratio is comparable across tracks/samples.
method="top" instead works on the assumption that the
highest signal (after optional trimming of the extremes) should be
the same across tracks/samples.
To illustrate these, we will first introduce a difference between our two tracks using arbitrary factors:
sm <- renormalizeSignalMatrices(sm, scaleFactors=c(1,4), toAssay="test")
plotEnrichedHeatmaps(sm, assay = "test")
Then we can normalize:
sm <- renormalizeSignalMatrices(sm, method="top", fromAssay="test")
# again this adds an assay to the object, which will be automatically used when plotting:
plotEnrichedHeatmaps(sm)
## Using assay topNormalized
And we’ve recovered comparable signal across the two tracks/samples.
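The idea behind a 'top'-style normalization can be sketched in base R on toy matrices. This is only an illustration of the assumption described above, not the code used by renormalizeSignalMatrices: the helper top_level(), its quantile bounds, and the choice of the first track as reference are all hypothetical.

```r
set.seed(7)
base <- matrix(rexp(150 * 40, rate = 0.2), nrow = 150)
# Two toy tracks, the second distorted by a factor of 4
mats <- list(atac1 = base, atac2 = base * 4)

# Hypothetical helper: mean of the values lying between two high
# quantiles, so that the most extreme values are trimmed away
top_level <- function(m, lower = 0.95, upper = 0.99) {
  q <- quantile(m, c(lower, upper))
  mean(m[m >= q[1] & m <= q[2]])
}

tops <- vapply(mats, top_level, numeric(1))
nf <- tops[1] / tops           # scale every track to the first one
renorm <- Map(`*`, mats, nf)   # apply the factors to the matrices
```

Because the second toy track is an exact multiple of the first, the estimated factors undo the distortion; on real data the recovery is only approximate and depends on the assumption that the top signal is truly comparable.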
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggplot2_4.0.3 epiwraps_0.99.120
## [3] EnrichedHeatmap_1.43.0 ComplexHeatmap_2.29.0
## [5] SummarizedExperiment_1.43.0 Biobase_2.73.1
## [7] GenomicRanges_1.65.0 Seqinfo_1.3.0
## [9] IRanges_2.47.2 S4Vectors_0.51.3
## [11] BiocGenerics_0.59.7 generics_0.1.4
## [13] MatrixGenerics_1.25.0 matrixStats_1.5.0
## [15] BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.19.0 sys_3.4.3
## [4] jsonlite_2.0.0 shape_1.4.6.1 magrittr_2.0.5
## [7] magick_2.9.1 GenomicFeatures_1.65.0 farver_2.1.2
## [10] rmarkdown_2.31 GlobalOptions_0.1.4 BiocIO_1.23.3
## [13] vctrs_0.7.3 memoise_2.0.1 Rsamtools_2.29.0
## [16] RCurl_1.98-1.19 base64enc_0.1-6 htmltools_0.5.9
## [19] S4Arrays_1.13.0 BiocBaseUtils_1.15.1 progress_1.2.3
## [22] curl_7.1.0 SparseArray_1.13.2 Formula_1.2-5
## [25] sass_0.4.10 bslib_0.11.0 htmlwidgets_1.6.4
## [28] Gviz_1.57.0 httr2_1.2.2 cachem_1.1.0
## [31] buildtools_1.0.0 GenomicAlignments_1.49.0 lifecycle_1.0.5
## [34] iterators_1.0.14 pkgconfig_2.0.3 Matrix_1.7-5
## [37] R6_2.6.1 fastmap_1.2.0 clue_0.3-68
## [40] digest_0.6.39 colorspace_2.1-2 patchwork_1.3.2
## [43] AnnotationDbi_1.75.0 Hmisc_5.2-5 RSQLite_3.53.1
## [46] labeling_0.4.3 filelock_1.0.3 httr_1.4.8
## [49] abind_1.4-8 compiler_4.6.0 withr_3.0.2
## [52] bit64_4.8.2 doParallel_1.0.17 backports_1.5.1
## [55] htmlTable_2.5.0 S7_0.2.2 BiocParallel_1.47.0
## [58] DBI_1.3.0 biomaRt_2.69.0 rappdirs_0.3.4
## [61] DelayedArray_0.39.3 rjson_0.2.23 tools_4.6.0
## [64] foreign_0.8-91 otel_0.2.0 nnet_7.3-20
## [67] glue_1.8.1 restfulr_0.0.17 checkmate_2.3.4
## [70] cluster_2.1.8.2 gtable_0.3.6 BSgenome_1.81.0
## [73] ensembldb_2.37.3 data.table_1.18.4 hms_1.1.4
## [76] XVector_0.53.0 foreach_1.5.2 pillar_1.11.1
## [79] stringr_1.6.0 limma_3.69.2 circlize_0.4.18
## [82] dplyr_1.2.1 BiocFileCache_3.3.0 lattice_0.22-9
## [85] deldir_2.0-4 rtracklayer_1.73.0 bit_4.6.0
## [88] biovizBase_1.61.0 tidyselect_1.2.1 locfit_1.5-9.12
## [91] pbapply_1.7-4 maketools_1.3.2 Biostrings_2.81.3
## [94] knitr_1.51 gridExtra_2.3 ProtGenerics_1.45.0
## [97] edgeR_4.11.1 xfun_0.58 statmod_1.5.2
## [100] stringi_1.8.7 UCSC.utils_1.9.0 lazyeval_0.2.3
## [103] yaml_2.3.12 evaluate_1.0.5 codetools_0.2-20
## [106] cigarillo_1.3.0 interp_1.1-6 GenomicFiles_1.49.0
## [109] tibble_3.3.1 BiocManager_1.30.27 cli_3.6.6
## [112] rpart_4.1.27 jquerylib_0.1.4 dichromat_2.0-0.1
## [115] Rcpp_1.1.1-1.1 GenomeInfoDb_1.49.1 dbplyr_2.5.2
## [118] png_0.1-9 XML_3.99-0.23 parallel_4.6.0
## [121] blob_1.3.0 prettyunits_1.2.0 jpeg_0.1-11
## [124] latticeExtra_0.6-31 AnnotationFilter_1.37.0 bitops_1.0-9
## [127] viridisLite_0.4.3 VariantAnnotation_1.59.0 scales_1.4.0
## [130] crayon_1.5.3 GetoptLong_1.1.1 rlang_1.2.0
## [133] KEGGREST_1.53.0