GPlinksR constructs gene-peak regulatory links by
combining three sources of evidence: enhancer-based links, promoter
overlaps, and nearest-gene mappings. The package is designed for
ATAC-RNA integration workflows where users already have a set of peaks
and a set of genes, but it also includes a wrapper that can extract
these inputs from common Bioconductor containers such as
SingleCellExperiment (Lun and Risso
2024) and MultiAssayExperiment (Ramos et al. 2024).
This vignette uses a small built-in example dataset to demonstrate how to prepare inputs, how to run the main function, and how to use the wrapper when your data are already stored in Bioconductor containers.
If you are running these chunks manually from the package source
directory rather than knitting the vignette, load the current source
package first with devtools::load_all(".") so that you use
the local development version rather than an older installed copy.
The core function in GPlinksR is
build_gp_links(). It expects:
pk: peak coordinates in "chr:start-end"
format, or a data.frame with peak coordinate columns.gn: a character vector of gene symbols.For users working with single-cell analysis objects,
build_gp_links_wrapper() can extract these inputs
automatically and pass them to build_gp_links().
We begin with the package example dataset, which contains 300 peak coordinates and 100 gene symbols prepared for demonstration purposes.
data("gp_example_inputs", package = "GPlinksR")
data("gp_example_links", package = "GPlinksR")
pk <- gp_example_inputs$pk
gn <- gp_example_inputs$gn
length(pk)## [1] 300
## [1] 100
For the executable examples below, we use a small deterministic subset of the packaged data that is arranged to yield enhancer-, promoter-, and closest-based links.
## [1] "chr1:819770-822338" "chr1:876277-877834" "chr1:923554-925121"
## [1] "TTLL10" "FAM87B" "SAMD11" "FAM41C"
At this point, the main function can be called directly:
For the packaged example subset, the resulting link table begins as follows:
## Peak Gene Src
## 1 chr1:819770-822338 TTLL10 enh
## 2 chr1:876277-877834 SAMD11 enh
## 3 chr1:876277-877834 FAM41C prom
## 4 chr1:923554-925121 SAMD11 prom
## 5 chr1:819770-822338 FAM87B clo
## 6 chr1:876277-877834 FAM41C clo
If you want to use a previously cached PEREGRINE file, you may first download or retrieve it with:
Some users store peaks as separate chromosome, start, and end columns. This is also supported.
peak_parts <- do.call(rbind, strsplit(pk_demo, "[:-]"))
peak_df <- data.frame(
chr = peak_parts[, 1],
start = as.integer(peak_parts[, 2]),
end = as.integer(peak_parts[, 3])
)
peak_df## chr start end
## 1 chr1 819770 822338
## 2 chr1 876277 877834
## 3 chr1 923554 925121
This can be supplied directly to build_gp_links():
This data-frame representation yields the same example link table:
## Peak Gene Src
## 1 chr1:819770-822338 TTLL10 enh
## 2 chr1:876277-877834 SAMD11 enh
## 3 chr1:876277-877834 FAM41C prom
## 4 chr1:923554-925121 SAMD11 prom
## 5 chr1:819770-822338 FAM87B clo
## 6 chr1:876277-877834 FAM41C clo
For many Bioconductor workflows, peaks and genes are already stored
in a MultiAssayExperiment. In that setting, the wrapper
lets users specify which experiment contains peaks and which contains
genes, and then handles the input extraction automatically.
The following code shows a simple simulated setup. Here, the RNA
experiment is represented by a SummarizedExperiment (Morgan et al. 2024), while the ATAC experiment
stores peak ranges.
if (requireNamespace("MultiAssayExperiment", quietly = TRUE) &&
requireNamespace("SummarizedExperiment", quietly = TRUE) &&
requireNamespace("IRanges", quietly = TRUE)) {
library(MultiAssayExperiment)
library(SummarizedExperiment)
library(GenomicRanges)
library(IRanges)
library(S4Vectors)
sample_ids <- paste0("cell", seq_len(3))
peak_parts_mae <- do.call(
rbind,
strsplit(pk_demo[seq_len(3)], "[:-]")
)
atac_counts <- matrix(sample.int(10, 9, TRUE), nrow = 3)
colnames(atac_counts) <- sample_ids
atac_se <- SummarizedExperiment(
assays = list(counts = atac_counts),
rowRanges = GRanges(
seqnames = peak_parts_mae[, 1],
ranges = IRanges(
start = as.integer(peak_parts_mae[, 2]),
end = as.integer(peak_parts_mae[, 3])
)
)
)
rna_counts <- matrix(sample.int(10, 12, TRUE), nrow = 4)
colnames(rna_counts) <- sample_ids
rna_se <- SummarizedExperiment(
assays = list(counts = rna_counts),
rowData = DataFrame(
symbol = gn_demo[seq_len(4)]
)
)
mae <- MultiAssayExperiment(
experiments = list(ATAC = atac_se, RNA = rna_se),
colData = S4Vectors::DataFrame(row.names = sample_ids)
)
}Once the object has been created, the wrapper call is concise:
gp <- build_gp_links_wrapper(
x = mae,
peak_experiment = "ATAC",
gene_experiment = "RNA",
gene_col = "symbol"
)
head(gp)For this packaged example setup, the wrapper returns the same link table:
## Peak Gene Src
## 1 chr1:819770-822338 TTLL10 enh
## 2 chr1:876277-877834 SAMD11 enh
## 3 chr1:876277-877834 FAM41C prom
## 4 chr1:923554-925121 SAMD11 prom
## 5 chr1:819770-822338 FAM87B clo
## 6 chr1:876277-877834 FAM41C clo
The wrapper also supports a SingleCellExperiment (Lun and Risso 2024) workflow in which
gene-level measurements are stored in the main object and peak features
are stored in an altExp().
if (requireNamespace("SingleCellExperiment", quietly = TRUE) &&
requireNamespace("SummarizedExperiment", quietly = TRUE)) {
library(SingleCellExperiment)
library(SummarizedExperiment)
library(S4Vectors)
sample_ids <- paste0("cell", seq_len(4))
rna_counts <- matrix(sample.int(10, 16, TRUE), nrow = 4)
rownames(rna_counts) <- gn_demo[seq_len(4)]
colnames(rna_counts) <- sample_ids
peak_counts <- matrix(sample.int(10, 12, TRUE), nrow = 3)
colnames(peak_counts) <- sample_ids
peak_rowdata <- DataFrame(
PeakRegion = pk_demo[seq_len(3)]
)
sce <- SingleCellExperiment(
assays = list(counts = rna_counts),
rowData = DataFrame(symbol = rownames(rna_counts))
)
altExp(sce, "ATAC") <- SummarizedExperiment(
assays = list(counts = peak_counts),
rowData = peak_rowdata
)
}The wrapper then extracts genes from the main object and peaks from the named alternative experiment:
For this packaged example setup, the
SingleCellExperiment wrapper also returns the same example
link table:
## Peak Gene Src
## 1 chr1:819770-822338 TTLL10 enh
## 2 chr1:876277-877834 SAMD11 enh
## 3 chr1:876277-877834 FAM41C prom
## 4 chr1:923554-925121 SAMD11 prom
## 5 chr1:819770-822338 FAM87B clo
## 6 chr1:876277-877834 FAM41C clo
The wrapper is intended to reduce the amount of manual preprocessing required by new users, but a few object conventions are still important:
"chr:start-end" format.MultiAssayExperiment input, peak and gene
experiments should be named explicitly.SingleCellExperiment input, the main object is
assumed to contain genes and altExp() is assumed to contain
peaks.If your object uses different column names, you can point the wrapper
to them with peak_col and gene_col.
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.35.1 MultiAssayExperiment_1.39.0
## [3] SummarizedExperiment_1.43.0 Biobase_2.73.1
## [5] GenomicRanges_1.65.0 Seqinfo_1.3.0
## [7] IRanges_2.47.2 S4Vectors_0.51.3
## [9] BiocGenerics_0.59.7 generics_0.1.4
## [11] MatrixGenerics_1.25.0 matrixStats_1.5.0
## [13] GPlinksR_0.99.0 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-5 jsonlite_2.0.0 compiler_4.6.0
## [4] BiocManager_1.30.27 jquerylib_0.1.4 yaml_2.3.12
## [7] fastmap_1.2.0 lattice_0.22-9 XVector_0.53.0
## [10] R6_2.6.1 S4Arrays_1.13.0 knitr_1.51
## [13] DelayedArray_0.39.3 maketools_1.3.2 bslib_0.11.0
## [16] rlang_1.2.0 cachem_1.1.0 xfun_0.59
## [19] sass_0.4.10 sys_3.4.3 otel_0.2.0
## [22] SparseArray_1.13.2 cli_3.6.6 digest_0.6.39
## [25] grid_4.6.0 lifecycle_1.0.5 evaluate_1.0.5
## [28] buildtools_1.0.0 abind_1.4-8 rmarkdown_2.31
## [31] tools_4.6.0 htmltools_0.5.9