Topologically Associating Domains (TADs) are fundamental units of
chromatin organization that play crucial roles in gene regulation.
Multiple computational tools have been developed to predict TAD
boundaries from Hi-C data, but their results often vary significantly.
The generate_tad_consensus function provides a method to
integrate predictions from multiple tools and generate a high-confidence
consensus TAD set.
generate_tad_consensus creates consensus TADs through an
iterative threshold approach that selects optimal non-overlapping TADs
representing agreement across different prediction methods. It uses the
Measure of Concordance (MoC) score to quantify the level of agreement
between predictions from different tools.
df_tools: A data frame containing TAD information with the following required columns:
chr: Chromosome name
start: TAD start position
end: TAD end position
meta.tool: Identifier for the prediction tool
threshold: A numeric value representing the minimum MoC threshold for filtering, default is 0. Higher thresholds require stronger agreement between different tools.
step: A numeric value used to generate the threshold sequence, default is -0.05. The sequence starts at 1 and changes by step at each iteration (so a negative step lowers the threshold) until it reaches the threshold parameter.
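As a rough illustration of how threshold and step interact, the sequence can be reproduced with base R's seq(). This is a sketch of the documented behavior, not the package's internal code, and the variable names are illustrative.

```r
# Sketch only: rebuild the threshold sequence from the documented defaults.
threshold <- 0      # minimum MoC threshold (documented default)
step <- -0.05       # negative step, so the sequence counts down from 1

thresholds <- seq(1, threshold, by = step)
length(thresholds)  # 21 values: 1.00, 0.95, ..., 0.05, 0.00

# A coarser sequence with a higher cutoff (threshold = 0.3, step = -0.1):
# 1.0, 0.9, ..., 0.3
custom_thresholds <- seq(1, 0.3, by = -0.1)
```

A smaller step magnitude gives a finer sequence with more thresholds to evaluate; a higher threshold stops the sequence earlier.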
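The function returns a data frame with the following columns: chr, start, end, score_source, and threshold. In the example output below, score_source lists each tool together with a score, and threshold holds the threshold value recorded for each consensus TAD.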
The following examples demonstrate how to use the
generate_tad_consensus function:
library(consensusTADs)

# Prepare input data with predictions from multiple tools
tad_data <- data.frame(
  chr = rep("chr1", 6),
  start = c(10000, 20000, 50000, 12000, 22000, 48000),
  end = c(30000, 45000, 65000, 32000, 43000, 67000),
  meta.tool = c(rep("tool1", 3), rep("tool2", 3))
)
# Generate consensus TADs with default parameters
consensus_results <- generate_tad_consensus(tad_data)
print(consensus_results)
#> # A tibble: 2 × 5
#>   chr   start   end score_source                     threshold
#>   <chr> <int> <int> <chr>                                <dbl>
#> 1 chr1  20000 45000 tool1_1; tool2_0.84000639974401       0.8
#> 2 chr1  48000 67000 tool1_0.789484763959792; tool2_1      0.75
# Generate consensus TADs with custom threshold values
custom_consensus <- generate_tad_consensus(
  tad_data,
  threshold = 0.3,
  step = -0.1
)
print(custom_consensus)
#> # A tibble: 2 × 5
#>   chr   start   end score_source                     threshold
#>   <chr> <int> <int> <chr>                                <dbl>
#> 1 chr1  20000 45000 tool1_1; tool2_0.84000639974401        0.8
#> 2 chr1  48000 67000 tool1_0.789484763959792; tool2_1       0.7

The generate_tad_consensus function follows these steps:
Input validation: Check if the input contains data from multiple prediction tools. If only one tool is present, the function returns the original data.
Data preparation: Split the input data by chromosome.
Threshold sequence generation: Create a sequence of threshold values from 1 down to the specified threshold parameter using the step size.
Iterative TAD selection: For each chromosome, apply the select_tads_by_threshold_series function, which filters TADs by MoC score (moc_score_filter) and then uses dynamic programming (select_global_optimal_tads) to select an optimal set of non-overlapping TADs that maximizes the total score.

The MoC score quantifies the agreement between two TAD predictions and is calculated as:
\[MoC = \frac{(intersection\_width)^2}{width1 \times width2}\]
Where:
- intersection_width is the length of the overlap between the two TADs
- width1 and width2 are the lengths of the two TADs being compared
Higher MoC scores indicate stronger agreement between predictions.
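The formula can be checked by hand with a small helper. The sketch below is not the package's implementation; moc_pair is a hypothetical name, and it assumes inclusive widths (end - start + 1), a convention inferred from the scores printed in the usage examples rather than confirmed from the package source.

```r
# Sketch only: MoC between two intervals on the same chromosome.
# Assumption: widths are inclusive (end - start + 1), inferred from the
# scores printed in the usage examples, not from the package source.
moc_pair <- function(start1, end1, start2, end2) {
  width1 <- end1 - start1 + 1
  width2 <- end2 - start2 + 1
  # Overlap is zero when the intervals do not intersect
  intersection_width <- max(0, min(end1, end2) - max(start1, start2) + 1)
  intersection_width^2 / (width1 * width2)
}

# tool1 TAD 20000-45000 vs tool2 TAD 22000-43000
moc_pair(20000, 45000, 22000, 43000)  # about 0.840006

# tool1 TAD 50000-65000 vs tool2 TAD 48000-67000
moc_pair(50000, 65000, 48000, 67000)  # about 0.789485

# Identical TADs agree perfectly; disjoint TADs score 0
moc_pair(10000, 30000, 10000, 30000)  # 1
moc_pair(10000, 20000, 30000, 40000)  # 0
```

Under this width convention the first two values agree with the scores shown in score_source in the usage examples (0.84000639974401 and 0.789484763959792).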
The algorithm uses dynamic programming to select a set of non-overlapping TADs that maximizes the total MoC score. This ensures that the consensus TADs represent the regions with the strongest evidence across multiple prediction tools while avoiding contradictory, overlapping boundaries.
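One standard way to phrase this kind of selection is weighted interval scheduling. The sketch below illustrates that general technique and is not the code behind select_global_optimal_tads; the function name select_non_overlapping, the score column, and the toy scores are all assumptions made for illustration. It also assumes that two TADs overlap unless one ends strictly before the other starts.

```r
# Sketch only: weighted interval scheduling over candidate TADs.
# `tads` needs numeric columns start, end, and score (hypothetical layout).
select_non_overlapping <- function(tads) {
  if (nrow(tads) == 0) return(tads)
  tads <- tads[order(tads$end, tads$start), , drop = FALSE]
  n <- nrow(tads)

  # prev[i]: last TAD (in end order) that ends before TAD i starts, or 0
  prev <- vapply(seq_len(n), function(i) {
    idx <- which(tads$end[seq_len(i - 1)] < tads$start[i])
    if (length(idx) == 0) 0L else max(idx)
  }, integer(1))

  # best[i + 1]: maximum total score using only the first i TADs
  best <- numeric(n + 1)
  for (i in seq_len(n)) {
    take <- tads$score[i] + best[prev[i] + 1]
    skip <- best[i]
    best[i + 1] <- max(take, skip)
  }

  # Trace back through the table to recover the chosen TADs
  keep <- integer(0)
  i <- n
  while (i > 0) {
    if (tads$score[i] + best[prev[i] + 1] >= best[i]) {
      keep <- c(i, keep)
      i <- prev[i]
    } else {
      i <- i - 1
    }
  }
  tads[keep, , drop = FALSE]
}

# Toy candidates with made-up scores (not output of the package)
candidates <- data.frame(
  start = c(10000, 20000, 48000, 50000),
  end   = c(30000, 45000, 67000, 65000),
  score = c(0.5, 0.84, 1.0, 0.79)
)
selected <- select_non_overlapping(candidates)
selected$start  # 20000 48000
```

Sorting by end position lets each candidate look up its nearest compatible predecessor, so the best total for the first i candidates depends only on smaller subproblems.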
Consensus building requires predictions from more than one tool (distinguished by the meta.tool column).

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_1.2.2 future_1.70.0 consensusTADs_0.99.1
#> [4] rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_2.0.0 dplyr_1.2.1 compiler_4.6.0
#> [4] tidyselect_1.2.1 stringr_1.6.0 GenomicRanges_1.65.0
#> [7] parallel_4.6.0 tidyr_1.3.2 jquerylib_0.1.4
#> [10] globals_0.19.1 IRanges_2.47.2 Seqinfo_1.3.0
#> [13] yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
#> [16] generics_0.1.4 knitr_1.51 BiocGenerics_0.59.7
#> [19] tibble_3.3.1 maketools_1.3.2 bslib_0.11.0
#> [22] pillar_1.11.1 rlang_1.2.0 utf8_1.2.6
#> [25] stringi_1.8.7 cachem_1.1.0 xfun_0.59
#> [28] sass_0.4.10 sys_3.4.3 otel_0.2.0
#> [31] cli_3.6.6 withr_3.0.3 magrittr_2.0.5
#> [34] digest_0.6.39 lifecycle_1.0.5 S4Vectors_0.51.3
#> [37] vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1
#> [40] listenv_1.0.0 furrr_0.4.0 codetools_0.2-20
#> [43] buildtools_1.0.0 stats4_4.6.0 parallelly_1.47.0
#> [46] tools_4.6.0 pkgconfig_2.0.3 htmltools_0.5.9