Generating Consensus TADs with generate_tad_consensus

Introduction

Topologically Associating Domains (TADs) are fundamental units of chromatin organization that play crucial roles in gene regulation. Many computational tools predict TAD boundaries from Hi-C data, but their results often differ substantially. The generate_tad_consensus function integrates predictions from multiple tools into a single high-confidence consensus TAD set.

Function Overview

generate_tad_consensus creates consensus TADs through an iterative threshold approach that selects optimal non-overlapping TADs representing agreement across different prediction methods. It uses the Measure of Concordance (MoC) score to quantify the level of agreement between predictions from different tools.

Parameters

generate_tad_consensus(
  df_tools,
  threshold = 0,
  step = -0.05
)

  • df_tools: A data frame containing TAD information with the following required columns:

      • chr: Chromosome name
      • start: TAD start position
      • end: TAD end position
      • meta.tool: Identifier for the prediction tool

  • threshold: A numeric value representing the minimum MoC threshold for filtering, default is 0. Higher thresholds require stronger agreement between different tools.

  • step: A numeric value used to generate the threshold sequence, default is -0.05. The sequence starts at 1 and changes by step at each iteration until it reaches the threshold parameter. Because the sequence descends, step is expected to be negative.
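
The interaction between threshold and step can be previewed with base R. This is a minimal sketch that assumes the sequence is equivalent to seq(1, threshold, by = step); the source does not confirm how the function builds it internally.

```r
# Assumption: the threshold sequence behaves like base R seq().
default_thresholds <- seq(1, 0, by = -0.05)   # defaults: threshold = 0, step = -0.05
custom_thresholds  <- seq(1, 0.3, by = -0.1)  # threshold = 0.3, step = -0.1

length(default_thresholds)  # 21 candidate thresholds
length(custom_thresholds)   # 8 candidate thresholds
print(custom_thresholds)
```

A smaller step magnitude gives a finer sequence and therefore finer-grained values in the returned threshold column, at the cost of more iterations.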

Return Value

The function returns a data frame with the following columns:

  • chr: Chromosome name
  • start: TAD start position
  • end: TAD end position
  • score_source: A string listing the tools that contributed to this TAD and their individual MoC scores, written as tool_score entries separated by semicolons (for example, tool1_1; tool2_0.84000639974401)
  • threshold: The MoC threshold value at which this TAD was selected during the iterative selection process

Usage Examples

The following examples demonstrate how to use the generate_tad_consensus function:

# Load the package that provides generate_tad_consensus
library(consensusTADs)

# Prepare input data with predictions from multiple tools
tad_data <- data.frame(
  chr = rep("chr1", 6),
  start = c(10000, 20000, 50000, 12000, 22000, 48000),
  end = c(30000, 45000, 65000, 32000, 43000, 67000),
  meta.tool = c(rep("tool1", 3), rep("tool2", 3))
)

# Generate consensus TADs with default parameters
consensus_results <- generate_tad_consensus(tad_data)
print(consensus_results)
#> # A tibble: 2 × 5
#>   chr   start   end score_source                     threshold
#>   <chr> <int> <int> <chr>                                <dbl>
#> 1 chr1  20000 45000 tool1_1; tool2_0.84000639974401       0.8 
#> 2 chr1  48000 67000 tool1_0.789484763959792; tool2_1      0.75

# Generate consensus TADs with custom threshold values
custom_consensus <- generate_tad_consensus(
  tad_data,
  threshold = 0.3,
  step = -0.1
)
print(custom_consensus)
#> # A tibble: 2 × 5
#>   chr   start   end score_source                     threshold
#>   <chr> <int> <int> <chr>                                <dbl>
#> 1 chr1  20000 45000 tool1_1; tool2_0.84000639974401        0.8
#> 2 chr1  48000 67000 tool1_0.789484763959792; tool2_1       0.7
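
The score_source strings in the output above can be split into per-tool scores for downstream filtering. The sketch below is illustrative only: parse_score_source is a hypothetical helper, not part of the package, and it assumes the "tool_score; tool_score" layout seen in the printed results.

```r
# Hypothetical helper: turn "tool1_1; tool2_0.84" into a named numeric vector.
# Assumes entries are separated by ";" and the score follows the last underscore.
parse_score_source <- function(x) {
  entries <- strsplit(x, ";\\s*")[[1]]
  tools   <- sub("_[^_]*$", "", entries)          # text before the last underscore
  scores  <- as.numeric(sub("^.*_", "", entries)) # text after the last underscore
  stats::setNames(scores, tools)
}

scores <- parse_score_source("tool1_1; tool2_0.84000639974401")
print(scores)
min(scores)  # weakest agreement among the contributing tools
```

Splitting on the last underscore keeps tool names that themselves contain underscores intact.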

How It Works

The generate_tad_consensus function follows these steps:

  1. Input validation: Check if the input contains data from multiple prediction tools. If only one tool is present, the function returns the original data.

  2. Data preparation: Split the input data by chromosome.

  3. Threshold sequence generation: Create a sequence of threshold values from 1 down to the specified threshold parameter using the step size.

  4. Iterative TAD selection: For each chromosome, apply the select_tads_by_threshold_series function, which:

      • Iterates through the threshold sequence from high to low
      • For each threshold, calculates MoC scores between TADs using moc_score_filter
      • Keeps the TADs that meet the current threshold
      • Uses dynamic programming (select_global_optimal_tads) to select an optimal set of non-overlapping TADs that maximizes the total score
      • Records the threshold at which each TAD was selected

  5. Result compilation: Combine results from all chromosomes and return a data frame with the consensus TADs.
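
The threshold recorded for a TAD can be reasoned about with a short sketch. It assumes that a TAD is selected at the highest threshold in the sequence that its cross-tool MoC score still meets, and that no higher-scoring overlapping candidate displaces it. The helper first_passing_threshold is hypothetical and is not a package function; under these assumptions it reproduces the threshold column shown in the usage examples.

```r
# Hypothetical helper: highest threshold in the sequence that a score satisfies.
first_passing_threshold <- function(score, threshold = 0, step = -0.05) {
  thresholds <- seq(1, threshold, by = step)
  # Small tolerance guards against floating-point noise in the sequence.
  passing <- thresholds[thresholds <= score + 1e-9]
  if (length(passing) == 0) NA_real_ else max(passing)
}

# Cross-tool MoC scores taken from the example output
first_passing_threshold(0.84000639974401)                             # 0.8
first_passing_threshold(0.789484763959792)                            # 0.75
first_passing_threshold(0.789484763959792, threshold = 0.3, step = -0.1)  # 0.7
```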

The Measure of Concordance (MoC) Score

The MoC score quantifies the agreement between two TAD predictions and is calculated as:

\[MoC = \frac{(intersection\_width)^2}{width1 \times width2}\]

Where:

  • intersection_width is the length of the overlap between the two TADs
  • width1 and width2 are the lengths of the two TADs being compared

The score ranges from 0 (no overlap) to 1 (identical intervals). Higher MoC scores indicate stronger agreement between predictions.
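
The formula can be checked by hand on the example data. The following sketch uses a hypothetical helper, moc_pair, and assumes widths are counted inclusively (end - start + 1); with that assumption it matches the scores printed in the usage examples. It is not the package's moc_score_filter.

```r
# Hypothetical helper: MoC between two intervals, using inclusive widths
# (an assumption that is consistent with the example output).
moc_pair <- function(start1, end1, start2, end2) {
  width1 <- end1 - start1 + 1
  width2 <- end2 - start2 + 1
  intersection_width <- max(0, min(end1, end2) - max(start1, start2) + 1)
  intersection_width^2 / (width1 * width2)
}

# tool1 20000-45000 vs tool2 22000-43000: 21001^2 / (25001 * 21001)
moc_pair(20000, 45000, 22000, 43000)  # about 0.840006

# tool1 50000-65000 vs tool2 48000-67000: 15001^2 / (15001 * 19001)
moc_pair(50000, 65000, 48000, 67000)  # about 0.789485

# Disjoint intervals score 0
moc_pair(10000, 20000, 30000, 40000)
```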

Dynamic Programming for Optimal TAD Selection

The algorithm uses dynamic programming to select a set of non-overlapping TADs that maximize the total MoC score. This ensures that the consensus TADs represent regions with the strongest evidence across multiple prediction tools while avoiding contradictory overlapping boundaries.
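
Selecting non-overlapping intervals with maximum total score is the classic weighted interval scheduling problem. The sketch below shows that textbook dynamic program in base R; it is not the package's select_global_optimal_tads, and the function name and the candidate scores are hypothetical.

```r
# Weighted interval scheduling: returns the indices of a maximum-score set of
# non-overlapping intervals. Intervals that share a coordinate count as overlapping.
select_non_overlapping <- function(start, end, score) {
  n <- length(start)
  if (n == 0) return(integer(0))
  ord <- order(end, start)
  s <- start[ord]; e <- end[ord]; w <- score[ord]

  # prev[i]: last interval (in sorted order) that ends before interval i starts, or 0
  prev <- vapply(seq_len(n), function(i) {
    idx <- which(e[seq_len(i - 1)] < s[i])
    if (length(idx) == 0) 0L else max(idx)
  }, integer(1))

  # best[i + 1]: best total score achievable using the first i sorted intervals
  best <- numeric(n + 1)
  for (i in seq_len(n)) {
    best[i + 1] <- max(w[i] + best[prev[i] + 1], best[i])
  }

  # Backtrack to recover which intervals were taken
  chosen <- integer(0)
  i <- n
  while (i > 0) {
    if (w[i] + best[prev[i] + 1] >= best[i]) {
      chosen <- c(ord[i], chosen)
      i <- prev[i]
    } else {
      i <- i - 1
    }
  }
  sort(chosen)
}

# Three candidate TADs with illustrative (hypothetical) scores; the first two overlap
keep <- select_non_overlapping(
  start = c(10000, 20000, 50000),
  end   = c(30000, 45000, 65000),
  score = c(1.81, 1.84, 1.79)
)
print(keep)  # 2 3
```

Of the two overlapping candidates, the one with the higher score is kept, and the third candidate is added because it conflicts with neither.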

Important Notes

  • Input data should contain predictions from at least two different tools (identified by the meta.tool column); if only one tool is present, the function returns the original data unchanged
  • The threshold parameter defines the minimum required MoC score and can be adjusted based on analysis needs
  • The returned consensus TADs are guaranteed to be non-overlapping
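
The non-overlap property is easy to verify on any result. The check below is a sketch using a hypothetical helper, has_overlaps, applied to a hand-built copy of the consensus rows from the usage example; it treats intervals that share a coordinate as overlapping.

```r
# Hypothetical helper: TRUE if any two TADs on the same chromosome overlap.
has_overlaps <- function(tads) {
  by_chr <- split(tads, tads$chr)
  any(vapply(by_chr, function(d) {
    d <- d[order(d$start), ]
    n <- nrow(d)
    # After sorting by start, any overlap shows up between neighbours.
    n > 1 && any(d$start[-1] <= d$end[-n])
  }, logical(1)))
}

# Consensus rows copied from the example output
consensus <- data.frame(
  chr   = c("chr1", "chr1"),
  start = c(20000, 48000),
  end   = c(45000, 67000)
)
has_overlaps(consensus)  # FALSE

# The raw multi-tool input, by contrast, does contain overlapping TADs
raw <- data.frame(
  chr   = rep("chr1", 6),
  start = c(10000, 20000, 50000, 12000, 22000, 48000),
  end   = c(30000, 45000, 65000, 32000, 43000, 67000)
)
has_overlaps(raw)  # TRUE
```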

Session Information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] purrr_1.2.2          future_1.70.0        consensusTADs_0.99.1
#> [4] rmarkdown_2.31      
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_2.0.0       dplyr_1.2.1          compiler_4.6.0      
#>  [4] tidyselect_1.2.1     stringr_1.6.0        GenomicRanges_1.65.0
#>  [7] parallel_4.6.0       tidyr_1.3.2          jquerylib_0.1.4     
#> [10] globals_0.19.1       IRanges_2.47.2       Seqinfo_1.3.0       
#> [13] yaml_2.3.12          fastmap_1.2.0        R6_2.6.1            
#> [16] generics_0.1.4       knitr_1.51           BiocGenerics_0.59.7 
#> [19] tibble_3.3.1         maketools_1.3.2      bslib_0.11.0        
#> [22] pillar_1.11.1        rlang_1.2.0          utf8_1.2.6          
#> [25] stringi_1.8.7        cachem_1.1.0         xfun_0.59           
#> [28] sass_0.4.10          sys_3.4.3            otel_0.2.0          
#> [31] cli_3.6.6            withr_3.0.3          magrittr_2.0.5      
#> [34] digest_0.6.39        lifecycle_1.0.5      S4Vectors_0.51.3    
#> [37] vctrs_0.7.3          evaluate_1.0.5       glue_1.8.1          
#> [40] listenv_1.0.0        furrr_0.4.0          codetools_0.2-20    
#> [43] buildtools_1.0.0     stats4_4.6.0         parallelly_1.47.0   
#> [46] tools_4.6.0          pkgconfig_2.0.3      htmltools_0.5.9