tTEscanR is a powerful, versatile and user-friendly R package designed to quantify and analyze the relationship between codon usage in mRNA and the availability of the corresponding anticodons in tRNA. The package computes a theoretical translation efficiency (tTE) score as a proxy for translation elongation efficiency, hereafter referred to as translation efficiency.
In this document, we present a case example to demonstrate the potential of tTEscanR.
tTEscanR features a modular structure that enables running specific components independently or as part of a comprehensive pipeline. This design provides flexibility to enhance and complement the analysis of codon-anticodon dynamics across diverse biological contexts.
tTEscanR supports both gene expression and chromatin accessibility profiling data. The accepted mRNA and tRNA inputs are pre-processed gene expression count matrices, with features (e.g. genes or transcripts) as rows and conditions (e.g. samples, replicates, or individual cells) as columns. The package is optimized for bulk and single-cell datasets. Datasets should be loaded according to their respective file formats.
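As an illustration of the expected layout, the following base R sketch builds a toy count matrix with features as rows and conditions as columns, and runs a few sanity checks. The gene names, condition names, and the checks themselves are made up for this example and are not part of tTEscanR.

```r
# Toy count matrix: 3 genes (rows) x 4 conditions (columns).
# All names and values are hypothetical.
toy_counts <- matrix(
  c(10, 0, 5, 3,
     2, 8, 0, 1,
     7, 7, 7, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("GeneA", "GeneB", "GeneC"),
    c("Tissue1-CellX", "Tissue1-CellY", "Tissue2-CellX", "Tissue2-CellZ")
  )
)

# Basic checks: numeric, non-negative, unique row and column names
stopifnot(
  is.numeric(toy_counts),
  all(toy_counts >= 0),
  !is.null(rownames(toy_counts)), !anyDuplicated(rownames(toy_counts)),
  !is.null(colnames(toy_counts)), !anyDuplicated(colnames(toy_counts))
)

dim(toy_counts)
```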
In this tutorial, we analyze a single-cell human fetal atlas described in (Cao et al. 2020) and (Domcke et al. 2020), and previously examined by (Gao et al. 2022). This dataset and a subset of it are included in tTEscanR and can be loaded directly.
Dimensions (mRNA data): 9900 genes (rows) x 172 cell types (columns)
Rows: Genes are identified by gene name (e.g. GATSL1)
Columns: The cell type labels are composed of two parts, tissue and cell type, separated by a hyphen (e.g. Adrenal-Adrenocortical cells)
Dimensions (tRNA data): 377 tRNA genes (rows) x 89 cell types (columns)
Rows: The tRNA gene labels follow the format tRNA - Amino acid - Anticodon - Identifier number (e.g. tRNA-Asn-GTT-5-1)
Columns: The cell type labels have the same format as described for the mRNA data
The pre-processing module formats and standardizes input matrices to ensure they are structured correctly for reliable analysis through the pipeline.
The tRNACutsFilter() function filters
out samples or conditions with low total tRNA
expression, helping to ensure overall data quality.
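The idea behind this kind of filter can be sketched in base R as follows. This is not the implementation of tRNACutsFilter(): the toy matrix, the cutoff value `min_total`, and the use of a simple column-sum threshold are all assumptions made for illustration.

```r
# Toy tRNA count matrix: tRNA genes (rows) x conditions (columns).
# Values and condition names are hypothetical.
toy_tRNA <- matrix(
  c(120, 3, 80,
    200, 1, 95,
     60, 0, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("tRNA-Asn-GTT-5-1", "tRNA-Ala-AGC-1-1", "tRNA-Gly-GCC-2-1"),
    c("CondA", "CondB", "CondC")
  )
)

# Hypothetical cutoff on the total tRNA counts per condition
min_total <- 50

# Keep only the conditions whose total tRNA counts reach the cutoff
total_per_condition <- colSums(toy_tRNA)
keep <- total_per_condition >= min_total
filtered_toy_tRNA <- toy_tRNA[, keep, drop = FALSE]

colnames(filtered_toy_tRNA)
```

Using `drop = FALSE` keeps the result a matrix even when a single condition passes the cutoff.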
The tTEscanR object is a centralized data structure that stores input matrices, metadata, and results, and is updated continuously to keep the pipeline consistent. To ensure robustness throughout the pipeline, specific ids have been assigned to its entries and must be respected by the user (see the documentation for more details).
The createObject() function initializes
a new tTEscanR object to store and organize analysis
data. The input can be either a single matrix or a
list of matrices (to support multiple datasets), and
may optionally include metadata. For proper
functionality, all input matrices must be appropriately named. To
modify, extend, or update an existing tTEscanR object
with new data or metadata, use
updateObject().
# Adding the mRNA and tRNA datasets to the object
tTEobject <- createObject(
counts = list(default_tTEscanR_mRNA_data, filtered_tRNA_data),
assay = list("mRNA", "tRNA"),
meta.data = default_tTEscanR_metadata, meta.data.ids = "ConditionsLabels"
)
# Updating the previously created object with some metadata for reference
matching_celltypes <- intersect(
colnames(default_tTEscanR_mRNA_data), colnames(filtered_tRNA_data)
)
tTEobject <- updateObject(
object = tTEobject, meta.data = matching_celltypes,
meta.data.ids = "matching_celltypes", overwrite = TRUE
)
Each component of a tTEscanR object can be accessed
using the getAssays() or getMetadata()
functions, which require the object and the name of the slot
to be retrieved.
The analysis can be carried out across three hierarchical layers of information: gene expression, codon and anticodon pool, and amino acid level. This multi-layered approach provides a comprehensive view of translation efficiency.
Codon usage is computed by performing a matrix
multiplication between the mRNA expression data and a
codon frequency-per-gene reference matrix. This
reference matrix can be generated using
obtainCodonComposition() or alternatively,
a user-defined codon frequency matrix can be supplied
directly, providing flexibility for custom analyses.
The reference codon frequency-per-gene matrix represents the codon distribution of each protein-coding gene in a reference genome.
For more details, please refer to the dedicated codon frequency vignette.
The computeCodonUsage() function
calculates codon usage by multiplying an mRNA
expression matrix with a codon frequency-per-gene table. The resulting
matrix contains codons as rows and samples or conditions as columns.
The codon frequency table can either be: (i) provided directly
(e.g. computed previously using
obtainCodonComposition()), or (ii) loaded
from the built-in defaults available for human and mouse.
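The matrix multiplication described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code of computeCodonUsage(): the toy matrices, the gene-by-codon orientation of `codon_freq`, and the gene alignment step are assumptions.

```r
# Toy expression matrix: 3 genes (rows) x 2 conditions (columns)
expr <- matrix(
  c(10,  0,
     5,  5,
     0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("Cond1", "Cond2"))
)

# Toy codon counts per gene: 3 genes (rows) x 4 codons (columns)
codon_freq <- matrix(
  c(2, 1, 0, 1,
    0, 3, 1, 0,
    1, 0, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("AAA", "AAG", "GCT", "GCC"))
)

# Align both matrices on their shared genes, then weight each gene's
# codon counts by its expression in every condition
shared_genes <- intersect(rownames(expr), rownames(codon_freq))
codon_usage <- t(codon_freq[shared_genes, , drop = FALSE]) %*%
  expr[shared_genes, , drop = FALSE]

# Result: codons as rows, conditions as columns
codon_usage
```

For example, AAG in Cond1 is 1 x 10 (GeneA) + 3 x 5 (GeneB) + 0 x 0 (GeneC) = 25.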
In addition to generating the codon usage matrix,
computeCodonUsage() can optionally compute
the following:
Codon exonic background: genome-wide codon composition calculated across all genes.
Mean codon usage: average codon usage across all conditions or samples.
Exonic background and mean usage correlation: metric used to assess bias in codon usage relative to the underlying genomic codon composition.
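One possible way to compute these three quantities is sketched below in base R. The normalization to proportions and the choice of Pearson correlation are assumptions made for this example; the package may define these metrics differently.

```r
codons <- c("AAA", "AAG", "GCT", "GCC")

# Toy codon counts per gene: 3 genes (rows) x 4 codons (columns)
codon_freq <- matrix(
  c(2, 1, 0, 1,
    0, 3, 1, 0,
    1, 0, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), codons)
)

# Toy codon usage: 4 codons (rows) x 2 conditions (columns)
codon_usage <- matrix(
  c(20, 20,
    25, 15,
     5, 45,
    10, 40),
  nrow = 4, byrow = TRUE,
  dimnames = list(codons, c("Cond1", "Cond2"))
)

# Codon exonic background: codon composition summed over all genes,
# expressed as proportions (assumed normalization)
exonic_background <- colSums(codon_freq) / sum(codon_freq)

# Mean codon usage: per-condition proportions averaged across conditions
usage_prop <- sweep(codon_usage, 2, colSums(codon_usage), "/")
mean_codon_usage <- rowMeans(usage_prop)

# Correlation between mean usage and background (Pearson is an assumption)
background_corr <- cor(
  mean_codon_usage,
  exonic_background[names(mean_codon_usage)],
  method = "pearson"
)
background_corr
```

A correlation close to 1 would indicate that the expressed codon pool largely mirrors the genomic codon composition.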
# We first need to add the correction factor to the tTEscanR object
# It has to be stored as CorrectionFactor
tTEobject <- updateObject(
object = tTEobject, meta.data = "tissue", meta.data.ids = "CorrectionFactor"
)
tTEobject <- computeCodonUsage(
object = tTEobject, codon_freq = NULL, species = "hg38",
additional_metrics = TRUE, overwrite = TRUE
)
# Transforming the data
additional_metrics <- getMetadata(tTEobject, "CodonUsage_AdditionalMetrics")
mean_codon_usage <- additional_metrics$MeanCodonUsage
exonic_background <- additional_metrics$CodonExonicBackground
exonic_background <- as.data.frame(exonic_background)
correlation_mean_background <- cbind(mean_codon_usage, exonic_background)
plotCorrelation(
data = correlation_mean_background, plot = "MeanCodonUsage",
x_axis_col = "mean_usage_across_conditions",
y_axis_col = "exonic_background",
extra_val = additional_metrics$MeanCodonCorr,
condition_col = "feature", # Here feature = codons
add_titles = TRUE, show_legend = "none"
)
You can further evaluate the codon usage output using
showPoolContribution(), which quantifies
the contribution of the most highly expressed genes to the overall codon
pool across different conditions. This analysis helps identify whether
codon usage is dominated by a small subset of highly expressed
transcripts or is broadly distributed across the transcriptome.
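The notion of "contribution of the most highly expressed genes" can be illustrated with a small base R sketch. It is not the algorithm used by showPoolContribution(): the toy data, the `top_n` value, and the definition of a gene's contribution as expression times total codon count are assumptions.

```r
# Toy expression matrix: 5 genes (rows) x 2 conditions (columns)
expr <- matrix(
  c(100, 12,
     50, 11,
      5, 10,
      5,  9,
      0,  8),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Gene", LETTERS[1:5]), c("Cond1", "Cond2"))
)

# Hypothetical total number of codons in each gene
gene_codon_total <- c(GeneA = 300, GeneB = 200, GeneC = 500,
                      GeneD = 100, GeneE = 400)

# Number of top expressed genes to consider (hypothetical)
top_n <- 2

# Fraction of each condition's codon pool supplied by its top_n genes
top_contribution <- sapply(colnames(expr), function(cond) {
  pool_per_gene <- expr[, cond] * gene_codon_total[rownames(expr)]
  top_genes <- names(sort(expr[, cond], decreasing = TRUE))[seq_len(top_n)]
  sum(pool_per_gene[top_genes]) / sum(pool_per_gene)
})

round(top_contribution, 3)
```

In this toy example the codon pool of Cond1 is dominated by two genes, whereas in Cond2 it is spread more evenly across the transcriptome.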
# Transforming the data
codon_pool_contr <- getMetadata(tTEobject, "CodonPoolContribution_Results")
codon_pool_diversity <- codon_pool_contr$top10GenesCodonPoolDiversity
colnames(codon_pool_diversity) <- c(
"condition", "original_top_contribution", "baseline_correlation"
)
codon_pool_diversity <- codon_pool_diversity %>%
tidyr::separate(
.data$condition,
into = c("tissue", "cell_type"), sep = "-"
)
plotCorrelation(
data = codon_pool_diversity, plot = "PoolDiversity",
x_axis_col = "original_top_contribution",
y_axis_col = "baseline_correlation",
condition_col = "tissue", label_col = "cell_type", show_legend = "right"
)
The outputs generated during the execution of tTEscanR can be transformed into comprehensive visualizations to support data interpretation and exploration. A variety of plotting functions are available in tTEscanR to represent codon usage patterns, gene contribution, and other key metrics.
For more details, please refer to the dedicated visualization vignette.
The computeAnticodonUsage() function
calculates anticodon usage by aggregating tRNA
expression data at the anticodon level. Analogous to
computeCodonUsage(), the resulting matrix
contains anticodons as rows and samples or conditions as columns.
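A minimal base R sketch of this aggregation is shown below. It assumes that the tRNA gene labels follow the tRNA - Amino acid - Anticodon - Identifier format described for the example dataset, and it is not the implementation of computeAnticodonUsage().

```r
# Toy tRNA expression: tRNA genes (rows) x conditions (columns)
toy_tRNA <- matrix(
  c(10, 20,
     5,  0,
     3,  7,
     1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("tRNA-Asn-GTT-5-1", "tRNA-Asn-GTT-2-1",
      "tRNA-Ala-AGC-1-1", "tRNA-Ala-TGC-3-1"),
    c("Cond1", "Cond2")
  )
)

# The anticodon is the third hyphen-separated field of each label
label_parts <- strsplit(rownames(toy_tRNA), "-", fixed = TRUE)
anticodon <- vapply(label_parts, function(p) p[3], character(1))

# Sum the expression of all tRNA genes sharing the same anticodon
anticodon_usage <- rowsum(toy_tRNA, group = anticodon)

# Result: anticodons as rows, conditions as columns
anticodon_usage
```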
The computeAAUsage() function computes
amino acid demand and supply by
integrating codon and anticodon usage data, respectively. Users can
choose to calculate demand and supply either separately or together.
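Conceptually, demand and supply can be obtained by summing codon usage and anticodon usage over the amino acid they correspond to. The base R sketch below uses a two-amino-acid excerpt of the standard genetic code and toy values; the lookup vectors and the simple summation are assumptions and may differ from what computeAAUsage() does.

```r
conds <- c("Cond1", "Cond2")

# Toy codon usage: codons (rows) x conditions (columns)
codon_usage <- matrix(
  c(20, 20,
    25, 15,
     5, 45,
    10, 40),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("AAA", "AAG", "GCT", "GCC"), conds)
)

# Toy anticodon usage: anticodons (rows) x conditions (columns)
anticodon_usage <- matrix(
  c(30, 10,
    20, 30,
     8, 60,
     2, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("TTT", "CTT", "AGC", "TGC"), conds)
)

# Lookup vectors restricted to lysine and alanine for this example
codon_to_aa <- c(AAA = "Lys", AAG = "Lys", GCT = "Ala", GCC = "Ala")
anticodon_to_aa <- c(TTT = "Lys", CTT = "Lys", AGC = "Ala", TGC = "Ala")

# Amino acid demand: codon usage summed per encoded amino acid
aa_demand <- rowsum(
  codon_usage, group = unname(codon_to_aa[rownames(codon_usage)])
)

# Amino acid supply: anticodon usage summed per carried amino acid
aa_supply <- rowsum(
  anticodon_usage, group = unname(anticodon_to_aa[rownames(anticodon_usage)])
)

aa_demand
aa_supply
```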
The computeTheoreticalTE() function
calculates the Theoretical Translation
Efficiency (tTE) by measuring the correlation between: (i)
codon usage and anticodon availability, or (ii) amino acid demand and
amino acid supply. Users can compute these correlations separately or in
combination. To ensure accurate correlation between these data sources,
it is crucial that the mRNA and tRNA datasets share matching conditions
(i.e. identical column names representing the same samples or
groups).
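The role of matching conditions can be illustrated at the amino acid level with a base R sketch. It computes one correlation per shared condition between toy demand and supply vectors. The Spearman method and the toy values are assumptions; the correlation actually used by computeTheoreticalTE() is not specified here.

```r
aa <- c("Ala", "Gly", "Leu", "Lys", "Ser")

# Toy amino acid demand: 5 amino acids x 3 conditions
aa_demand <- matrix(
  c(40, 10,  5,
    30, 20, 10,
    20, 30, 20,
    10, 40, 30,
     5, 50, 40),
  nrow = 5, byrow = TRUE,
  dimnames = list(aa, c("Cond1", "Cond2", "Cond3"))
)

# Toy amino acid supply: 5 amino acids x 2 conditions (Cond3 is missing)
aa_supply <- matrix(
  c(35, 50,
    25, 40,
    22, 30,
    12, 20,
     6, 10),
  nrow = 5, byrow = TRUE,
  dimnames = list(aa, c("Cond1", "Cond2"))
)

# Only conditions and amino acids present in both matrices can be compared
shared_conditions <- intersect(colnames(aa_demand), colnames(aa_supply))
shared_aa <- intersect(rownames(aa_demand), rownames(aa_supply))

# One demand-supply correlation per shared condition
toy_score <- vapply(shared_conditions, function(cond) {
  cor(aa_demand[shared_aa, cond], aa_supply[shared_aa, cond],
      method = "spearman")
}, numeric(1))

toy_score
```

Here Cond3 is silently excluded because it has no tRNA counterpart, which is why matching column names across the mRNA and tRNA datasets matters.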
# Computing tTE at the codon-anticodon level
tTEobject <- computeTheoreticalTE(object = tTEobject, level = "codon")
# Computing tTE at the AA demand-supply level
tTEobject <- computeTheoreticalTE(object = tTEobject, level = "aa")
# Computing tTE simultaneously at the codon-anticodon and AA demand-supply levels
tTEobject <- computeTheoreticalTE(
object = tTEobject, level = "both", overwrite = TRUE
)
tTEresults_codon <- getMetadata(tTEobject, "tTEresults_codon")
tTEresults_AA <- getMetadata(tTEobject, "tTEresults_AA")
plotTEscore(
data = tTEresults_codon, metadata = conditions_metadata,
index_col = "conditions", class_col = "tissue", add_stats = FALSE
)
#> Warning in defineMergedData(data = data, meta = metadata, index = index_col, :
#> One or more groups have fewer than 2 samples. Statistics (p-values) will return
#> NA.
#> $plot
#> Warning: Groups with fewer than two datapoints have been dropped.
#> ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
#> Warning: Groups with fewer than two datapoints have been dropped.
#> ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
#>
#> $stats
#> NULL
plotTEscore(
data = tTEresults_AA, metadata = conditions_metadata,
index_col = "conditions", class_col = "tissue", add_stats = FALSE
)
#> Warning in defineMergedData(data = data, meta = metadata, index = index_col, :
#> One or more groups have fewer than 2 samples. Statistics (p-values) will return
#> NA.
#> $plot
#> Warning: Groups with fewer than two datapoints have been dropped.
#> ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
#> Groups with fewer than two datapoints have been dropped.
#> ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
#>
#> $stats
#> NULL
For visualization purposes, a set of target conditions (e.g. a specific group of cells) can be defined, allowing comparison of their tTE scores against those of all other conditions in the dataset. In this example, we focus on neurons as the target group but exclude the ENS neurons from the selection to refine the analysis.
conditions_metadata$group <- "other"
conditions_metadata$group[grep(
"neuron", conditions_metadata$conditions
)] <- "neurons"
conditions_metadata$group[grep(
"ENS neuron", conditions_metadata$conditions
)] <- "other"
# Use tTEresults_codon to assess the codon-anticodon level
plotTEscore(
data = tTEresults_AA, metadata = conditions_metadata,
index_col = "conditions", class_col = "group", add_stats = TRUE
)
#> $plot
#>
#> $stats
#> group1 group2 p_value comparison p_signif class
#> 1 neurons other 0.2705441 neurons_vs_other ns other
The runDEAnalysis() function performs
differential expression analysis with DESeq2 and generates multiple
plots to display the results. When datasets share the same
conditions and name_sep settings, they can be
processed together in a single run. The input to this function must be a
list of matrices.
# Other outputs that could be analyzed:
# mRNA <- getAssay(tTEobject, "mRNA")
# CodonUsage <- getAssay(tTEobject, "CodonUsage")
# tRNA <- getAssay(tTEobject, "tRNA")
# AnticodonUsage <- getAssay(tTEobject, "AnticodonUsage")
AA_results <- list(
AADemand = getAssay(tTEobject, "AADemand"),
AASupply = getAssay(tTEobject, "AASupply")
)
The outputs of the runDEAnalysis()
function vary depending on the parameters enabled. In this example, the
results include: (i) a heatmap, (ii) PCA plots (based on the selected
number of principal components), and (iii) the size-corrected input
matrix. A separate list of outputs is returned for each matrix included
in the input list.
all_DESeq2_results <- runDEAnalysis(
list_data = AA_results, metadata = metadata, heatmap = TRUE,
dim_reduct = "PCA", numPC = 2, batch = "tissue",
color_factor = "tissue", show_legend = "right", label_factor = "cell.type"
)
grid::grid.draw(all_DESeq2_results$plots$AADemand$heatmap) # Visualize heatmap plot
all_DESeq2_results$plots$AADemand$exploratory$ElbowPlot # Visualize elbow plot
all_DESeq2_results$plots$AADemand$exploratory$PC1_vs_PC2 # Visualize PCA plot
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.2.1 biomaRt_2.69.0 tTEscanR_0.99.0 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.3
#> [3] farver_2.1.2 blob_1.3.0
#> [5] filelock_1.0.3 Biostrings_2.81.3
#> [7] S7_0.2.2 fastmap_1.2.0
#> [9] BiocFileCache_3.3.0 digest_0.6.39
#> [11] lifecycle_1.0.5 KEGGREST_1.53.4
#> [13] RSQLite_3.53.2 magrittr_2.0.5
#> [15] compiler_4.6.1 rlang_1.2.0
#> [17] sass_0.4.10 progress_1.2.3
#> [19] tools_4.6.1 yaml_2.3.12
#> [21] ggsignif_0.6.4 knitr_1.51
#> [23] labeling_0.4.3 prettyunits_1.2.0
#> [25] S4Arrays_1.13.0 bit_4.6.0
#> [27] curl_7.1.0 DelayedArray_0.39.3
#> [29] RColorBrewer_1.1-3 abind_1.4-8
#> [31] BiocParallel_1.47.0 purrr_1.2.2
#> [33] withr_3.0.3 BiocGenerics_0.59.8
#> [35] sys_3.4.3 grid_4.6.1
#> [37] stats4_4.6.1 ggpubr_0.6.3
#> [39] ggplot2_4.0.3 scales_1.4.0
#> [41] SummarizedExperiment_1.43.0 cli_3.6.6
#> [43] rmarkdown_2.31 crayon_1.5.3
#> [45] generics_0.1.4 otel_0.2.0
#> [47] httr_1.4.8 DBI_1.3.0
#> [49] cachem_1.1.0 stringr_1.6.0
#> [51] parallel_4.6.1 AnnotationDbi_1.75.0
#> [53] BiocManager_1.30.27 XVector_0.53.0
#> [55] matrixStats_1.5.0 vctrs_0.7.3
#> [57] Matrix_1.7-5 carData_3.0-6
#> [59] jsonlite_2.0.0 car_3.1-5
#> [61] IRanges_2.47.2 hms_1.1.4
#> [63] S4Vectors_0.51.5 rstatix_0.7.3
#> [65] ggrepel_0.9.8 bit64_4.8.2
#> [67] Formula_1.2-5 maketools_1.3.2
#> [69] locfit_1.5-9.12 tidyr_1.3.2
#> [71] jquerylib_0.1.4 glue_1.8.1
#> [73] codetools_0.2-20 stringi_1.8.7
#> [75] gtable_0.3.6 GenomicRanges_1.65.0
#> [77] tibble_3.3.1 pillar_1.11.1
#> [79] rappdirs_0.3.4 htmltools_0.5.9
#> [81] Seqinfo_1.3.0 R6_2.6.1
#> [83] dbplyr_2.6.0 httr2_1.2.3
#> [85] evaluate_1.0.5 lattice_0.22-9
#> [87] Biobase_2.73.1 backports_1.5.1
#> [89] png_0.1-9 broom_1.0.13
#> [91] memoise_2.0.1 bslib_0.11.0
#> [93] Rcpp_1.1.1-1.1 SparseArray_1.13.2
#> [95] DESeq2_1.53.0 xfun_0.59
#> [97] MatrixGenerics_1.25.0 buildtools_1.0.0
#> [99] pkgconfig_2.0.3