---
title: "tTEscanR User Guide"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    theme: default
    css: style.css
vignette: >
  %\VignetteIndexEntry{1. Introduction to tTEscanR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---
```{r file_settings, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
```{r notes_format, echo = FALSE, results = 'asis'}
cat("
")
```
# Overview
**tTEscanR** is a versatile, user-friendly R package for quantifying and
analyzing the relationship between codon usage in mRNA and the availability of
the corresponding anticodons in tRNA. The package computes a
**theoretical translation efficiency (tTE)** score as a proxy for translation
elongation efficiency, hereafter referred to simply as translation efficiency.
In this document, we walk through a case study that demonstrates the main
features of **tTEscanR**.
```{r setup, message = FALSE, warning = FALSE}
# install.packages("/avarassanchez/tTEscanR")
library(tTEscanR)
```
```{r other_libraries, message = FALSE, warning = FALSE}
library(dplyr)
```
# Workflow
**tTEscanR** has a **modular structure**: each component can be run
independently or as part of a complete pipeline. This design makes it easy to
adapt and extend the analysis of codon-anticodon dynamics across diverse
biological contexts.
## 1. Loading the data
**tTEscanR** supports both gene expression and chromatin accessibility
profiling data. The mRNA and tRNA inputs are pre-processed count matrices with
**features** (e.g. genes or transcripts) as rows and **conditions** (e.g.
samples, replicates, or individual cells) as columns. The package is optimized
for both **bulk** and **single-cell** datasets. Each dataset should be loaded
according to its file format.
In this tutorial, we analyze a single-cell fetal human atlas described in
*[@Cao2020]* and *[@Domcke2020]*, and previously examined by *[@Gao2022]*.
This dataset and a subset of it are included in **tTEscanR** and can be loaded
directly.
```{r load_data_mRNA, message = FALSE, warning = FALSE}
data(default_tTEscanR_mRNA_data)
```
- **Dimensions:** 9900 genes (rows) x 172 cell types (columns)
- **Rows:** genes are identified by gene symbol (e.g. GATSL1)
- **Columns:** cell type labels combine tissue and cell type, separated by a
  hyphen (e.g. Adrenal-Adrenocortical cells)
```{r load_data_tRNA, message = FALSE, warning = FALSE}
data(default_tTEscanR_tRNA_data)
```
- **Dimensions:** 377 tRNA genes (rows) x 89 cell types (columns)
- **Rows:** tRNA gene labels follow the pattern
  tRNA-AminoAcid-Anticodon-Identifier (e.g. tRNA-Asn-GTT-5-1)
- **Columns:** cell type labels use the same format as the mRNA data
## 2. Set up the tTEscanR object
### 2.1. Pre-processing
::: {.note}
The **pre-processing module** formats and standardizes input matrices to ensure
they are structured correctly for reliable analysis through the pipeline.
:::
The **`tRNAFilterCuts()`** function filters out **samples or conditions** with
low total tRNA expression (threshold set by `cutoff`), helping to ensure
overall **data quality**.
```{r filter_tRNAs, message = FALSE, warning = FALSE}
filtered_tRNA_data <- tRNAFilterCuts(
data = default_tTEscanR_tRNA_data, cutoff = 5000
)
```
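As a rough illustration of what such a filter does, the base R sketch below drops conditions whose total tRNA counts fall below a threshold. It assumes that `cutoff` is compared against column sums with `>=`; the exact rule used by `tRNAFilterCuts()` may differ, and the toy matrix `toy_trna` is hypothetical.

```r
# Hypothetical toy tRNA count matrix: tRNA genes x conditions
toy_trna <- matrix(
  c(3000, 6000, 9000,
    1500, 2500,  500),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("tRNA-Asn-GTT-5-1", "tRNA-Ala-AGC-1-1"),
    c("cellA", "cellB", "cellC")
  )
)
cutoff <- 5000
# Total tRNA counts per condition (column sums): 4500, 8500, 9500
total_counts <- colSums(toy_trna)
# Keep only conditions that reach the cutoff (assumed >= rule)
toy_trna_filtered <- toy_trna[, total_counts >= cutoff, drop = FALSE]
colnames(toy_trna_filtered)
#> [1] "cellB" "cellC"
```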
### 2.2. Defining the tTEscanR object
::: {.note}
The **tTEscanR object** is a centralized data structure that stores input
matrices, metadata, and results, and is updated at every step to keep them
consistent across the pipeline.
To ensure robustness throughout the pipeline, some slots use **reserved IDs**
(e.g. `CorrectionFactor`) that must be respected by the user (see the
documentation for more details).
:::
The **`createObject()`** function initializes a new **tTEscanR**
object to store and organize analysis data. The input can be either a
**single matrix** or a **list of matrices** (to support multiple datasets), and
may optionally include **metadata**. For proper functionality, all input
matrices must be appropriately named. To modify, extend, or update an existing
**tTEscanR** object with new data or metadata, use
**`updateObject()`**.
```{r metadata_definition, message = FALSE, warning = FALSE}
data(default_tTEscanR_metadata)
```
```{r createObject, message = FALSE, warning = FALSE}
# Add the mRNA and tRNA datasets to the object
tTEobject <- createObject(
counts = list(default_tTEscanR_mRNA_data, filtered_tRNA_data),
assay = list("mRNA", "tRNA"),
meta.data = default_tTEscanR_metadata, meta.data.ids = "ConditionsLabels"
)
```
```{r updateObject, message = FALSE, warning = FALSE}
# Update the object with additional reference metadata:
# the cell types shared by the mRNA and tRNA datasets
matching_celltypes <- intersect(
colnames(default_tTEscanR_mRNA_data), colnames(filtered_tRNA_data)
)
tTEobject <- updateObject(
object = tTEobject, meta.data = matching_celltypes,
meta.data.ids = "matching_celltypes", overwrite = TRUE
)
```
Each component of a **tTEscanR** object can be accessed with `getAssay()` or
`getMetadata()`, which take the object and the name of the slot to retrieve
(e.g. `getAssay(tTEobject, "mRNA")`).
## 3. Standard workflow
The analysis can be carried out across **three hierarchical layers of
information**: gene expression, codon and anticodon pools, and amino acids.
This multi-layered approach provides a comprehensive view of translation
efficiency.
### 3.1. Codon usage assessment
Codon usage is computed by **matrix multiplication** of the mRNA expression
data with a **codon frequency-per-gene reference matrix**. This reference
matrix can be generated with **`obtainCodonComposition()`**; alternatively, a
**user-defined** codon frequency matrix can be supplied directly for custom
analyses.
::: {.note}
The reference **codon frequency-per-gene** matrix represents the codon
distribution of each protein-coding gene in a reference genome.
For more details, please refer to the dedicated **codon frequency vignette**.
:::
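The matrix multiplication behind codon usage can be sketched in base R with toy data. Both matrices below are hypothetical, and any normalization that `computeCodonUsage()` may apply is not reproduced: the sketch only shows how a genes x codons reference and a genes x conditions expression matrix combine into a codons x conditions matrix.

```r
# Hypothetical codon frequency-per-gene reference: genes x codons
codon_freq <- matrix(
  c(10, 2, 0, 5,
     3, 7, 4, 1,
     0, 1, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "GENE3"), c("GCT", "GCC", "AAT", "AAC"))
)
# Hypothetical mRNA expression: genes x conditions
expr <- matrix(
  c(100,  0,
     20, 50,
      5, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "GENE3"), c("condA", "condB"))
)
# Restrict both matrices to the genes they share, in the same order
shared_genes <- intersect(rownames(expr), rownames(codon_freq))
# (codons x genes) %*% (genes x conditions) = codons x conditions
codon_usage <- t(codon_freq[shared_genes, , drop = FALSE]) %*%
  expr[shared_genes, , drop = FALSE]
codon_usage["GCT", "condA"]  # 10 * 100 + 3 * 20 + 0 * 5 = 1060
```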
The **`computeCodonUsage()`** function calculates **codon usage** by
multiplying an mRNA expression matrix with a codon frequency-per-gene table.
The resulting matrix contains codons as rows and samples or conditions as
columns.
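The codon frequency table can either be (i) provided directly (e.g. computed
previously with **`obtainCodonComposition()`**), or (ii) loaded from the
built-in defaults available for human and mouse (in the example below,
`codon_freq = NULL` together with `species = "hg38"` selects the human
default).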
In addition to generating the codon usage matrix, **`computeCodonUsage()`** can
optionally compute the following:
- **Codon exonic background**: genome-wide codon composition calculated across
all genes.
- **Mean codon usage**: average codon usage across all conditions or samples.
- **Exonic background and mean usage correlation**: metric used to assess bias
in codon usage relative to the underlying genomic codon composition.
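The three optional metrics can be approximated on toy data as follows. This is a sketch under stated assumptions: the exonic background is taken as codon counts summed over all genes and scaled to proportions, codon usage is scaled to proportions within each condition before averaging, and Pearson correlation is used. The package's exact definitions may differ, and all data are hypothetical.

```r
# Hypothetical codon frequency-per-gene reference: genes x codons
codon_freq <- matrix(
  c(10, 2, 0, 5,
     3, 7, 4, 1,
     0, 1, 9, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("GENE", 1:3), c("GCT", "GCC", "AAT", "AAC"))
)
# Hypothetical mRNA expression: genes x conditions
expr <- matrix(
  c(100, 0, 20, 50, 5, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("GENE", 1:3), c("condA", "condB"))
)
codon_usage <- t(codon_freq) %*% expr
# Codon exonic background: codon composition summed over all genes (proportions)
exonic_background <- colSums(codon_freq) / sum(codon_freq)
# Mean codon usage: per-condition proportions averaged across conditions
usage_prop <- sweep(codon_usage, 2, colSums(codon_usage), "/")
mean_usage <- rowMeans(usage_prop)
# Agreement between mean usage and background (Pearson is an assumption)
background_corr <- cor(mean_usage, exonic_background[names(mean_usage)])
background_corr
```

A low correlation would suggest that expression reshapes the codon pool away from the genomic composition, which is what this metric is meant to flag.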
```{r codon_usage, message = FALSE, warning = FALSE}
# First, store the correction factor in the tTEscanR object
# (it must be saved under the reserved ID "CorrectionFactor")
tTEobject <- updateObject(
object = tTEobject, meta.data = "tissue", meta.data.ids = "CorrectionFactor"
)
tTEobject <- computeCodonUsage(
object = tTEobject, codon_freq = NULL, species = "hg38",
additional_metrics = TRUE, overwrite = TRUE
)
```
```{r correlation_plot_mean, message = FALSE, warning = FALSE}
# Combine mean codon usage and exonic background for plotting
additional_metrics <- getMetadata(tTEobject, "CodonUsage_AdditionalMetrics")
mean_codon_usage <- additional_metrics$MeanCodonUsage
exonic_background <- additional_metrics$CodonExonicBackground
exonic_background <- as.data.frame(exonic_background)
correlation_mean_background <- cbind(mean_codon_usage, exonic_background)
plotCorrelation(
data = correlation_mean_background, plot = "MeanCodonUsage",
x_axis_col = "mean_usage_across_conditions",
y_axis_col = "exonic_background",
extra_val = additional_metrics$MeanCodonCorr,
condition_col = "feature", # Here feature = codons
add_titles = TRUE, show_legend = "none"
)
```
You can further evaluate the codon usage output using
**`showPoolContribution()`**, which quantifies the contribution of the most
highly expressed genes to the overall codon pool across different conditions.
This analysis helps identify whether codon usage is dominated by a small subset
of highly expressed transcripts or is broadly distributed across the
transcriptome.
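One way to picture this metric is the share of each condition's codon pool contributed by its `N` most highly expressed genes. The sketch below assumes each gene contributes expression x total codon count, using a hypothetical `codons_per_gene` vector; it illustrates the idea only and does not reproduce the diversity or baseline correlation values returned by `showPoolContribution()`.

```r
# Hypothetical expression: genes x conditions
expr <- matrix(
  c(500,  10,
     50, 400,
     20,  30,
     10,   5,
      5,   5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("GENE", 1:5), c("condA", "condB"))
)
# Hypothetical total number of codons per gene (coding length / 3)
codons_per_gene <- c(GENE1 = 400, GENE2 = 200, GENE3 = 300,
                     GENE4 = 1000, GENE5 = 150)
# Codons contributed by each gene in each condition
gene_contrib <- sweep(expr, 1, codons_per_gene[rownames(expr)], "*")
N <- 2
# Fraction of each condition's codon pool coming from its top-N expressed genes
top_share <- vapply(colnames(expr), function(cond) {
  top_genes <- names(sort(expr[, cond], decreasing = TRUE))[seq_len(N)]
  sum(gene_contrib[top_genes, cond]) / sum(gene_contrib[, cond])
}, numeric(1))
round(top_share, 3)  # condA about 0.926, condB about 0.901
```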
```{r codon_pool_contribution, message = FALSE, warning = FALSE}
tTEobject <- showPoolContribution(
object = tTEobject, N = 10, species = "hg38", overwrite = TRUE
)
```
```{r correlation_plot_diversity, message = FALSE, warning = FALSE}
# Extract the codon pool diversity results and split labels for plotting
codon_pool_contr <- getMetadata(tTEobject, "CodonPoolContribution_Results")
codon_pool_diversity <- codon_pool_contr$top10GenesCodonPoolDiversity
colnames(codon_pool_diversity) <- c(
"condition", "original_top_contribution", "baseline_correlation"
)
codon_pool_diversity <- codon_pool_diversity %>%
  tidyr::separate(
    condition,
    into = c("tissue", "cell_type"), sep = "-", extra = "merge"
  )
plotCorrelation(
data = codon_pool_diversity, plot = "PoolDiversity",
x_axis_col = "original_top_contribution",
y_axis_col = "baseline_correlation",
condition_col = "tissue", label_col = "cell_type", show_legend = "right"
)
```
::: {.note}
The outputs generated during the execution of **tTEscanR** can be transformed
into **comprehensive visualizations** to support data interpretation and
exploration. A variety of plotting functions are available in **tTEscanR** to
represent codon usage patterns, gene contribution, and other key metrics.
For more details, please refer to the dedicated **visualization vignette**.
:::
### 3.2. Anticodon usage assessment
The **`computeAnticodonUsage()`** function calculates **anticodon usage** by
aggregating tRNA expression data at the anticodon level. Analogous to
**`computeCodonUsage()`**, the resulting matrix contains anticodons as rows and
samples or conditions as columns.
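Aggregating to the anticodon level amounts to summing the rows of tRNA genes that share an anticodon. The sketch below parses the anticodon from labels in the tRNA-AminoAcid-Anticodon-Identifier format described in Section 1 and sums rows with base R `rowsum()`. The toy matrix is hypothetical, and the package's implementation may handle additional cases.

```r
# Hypothetical tRNA counts: tRNA genes x conditions
toy_trna <- matrix(
  c(10, 20,
     5,  5,
    30,  0,
     2,  8),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("tRNA-Asn-GTT-5-1", "tRNA-Asn-GTT-2-1",
      "tRNA-Ala-AGC-1-1", "tRNA-Ala-CGC-2-1"),
    c("condA", "condB")
  )
)
# The third dash-separated field of each label is the anticodon
anticodon <- vapply(strsplit(rownames(toy_trna), "-", fixed = TRUE),
                    `[`, character(1), 3)
# Sum tRNA genes sharing the same anticodon (rows come out sorted: AGC, CGC, GTT)
anticodon_usage <- rowsum(toy_trna, group = anticodon)
anticodon_usage["GTT", ]  # the two Asn-GTT genes combined: 15 and 25
```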
```{r anticodon_usage, message = FALSE, warning = FALSE}
tTEobject <- computeAnticodonUsage(object = tTEobject)
```
### 3.3. Amino acid level assessment
The **`computeAAUsage()`** function computes **amino acid demand** and
**supply** from codon and anticodon usage data, respectively. Demand and supply
can be calculated either separately or together.
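Conceptually, demand groups codon usage by the amino acid each codon encodes, and supply groups anticodon usage by the amino acid each tRNA carries. The sketch below uses a hypothetical partial codon table (only the Ala and Asn entries of the standard genetic code) and toy matrices; it is an illustration, not the package's implementation.

```r
# Hypothetical codon usage: codons x conditions
codon_usage <- matrix(
  c(1060, 150,  345, 360,  125, 290,  550, 110),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GCT", "GCC", "AAT", "AAC"), c("condA", "condB"))
)
# Partial standard genetic code (DNA codon -> amino acid), enough for this toy
genetic_code <- c(GCT = "Ala", GCC = "Ala", GCA = "Ala", GCG = "Ala",
                  AAT = "Asn", AAC = "Asn")
# Demand: sum codon usage per encoded amino acid
aa_demand <- rowsum(codon_usage, group = genetic_code[rownames(codon_usage)])
# Hypothetical anticodon usage and anticodon -> amino acid map (from tRNA labels)
anticodon_usage <- matrix(
  c(30, 0,  2, 8,  15, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("AGC", "CGC", "GTT"), c("condA", "condB"))
)
anticodon_aa <- c(AGC = "Ala", CGC = "Ala", GTT = "Asn")
# Supply: sum anticodon usage per carried amino acid
aa_supply <- rowsum(anticodon_usage,
                    group = anticodon_aa[rownames(anticodon_usage)])
aa_demand
aa_supply
```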
```{r amino_acid_demand, message = FALSE, warning = FALSE, eval = FALSE}
# Computing AA demand
tTEobject <- computeAAUsage(object = tTEobject, level = "demand")
```
```{r amino_acid_supply, message = FALSE, warning = FALSE, eval = FALSE}
# Computing AA supply
tTEobject <- computeAAUsage(object = tTEobject, level = "supply")
```
```{r amino_acid, message = FALSE, warning = FALSE}
# Computing simultaneously AA demand and supply
tTEobject <- computeAAUsage(
object = tTEobject, level = "both",
overwrite = TRUE
)
```
### 3.4. Theoretical Translation Efficiency (tTE) computation
The **`computeTheoreticalTE()`** function calculates the **Theoretical
Translation Efficiency (tTE)** as the correlation between (i) codon usage and
anticodon availability, or (ii) amino acid demand and amino acid supply. The
two levels can be computed separately or together. Because the score is
computed for each condition, the mRNA and tRNA datasets must share matching
conditions (i.e. identical column names representing the same samples or
groups).
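A minimal sketch of the codon-level idea: for each condition present in both matrices, correlate codon usage with the usage of its pairing anticodon. It assumes strict Watson-Crick pairing (the anticodon is the reverse complement of the codon, so wobble decoding is ignored and codons without an exact match are dropped) and Spearman correlation. Both choices are assumptions; `computeTheoreticalTE()` may pair and correlate differently. All data and the `revcomp()` helper are hypothetical.

```r
# Hypothetical helper: reverse complement of DNA codons
revcomp <- function(seqs) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  vapply(strsplit(seqs, ""), function(b) paste(rev(comp[b]), collapse = ""),
         character(1))
}
# Hypothetical codon usage: codons x conditions
codon_usage <- matrix(
  c(50, 10,  40, 20,  30, 30,  20, 40,  10, 50,  5, 60),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("GCT", "GCC", "GCG", "AAT", "AAC", "AAA"),
                  c("condA", "condB"))
)
# Hypothetical anticodon usage: anticodons x conditions (condC has no mRNA data)
anticodon_usage <- matrix(
  c(100, 100, 7,  200, 200, 3,  300, 300, 9,  400, 400, 1,  500, 500, 4),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("GTT", "ATT", "CGC", "GGC", "AGC"),
                  c("condA", "condB", "condC"))
)
paired <- revcomp(rownames(codon_usage))
# Drop codons without a matching anticodon (here AAA, which pairs with TTT)
keep <- paired %in% rownames(anticodon_usage)
# Only conditions present in both datasets can be scored
shared_conditions <- intersect(colnames(codon_usage), colnames(anticodon_usage))
tte <- vapply(shared_conditions, function(cond) {
  cor(codon_usage[keep, cond], anticodon_usage[paired[keep], cond],
      method = "spearman")
}, numeric(1))
tte  # condA is 1 (concordant ranks), condB is -1 (opposite ranks)
```

The `intersect()` step is why matching column names matter: a condition present in only one dataset is silently left out.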
```{r tTE_score_codon, message = FALSE, warning = FALSE, eval = FALSE}
# Computing tTE at the codon-anticodon level
tTEobject <- computeTheoreticalTE(object = tTEobject, level = "codon")
```
```{r tTE_score_aa, message = FALSE, warning = FALSE, eval = FALSE}
# Computing tTE at the AA demand-supply level
tTEobject <- computeTheoreticalTE(object = tTEobject, level = "aa")
```
```{r tTE_score, message = FALSE, warning = FALSE}
# Computing simultaneously tTE at codon-anticodon and AA demand-supply levels
tTEobject <- computeTheoreticalTE(
object = tTEobject, level = "both", overwrite = TRUE
)
```
```{r extract_metadata}
conditions_metadata <- getMetadata(tTEobject, "ConditionsLabels")
```
```{r tTE_score_plots, fig.width = 6, fig.height = 4, fig.align = 'center'}
tTEresults_codon <- getMetadata(tTEobject, "tTEresults_codon")
tTEresults_AA <- getMetadata(tTEobject, "tTEresults_AA")
plotTEscore(
data = tTEresults_codon, metadata = conditions_metadata,
index_col = "conditions", class_col = "tissue", add_stats = FALSE
)
plotTEscore(
data = tTEresults_AA, metadata = conditions_metadata,
index_col = "conditions", class_col = "tissue", add_stats = FALSE
)
```
For visualization, a set of **target conditions** (e.g. a specific group of
cells) can be defined to compare their **tTE scores** against those of all
other conditions in the dataset. In this example, we select neurons as the
target group but exclude enteric nervous system (ENS) neurons to refine the
analysis.
```{r targeted_metadata_neurons}
conditions_metadata$group <- "other"
conditions_metadata$group[grep(
"neuron", conditions_metadata$conditions
)] <- "neurons"
conditions_metadata$group[grep(
"ENS neuron", conditions_metadata$conditions
)] <- "other"
```
```{r tTE_plot_neurons, fig.width = 6, fig.height = 4, fig.align = 'center'}
# Replace tTEresults_AA with tTEresults_codon to assess the codon-anticodon level
plotTEscore(
data = tTEresults_AA, metadata = conditions_metadata,
index_col = "conditions", class_col = "group", add_stats = TRUE
)
```
## 4. Differential expression analysis
The **`runDEAnalysis()`** function performs differential expression analysis
with DESeq2 and generates multiple plots to display the results. When datasets
share the same `conditions` and `name_sep` settings, they can be processed
together in a single run. The input to this function must be a list of matrices.
```{r assays_list}
# Other outputs that could be analyzed:
# mRNA <- getAssay(tTEobject, "mRNA")
# CodonUsage <- getAssay(tTEobject, "CodonUsage")
# tRNA <- getAssay(tTEobject, "tRNA")
# AnticodonUsage <- getAssay(tTEobject, "AnticodonUsage")
AA_results <- list(
AADemand = getAssay(tTEobject, "AADemand"),
AASupply = getAssay(tTEobject, "AASupply")
)
```
The outputs of **`runDEAnalysis()`** depend on the parameters enabled. In this
example, they include (i) a heatmap, (ii) PCA plots (based on the selected
number of principal components), and (iii) the size-factor-corrected input
matrix. A separate list of outputs is returned for each matrix in the input
list.
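If the size-corrected matrix corresponds to DESeq2's default median-of-ratios normalization (an assumption; check the `runDEAnalysis()` documentation), its computation can be sketched in base R: each sample's size factor is the median, over features with no zero counts, of the ratio between its counts and the feature's geometric mean across samples. The toy matrix is hypothetical.

```r
# Hypothetical counts: features x samples (s2 and s3 are 2x and 3x s1)
counts <- matrix(
  c( 10,  20,  30,
    100, 200, 300,
      5,  10,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("featA", "featB", "featC"), c("s1", "s2", "s3"))
)
# Log geometric mean per feature; any zero count makes it -Inf
log_geo_means <- rowMeans(log(counts))
usable <- is.finite(log_geo_means)
# Size factor per sample: median ratio to the geometric means (in log space)
size_factors <- apply(counts[usable, , drop = FALSE], 2, function(x) {
  exp(median(log(x) - log_geo_means[usable]))
})
# Divide each sample's counts by its size factor
normalized <- sweep(counts, 2, size_factors, "/")
```

Within DESeq2 itself, the equivalent is `estimateSizeFactors()` followed by `counts(dds, normalized = TRUE)`.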
```{r run_dea, message = FALSE, warning = FALSE, eval = FALSE}
all_DESeq2_results <- runDEAnalysis(
  list_data = AA_results, metadata = conditions_metadata, heatmap = TRUE,
  dim_reduct = "PCA", numPC = 2, batch = "tissue",
  color_factor = "tissue", show_legend = "right", label_factor = "cell.type"
)
grid::grid.draw(all_DESeq2_results$plots$AADemand$heatmap) # Visualize heatmap
all_DESeq2_results$plots$AADemand$exploratory$ElbowPlot # Visualize elbow plot
all_DESeq2_results$plots$AADemand$exploratory$PC1_vs_PC2 # Visualize PCA plot
```
## 5. Session information
```{r session-info, echo=FALSE}
sessionInfo()
```
## 6. References