---
title: "tTEscanR Visualization Module"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    theme: default
    css: style.css
vignette: >
  %\VignetteIndexEntry{4. Visualization Module}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r file_settings, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

```{r notes_format, echo = FALSE, results = 'asis'}
cat(" ")
```

# 1. Overview

**tTEscanR** includes a dedicated **visualization module** with functions for plotting the output tables produced at each step of the analysis. The module is meant to make results easier to explore and interpret, and to summarize complex findings in a visually accessible form.

```{r setup, message = FALSE, warning = FALSE}
# install.packages("/avarassanchez/tTEscanR")
library(tTEscanR)
```

```{r other_libraries, message = FALSE, warning = FALSE}
library(dplyr)
```

To illustrate each plotting function and the flexibility offered by its parameters, we first run **tTEscanR**. In this tutorial, we analyze a single-cell fetal human atlas described in *[@Cao2020]* and *[@Domcke2020]*, and previously examined by *[@Gao2022]*. A subset of this dataset is included as a default dataset in **tTEscanR** and can be loaded directly to perform the analysis. A step-by-step explanation of this pipeline is available in the **tTEscanR User Guide** vignette.

```{r load_datasets, message = FALSE, warning = FALSE}
data(
  default_tTEscanR_mRNA_data,
  default_tTEscanR_tRNA_data,
  default_tTEscanR_metadata
)
```

```{r execute_workflow, message = FALSE, warning = FALSE}
tTEobject <- runPipeline(
  mRNA_data = default_tTEscanR_mRNA_data,
  tRNA_data = default_tTEscanR_tRNA_data,
  metadata = default_tTEscanR_metadata,
  species = "hg38",
  batch = "tissue",
  runDESeq = FALSE,
  verbose = FALSE
)
```

# 2. Configuration options

The visualization functions in **tTEscanR** are highly customizable and can be applied to any properly formatted dataset. Each plot type has its own data requirements, which may call for **prior data transformation** or restructuring.

This guide provides an overview of the available **visualization options** in **tTEscanR**, and illustrates key **parameter settings** and their effects through practical examples.

::: {.note}
The visualization functions in **tTEscanR** are not limited to outputs generated within the package's pipeline. Users can apply them to external datasets, provided that the required data format and structure are met.
:::

| Function | Purpose |
|------------------------|-------------------------------------------------|
| `plotProportion()` | Shows differences in feature frequencies within and between conditions |
| `plotDistribution()` | Displays feature distributions across conditions |
| `plotTargetComparison()` | Variation of `plotDistribution()` that compares a target feature against the mean across conditions |
| `plotCorrelation()` | Feature correlation |
| `plotPermutation()` | Compares the baseline codon exonic background against the current codon usage |
| `plotTEscore()` | Represents the tTE scores obtained from `Compute_tTE()` |

## 2.1. Data transformation

**tTEscanR** provides **helper functions** that transform data for downstream analysis and visualization.

One such function is **`transformFormat()`**, which converts a count matrix, with features as rows and conditions as columns, into a **long-format table**. In this format, each row-column combination from the original matrix becomes an individual row in the output. The function also provides an option to **normalize** the input data. The parameters `rownames_to_column`, `names_to` and `values_to` set the names of the columns in the resulting long-format table.
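As a rough illustration of the reshaping described above, the base-R sketch below builds a small made-up count matrix, divides each column by its sum, and stacks the result into a long table with one row per feature-condition pair. It does not call `transformFormat()` and is not its implementation; the toy values and the object names `counts`, `rel_abundance` and `long_toy` are assumptions made for this example only.

```r
# Toy count matrix: codons as rows, conditions as columns (made-up values)
counts <- matrix(
  c(10, 30, 60, 5, 15, 80),
  nrow = 3,
  dimnames = list(
    c("AAA", "AAC", "AAG"),
    c("Spleen-Lymphoid", "Heart-Myocyte")
  )
)

# Relative abundances: divide every column by its own column sum
rel_abundance <- sweep(counts, 2, colSums(counts), "/")

# Wide to long: as.vector() reads the matrix column by column,
# so row names repeat per column and column names repeat per row
long_toy <- data.frame(
  codon = rep(rownames(rel_abundance), times = ncol(rel_abundance)),
  condition = rep(colnames(rel_abundance), each = nrow(rel_abundance)),
  usage = as.vector(rel_abundance),
  stringsAsFactors = FALSE
)

long_toy
```

Each condition's `usage` values sum to 1 after this step, which is a quick sanity check worth running on any normalized long-format table before plotting.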
::: {.note}
The **normalization** performed by **`transformFormat()`** converts the raw counts into relative abundances (i.e. each column is divided by its column-wise sum).
:::

```{r check_codon_usage, eval = FALSE}
head(getAssay(tTEobject, "CodonUsage"))
```

```{r codon_usage_data_transform}
long_format_codon_usage <- transformFormat(
  data = getAssay(tTEobject, "CodonUsage"),
  normalize = TRUE,
  rownames_to_column = "codon", # features (row) of the CodonUsage matrix
  names_to = "condition", # conditions (col) of the CodonUsage matrix
  values_to = "usage" # values of the CodonUsage matrix
)

# long_format_codon_usage contains 3 columns: codon, condition, usage
# We are going to divide the condition column into tissue and cell_type
long_format_codon_usage <- long_format_codon_usage %>%
  tidyr::separate(.data$condition, into = c("tissue", "cell_type"), sep = "-")
```

```{r check_long_codon_usage, eval = FALSE}
head(long_format_codon_usage)
```

```{r aa_demand_data_transform}
long_format_AA_demand <- transformFormat(
  data = getAssay(tTEobject, "AADemand"),
  normalize = TRUE,
  rownames_to_column = "AA", # features (row) of the AADemand matrix
  names_to = "condition", # conditions (col) of the AADemand matrix
  values_to = "demand" # values of the AADemand matrix
)

# long_format_AA_demand contains 3 columns: AA, condition, demand
# We are going to divide the condition column into tissue and cell_type
long_format_AA_demand <- long_format_AA_demand %>%
  tidyr::separate(.data$condition, into = c("tissue", "cell_type"), sep = "-")
```

## 2.2. Parameters

To enhance usability, parameter names and structures are kept **consistent across functions**, and most parameters are shared among them. This standardization simplifies customization and keeps the workflow predictable.

| Parameter | Description |
|------------------------|-------------------------------------------------|
| `data` | Properly formatted dataset |
| `plot` | A character string indicating the type of plot to generate |
| `ncols` | Numeric; number of columns for arranging panels. Defaults to 1 |
| `x_axis_col` | Name of the column in `data` to use for the x-axis |
| `y_axis_col` | Name of the column in `data` to use for the y-axis |
| `condition_col` | Name of the column in `data` to use for coloring/grouping by condition |
| `targeted_arg` | Optional; a vector defining key feature clusters to highlight or label |
| `color_palette` | Optional; a vector of color codes to customize plot appearance |
| `save_format` | Optional; a character string indicating the format for saving the plot. Supported formats: "png" or "pdf" |
| `out_name` | Optional; name for the saved plot (if `save_format` is specified) |
| `out_directory` | Optional; path to the directory where the plot will be saved (if `save_format` is specified) |
| `show_legend` | A character string specifying the position of the legend. Supported values: "none" (default), "top", "bottom", "right" and "left" |
| `add_titles` | Logical; if TRUE, includes titles in the plot. Defaults to TRUE |

::: {.note}

Function-specific parameters will be introduced within each plot's corresponding description section.

Not all generic parameters are required by every function; usage depends on the specific plotting context.

The values provided for *x_axis_col*, *y_axis_col* and *condition_col* must exactly match the column names in the input data.

:::
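Because the column arguments must match the input exactly, it can help to verify them before calling a plotting function. The sketch below shows one way to do that with a small, hypothetical helper named `check_plot_columns()`; it is not part of **tTEscanR**, and the toy table is made up for the example.

```r
# Hypothetical helper (not part of tTEscanR): stop early when a requested
# column name is absent from the data, listing the offending names
check_plot_columns <- function(data, ...) {
  requested <- c(...)
  missing_cols <- setdiff(requested, colnames(data))
  if (length(missing_cols) > 0) {
    stop(
      "Columns not found in data: ",
      paste(missing_cols, collapse = ", ")
    )
  }
  invisible(TRUE)
}

# Made-up long-format table with the column names used in this vignette
toy_long <- data.frame(
  codon = c("AAA", "AAC"),
  tissue = c("Spleen", "Heart"),
  usage = c(0.4, 0.6),
  stringsAsFactors = FALSE
)

# Passes silently: all three names exist
check_plot_columns(toy_long, "codon", "usage", "tissue")

# Would raise an error: matching is case-sensitive, so "Codon" != "codon"
# check_plot_columns(toy_long, "Codon", "usage")
```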

# 3. Visualization options

The **`plotDistribution()`** function generates **jitter plots**, **barplots** or **boxplots** to visualize data distributions (e.g. raw or normalized codon usage) across features (e.g. codons), showing how usage patterns vary across conditions. The plot type is set with the `plot` argument, while faceting and layout are controlled with the `facet_col` and `ncols` parameters.

In the example below, we visualize **normalized codon usage** across conditions to explore how codon preferences fluctuate. The dataset is a **single-cell dataset**, where each cell is annotated by its type and tissue of origin. As a general guideline, the *jitter* mode is well suited to large datasets, as it shows the overall distribution of values and helps identify patterns across groups.

```{r general_dist_plot, fig.width = 7, fig.height = 4, fig.align = 'center'}
# Codon usage distribution plot (jitter)
plotDistribution(
  data = long_format_codon_usage,
  plot = "jitter",
  x_axis_col = "codon",
  y_axis_col = "usage",
  condition_col = "tissue",
  show_legend = "right",
  add_titles = FALSE
)
```

Alternatively, to reduce dataset complexity and enable more in-depth analysis, we can focus on a subset of **target conditions**. In this example, we restrict the analysis to spleen-derived cells, specifically those belonging to the lymphoid and myeloid lineages. For this approach we can use the *barplot* mode.

```{r generate_subset_data}
# Data subset: spleen tissue and selected cell types (lymphoid and myeloid)
spleen_indexes <- grep("Spleen", long_format_codon_usage$tissue)
long_format_cu_spleen <- long_format_codon_usage[spleen_indexes, ]

lymphoid_indexes <- grep("Lymphoid", long_format_cu_spleen$cell_type)
myeloid_indexes <- grep("Myeloid", long_format_cu_spleen$cell_type)
cells_indexes <- c(lymphoid_indexes, myeloid_indexes)

long_format_codon_usage_subset <- long_format_cu_spleen[cells_indexes, ]
```

To facilitate visualization, we set the `facet_col` parameter in **`plotDistribution()`**, which arranges each condition in a separate facet and makes groups easier to compare. Additionally, the `ncols` parameter controls the **number of columns** used in the facet layout, which helps keep the plot readable when working with multiple conditions.

```{r dist_barplot, fig.width = 7, fig.height = 4, fig.align = 'center'}
# Codon usage distribution plot (barplot)
plotDistribution(
  data = long_format_codon_usage_subset,
  plot = "barplot",
  ncols = 1,
  facet_col = "cell_type",
  x_axis_col = "codon",
  y_axis_col = "usage",
  condition_col = "cell_type",
  show_legend = "none",
  add_titles = FALSE
)
```

If, instead of comparing the distributions of two cell types within the same tissue, we want to explore how distributions vary across two different tissues, we can switch to the *boxplot* mode. This plot type summarizes the variation and central tendency of the data, making it easier to compare distributions between broader biological groups.

```{r generate_subset_data_2}
# Data subset: specific codons in spleen and heart tissues
spleen_indexes <- grep("Spleen", long_format_codon_usage$tissue)
heart_indexes <- grep("Heart", long_format_codon_usage$tissue)
tissue_indexes <- c(spleen_indexes, heart_indexes)

long_format_cu_tissues <- long_format_codon_usage[tissue_indexes, ]

selected_codons <- c(
  "CAA", "CAC", "CAG", "CAT", "CCA", "CCC", "CCG", "CCT",
  "CGA", "CGC", "CGG", "CGT", "CTA", "CTC", "CTG", "CTT",
  "GAA", "GAC", "GAG", "GAT", "GCA", "GCC", "GCG", "GCT",
  "GGA", "GGC", "GGG", "GGT", "GTA", "GTC", "GTG", "GTT"
)

codons_indexes <- which(long_format_cu_tissues$codon %in% selected_codons)
long_format_cu_tissues <- long_format_cu_tissues[codons_indexes, ]
```

In this example, we introduce a pre-defined `color_palette` to explicitly assign a color to each of the tissues being analyzed.

```{r distribution_plot, fig.width = 6, fig.height = 4, fig.align = 'center'}
# Codon usage distribution plot (boxplot)
plotDistribution(
  data = long_format_cu_tissues,
  plot = "boxplot",
  x_axis_col = "codon",
  y_axis_col = "usage",
  add_stats = FALSE,
  condition_col = "tissue",
  color_palette = c(Heart = "#de77ae", Spleen = "#7fbc41"),
  show_legend = "bottom",
  add_titles = FALSE
)
```
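The index-based subsetting used in the chunks above can also be written as a single logical filter. The sketch below does this on a made-up long-format table; the cell-type labels and values are illustrative assumptions, not taken from the atlas. It also shows that `grepl()` matches substrings, whereas `==` requires an exact label.

```r
# Made-up long-format table mimicking the columns used in this vignette
toy_long <- data.frame(
  codon = c("CAA", "CAA", "CAA", "CAA"),
  tissue = c("Spleen", "Spleen", "Heart", "Spleen"),
  cell_type = c(
    "Lymphoid cells", "Myeloid cells", "Lymphoid cells", "Erythroblasts"
  ),
  usage = c(0.012, 0.015, 0.011, 0.014),
  stringsAsFactors = FALSE
)

# Exact match on tissue, substring match on either lineage name
keep <- toy_long$tissue == "Spleen" &
  grepl("Lymphoid|Myeloid", toy_long$cell_type)

toy_subset <- toy_long[keep, ]
toy_subset
```

Unlike concatenating two index vectors, a logical filter keeps the rows in their original order, which can matter if later steps rely on row position.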

The **`plotTargetComparison()`** function extends the functionality of **`plotDistribution()`** by allowing direct comparison between a target condition and the overall mean. This visualization helps identify how codon, anticodon, or amino acid usage in the selected condition deviates from the average profile.

**Function-specific parameters:**

`mean` - A numeric vector containing the mean values of the codons/anticodon/amino acids present in `data`.

`show_difference` - Logical; if TRUE, displays the differences between the mean and the targeted values.

In the following example, we focus on cells from the brain (cerebellum and cerebrum). To enable a proper comparison, we first subset the dataset to include only data from these tissues and then compute the mean codon usage using **`computeMeanUsage()`**. The overall mean codon usage was previously obtained during the initial execution of **`runPipeline()`**.

```{r cerebellum_mean}
# Data subset: mean codon usage across brain (cerebellum and cerebrum) cell types
codon_usage <- getAssay(tTEobject, "CodonUsage")

cerebellum_indexes <- grep("Cerebellum", colnames(codon_usage))
cerebrum_indexes <- grep("Cerebrum", colnames(codon_usage))
brain_indexes <- c(cerebellum_indexes, cerebrum_indexes)

brain_codon_usage <- codon_usage[, brain_indexes]

# Define the metadata
metadata_brain <- data.frame(
  label = colnames(brain_codon_usage),
  stringsAsFactors = FALSE
)
metadata_brain <- tidyr::separate(
  metadata_brain,
  label,
  into = c("tissue", "cell.type"),
  sep = "-"
)
metadata_brain$conditions <- colnames(brain_codon_usage)

mean_brain_codon_usage <- computeMeanUsage(
  data = brain_codon_usage,
  metadata = metadata_brain,
  batch = "tissue",
  verbose = FALSE
)
```

Based on the calculations above, the plot illustrates: (i) the mean codon usage of the selected brain cells as **purple dots**, (ii) the overall mean codon usage across conditions as **grey bars**, and (iii) the difference between the two as **blue lines**.

```{r compare_to_mean_plot, fig.width = 7, fig.height = 4, fig.align = 'center'}
additional_metrics <- getMetadata(tTEobject, "CodonUsage_AdditionalMetrics")

plotTargetComparison(
  target_data = data.frame(mean_brain_codon_usage),
  overall_data = data.frame(additional_metrics$MeanCodonUsage),
  x_axis_col = "feature",
  y_axis_col = "mean_usage_across_conditions",
  show_difference = TRUE,
  add_titles = FALSE
)
```
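To make the plotted difference concrete, the sketch below computes a per-feature difference between a target mean and an overall mean on made-up values. The column names `feature` and `mean_usage_across_conditions` follow the arguments used above. The toy numbers and the sign convention (target minus overall) are assumptions for illustration, not a description of what `plotTargetComparison()` computes internally.

```r
# Made-up overall and target mean usage tables (same columns, different order)
overall <- data.frame(
  feature = c("AAA", "AAC", "AAG"),
  mean_usage_across_conditions = c(0.020, 0.015, 0.030),
  stringsAsFactors = FALSE
)
target <- data.frame(
  feature = c("AAG", "AAA", "AAC"),
  mean_usage_across_conditions = c(0.025, 0.028, 0.010),
  stringsAsFactors = FALSE
)

# Align the two tables by feature rather than by row position
merged <- merge(
  target, overall,
  by = "feature",
  suffixes = c("_target", "_overall")
)

# Assumed convention: positive values mean higher usage in the target
merged$difference <- merged$mean_usage_across_conditions_target -
  merged$mean_usage_across_conditions_overall

merged[, c("feature", "difference")]
```

Merging on `feature` guards against the two tables listing codons in different orders, which a plain element-wise subtraction would not detect.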

The **`plotProportion()`** function provides a summary visualization of a specific metric's distribution across features or conditions. It supports multiple display modes, including *bar*, *radar* or *donut* plots, allowing users to choose the most effective format for their data.

::: {.note}
The choice of **plot type** in this and other functions within **tTEscanR** is left to the user's discretion, depending on the nature of the data (e.g. number of features, distribution of counts) and the specific research question being addressed.
:::

```{r extract_mean}
mean_codon_usage <- additional_metrics$MeanCodonUsage
mean_codon_usage$codon <- mean_codon_usage$feature

mean_codon_usage <- featuresToAA(
  data = mean_codon_usage,
  position = "feature",
  notation_from = "codon",
  notation_to = "aa",
  verbose = FALSE
)
```

```{r prop_plot_mean, fig.width = 7, fig.height = 5, fig.align = 'center'}
plotProportion(
  data = mean_codon_usage,
  plot = "bar",
  var_numerical = "mean_usage_across_conditions",
  var_categorical = "codon",
  var_color = "feature", # Here feature corresponds to AA
  show_legend = "none"
)
```

```{r mean_AA_demand}
# Data subset: mean AA demand across brain (cerebellum and cerebrum) cell types
aa_demand <- getAssay(tTEobject, "AADemand")

cerebellum_indexes_AA <- grep("Cerebellum", colnames(aa_demand))
cerebrum_indexes_AA <- grep("Cerebrum", colnames(aa_demand))
brain_indexes_AA <- c(cerebellum_indexes_AA, cerebrum_indexes_AA)

brain_AAdemand <- aa_demand[, brain_indexes_AA]

mean_brain_AAdemand <- computeMeanUsage(
  data = brain_AAdemand,
  metadata = metadata_brain,
  batch = "tissue",
  verbose = FALSE
)
```

```{r prop_plot_brain, fig.width = 6, fig.height = 4, fig.align = 'center'}
plotProportion(
  data = mean_brain_AAdemand,
  plot = "donut",
  var_numerical = "mean_usage_across_conditions",
  var_categorical = "feature", # Here feature corresponds to AA
  show_legend = "none"
)
```
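A proportion plot ultimately encodes each category's share of a total. As a hedged illustration of that idea, the sketch below aggregates made-up codon-level values by amino acid and converts them to shares. Whether and how `plotProportion()` rescales its input is not covered here; the table `toy_usage` and its values are assumptions for this example.

```r
# Made-up mean usage per codon, annotated with the encoded amino acid
toy_usage <- data.frame(
  codon = c("GCA", "GCC", "GAA", "GAG"),
  aa = c("Ala", "Ala", "Glu", "Glu"),
  mean_usage = c(0.016, 0.028, 0.029, 0.040),
  stringsAsFactors = FALSE
)

# Sum codon-level values within each amino acid
aa_totals <- tapply(toy_usage$mean_usage, toy_usage$aa, sum)

# Convert totals to shares of the overall sum (these add up to 1)
aa_share <- aa_totals / sum(aa_totals)

round(100 * aa_share, 1)
```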

The **`plotTEscore()`** function generates a **violin plot** to compare the **tTE scores** between a specified target condition and all other conditions in the dataset. This visualization helps assess how distinct the translation efficiency profile of the target group is relative to the rest.

::: {.note}
As **`plotTEscore()`** uses violin plots, it is important to ensure that each group, especially the targeted condition, has a **sufficient number of data points**. Violin plots may not represent the distribution reliably if the sample size is too small. In that case, **`plotDistribution()`** or **`plotProportion()`** can be used instead.
:::

```{r extract_metadata}
conditions_metadata <- getMetadata(tTEobject, "ConditionsLabels")
tTEresults_codon <- getMetadata(tTEobject, "tTEresults_codon")
tTEresults_AA <- getMetadata(tTEobject, "tTEresults_AA")
```

```{r tTE_scores_plot}
plotTEscore(
  data = tTEresults_codon,
  metadata = conditions_metadata,
  index_col = "conditions",
  class_col = "tissue",
  add_stats = FALSE
)
```

```{r target_immune}
# Targeted condition: lymphoid and myeloid cells
conditions_metadata$group <- "other"
conditions_metadata$group[
  grep("Myeloid", conditions_metadata$conditions)
] <- "myeloid"
conditions_metadata$group[
  grep("Lymphoid", conditions_metadata$conditions)
] <- "lymphoid"
```

```{r score_plot_immune, fig.width = 6, fig.height = 4, fig.align = 'center'}
# input data has 4 columns: condition, tTE, p_value, neg_log10_tTE_p_value

# Codon-anticodon tTE score
plotTEscore(
  data = tTEresults_codon,
  metadata = conditions_metadata,
  index_col = "conditions",
  class_col = "group",
  color_palette = c(
    myeloid = "#abd9e9", lymphoid = "#f1b6da", other = "#fee0b6"
  ),
  add_stats = TRUE
)

# AA demand-supply tTE score
plotTEscore(
  data = tTEresults_AA,
  metadata = conditions_metadata,
  index_col = "conditions",
  class_col = "group",
  color_palette = c(
    myeloid = "#abd9e9", lymphoid = "#f1b6da", other = "#fee0b6"
  ),
  add_stats = FALSE
)
```

When we want to refine the selection of target cell types, for example by including or excluding specific subtypes, this distinction must be encoded explicitly in the data frame passed as `metadata`. For instance, to focus on **neurons** but exclude cells labeled as **ENS neurons**, we assign the excluded cells to a different category, here *"other"*. The order of the assignments matters: the broader "neuron" pattern is applied first, and the "ENS neuron" pattern then overwrites those cells. This ensures the function correctly recognizes which cells belong to the **target group** and which do not.

```{r target_neurons}
# Targeted condition: neurons
conditions_metadata$group <- "other"
conditions_metadata$group[grep(
  "neuron",
  conditions_metadata$conditions
)] <- "neurons"
# Reassign ENS neurons to "other" so they are excluded from the target group
conditions_metadata$group[grep(
  "ENS neuron",
  conditions_metadata$conditions
)] <- "other"
```

```{r score_plot_neurons, fig.width = 6, fig.height = 4, fig.align = 'center'}
# AA demand-supply tTE score
plotTEscore(
  data = tTEresults_AA,
  metadata = conditions_metadata,
  index_col = "conditions",
  class_col = "group",
  color_palette = c(neurons = "#8073ac", other = "#fee0b6"),
  add_stats = TRUE
)
```

# 4. References

```{r session-info, echo=FALSE}
sessionInfo()
```