--- title: "TiDEomics Tutorial" author: - name: Tianen He affiliation: - University of Oxford email: tianen.he@ndm.ox.ac.uk output: BiocStyle::html_document: self_contained: true toc: true toc_float: collapsed: true smooth_scroll: false toc_depth: 3 number_sections: true code_folding: show highlight: tango df_print: paged date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('TiDEomics')`" vignette: > %\VignetteIndexEntry{TiDEomics Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib nocite: '@*' --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL, # Related to https://stat.ethz.ch/pipermail/bioc-devel/ # 2020-April/016656.html dpi = 96, fig.width = 6, fig.height = 4, fig.keep = "all", max.height = "300px" ) local({ hook_output <- knitr::knit_hooks$get("output") knitr::knit_hooks$set(output = function(x, options) { if (!is.null(options$max.height)) { options$attr.output <- c( options$attr.output, sprintf('style="max-height: %s;"', options$max.height) ) } hook_output(x, options) }) }) ``` ## Introduction [**TiDEomics**](https://github.com/hte123/TiDEomics) is an R package designed to streamline analysis of time-course omics data (e.g. proteomics, transcriptomics), compatible with both single group analysis and multiple group comparison, and accommodating data with missing values. This tutorial demonstrates a basic workflow, explains key parameters, and gives practical tips. Key features covered: - *Preparing input*: Create a `SummarizedExperiment` object with the data matrix and sample annotation, with checks for correct formatting. - *Quality control*: Check value distributions and missingness patterns. - *Normalisation, grouping and merging replicates*: Normalise each feature to the starting time point when focused on changes from baseline. Transform the data for analyses that require single groups or single time series. - *Sample relationships visualisation (correlation matrix, PCA, UMAP)*: Explore global structure and major sources of variance, check for batch effects, and identify features driving PCs. - *Pairwise differential expression (by time or group)*: Identify DE features between pairs of time points within each group, and between pairs of groups at each time point. - *Classification by feature property (randomness and overall fold change)*: Classify features into different categories (e.g., non-random vs. random, differentially expressed vs. stable) based on properties including trend significance and maximum fold change. - *Segmented regression with Trendy*: Estimate breakpoints for each feature in each group and summarise dynamic patterns. - *Variance decomposition modified from PALMO*: Quantify relative contributions from `Time`, `Group`, and `Residual`, distinguish time or group-dependent DE features and "noisy" features. - *Module identification with WGCNA*: Identify co-expression modules with different temporal and group-specific patterns. - *Functional enrichment (gene ontology, drugs, etc.)*: Interpret biological meaning of identified DE features and modules. ## Installation ```{r install, eval = FALSE} if (!require("remotes", quietly = TRUE)) install.packages("remotes") remotes::install_github("hte123/TiDEomics") ``` ```{r load-packages} library(TiDEomics) library(magrittr) library(SummarizedExperiment) library(org.Mm.eg.db) ``` ## Example input TiDEomics expects: - `data`: log2-transformed data.frame, rows = features (e.g. genes, proteins), columns = samples. First column should be feature names. Missing values (NA) are allowed. - `sample_ann`: data.frame with columns: - `Sample`: sample names, match column names of `data` - `Group`: experimental group - `Time`: numeric time point - `Replicate`: replicate ID (optional, auto-generated if absent) - `Batch`: batch ID (optional) - `Subject`: biological subject ID for repeated-measures designs (optional, set via `subject_col`) Example: subset of [GSE263759](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE263759) data set published in @Traxler2025. - Ensembl IDs were mapped to symbols, genes with all zero counts were excluded. - Use time 0 untreated samples for other groups' time 0. - Sample 500 random genes. Code for preparing the example data is available in `data-raw/tutorial_input.R` and the resulting data is stored in `data/tutorial_sample_info.rda` and `data/tutorial_data.rda`. ```{r load-example} data("tutorial_sample_info", package = "TiDEomics") data("tutorial_data", package = "TiDEomics") data_obj <- create_input( data = tutorial_data, sample_ann = tutorial_sample_info) # Log-transform the data when not already done assays(data_obj)[[1]] <- log2(assays(data_obj)[[1]] + 1) # + 1 to avoid log(0) data_obj ``` ```{r view-data} assays(data_obj)[[1]] %>% head() # first few rows of the data matrix ``` ```{r coldata} colData(data_obj) # sample annotation ``` ## Workflow We present the workflow as below sections, explaining the usage and parameters. ### Optional: custom color palette By default, TiDEomics generates a default color palette (`scales::pal_hue()`) based on the number of groups. A custom palette for each group can be set with `set_custom_palette()`, which will be used in all subsequent plotting functions where applicable. ```{r palette} custom_palette <- c( "untreated" = "#1b9e77", "IFNbeta" = "#d95f02", "IFNgamma" = "#7570b3", "LPS" = "#e7298a" ) set_custom_palette(custom_palette) ``` ### Quality control ```{r abundance} plot_distribution(data_obj, facet_by = "Group") ``` If distributions differ drastically, consider global normalisation (e.g., quantile, median) *before* using TiDEomics. For data with missing values (e.g., proteomics), `plot_ID()` and `plot_missing()` can be used to check missingness patterns. ### Normalisation to Time 0 `normalise_to_start()` subtracts the baseline value from each feature (at time 0 or the first available time point if the feature is missing at time 0). Two modes are available for defining the baseline: - `by_subject = FALSE` (default): group-level baseline, mean of all samples at time 0 in the group. This is used when no subject-level information is available or when between-subject baseline differences are of interest. - `by_subject = TRUE`: subject-level baseline, mean of all samples at time 0 for each subject. This is used when subject-level information is available and the focus is on subject-specific changes from baseline. ```{r norm0} data_obj <- normalise_to_start(data_obj) ``` Both original and time-0 normalised data are stored in the `SummarizedExperiment` object and downstream functions allow to specify which to use for each analysis (`assay = 1` for original, `assay = 2` for time-0 normalised). **When not to normalise**: Use original data when absolute abundance differences across groups are of interest (e.g., constitutively different baseline levels). ### Splitting groups and merging replicates Use `split_groups()` to obtain group-specific `SummarizedExperiment` objects and `merge_replicates()` to average replicates for analyses / visualisation that require single time series for features, including `calc_feature_property()`, `run_Trendy()`, `plot_modules_v()` and `plot_modules_h()`. ```{r split} data_obj_list <- split_groups(data_obj) ``` ```{r merge-rep} data_obj_merged_list <- merge_replicates(data_obj_list) ``` ```{r merge-groups} data_obj_merged <- merge_groups(data_obj_merged_list) ``` ### Sample relationships (correlation, PCA, UMAP) Correlation matrix can be plotted with `plot_cor_matrix()` to check sample relationships and potential batch effects. ```{r cormat} plot_cor_matrix(data_obj, method = "spearman", label_rep = TRUE, label_batch = TRUE, cellwidth = 2, cellheight = 2 ) ``` High within-replicate correlation and clustering by `Group` or `Time` increase confidence in data quality and downstream analysis. If batch effects are present, consider batch correction before using TiDEomics. There are multiple functions for PCA and UMAP: - Use `plot_pca()` to visualise sample relationships in 2D. Output loadings can be used to identify features driving the principal components. - Use `plot_pca_3D()` to visualise sample relationships in 3D. - Use `PCAtools::eigencorplot()` to correlate PCs with Group and Time. - Use `plot_umap()` for UMAP and set `seed` for reproducibility. Features with missing values (NA) are automatically excluded from PCA and UMAP. Consider imputation (e.g., group-wise minimum imputation with `split_groups()`, `impute_groups()` and `merge_groups()`) to include features with missing values in PCA and UMAP. ```{r pca} PC <- plot_pca(data_obj, plot = TRUE, # pc1 = 1, pc2 = 2, # default to plot PC1 and PC2 plot_screeplot = TRUE, plot_loadings = FALSE, plot_morepc = TRUE, circle = FALSE, morepc = 1:5 ) ``` ```{r pca-3D} plot_pca_3D(PC, pcs = 1:3) ``` Note: the 3D plot may not display properly in some html, but should work in an interactive R session. ```{r pca-eigencor} PCAtools::eigencorplot(PC, metavars = c("Group", "Time"), components = paste0("PC", 1:5), col = colorRampPalette(c("#3C5488FF", "white", "#E64B35FF"))(100), colCorval = "black" ) ``` ```{r umap} umap_layout <- plot_umap(data_obj, seed = 1234) ``` PCA and UMAP can also be performed separately on each group to explore within-group structure and dynamics. By default, when plotting PCA by group, circles and arrows are added to show the trajectory of samples along time course. Set `circle = FALSE` or `arrow = FALSE` to remove circles and arrows. ```{r pca-umap-group, fig.width = 8, fig.height = 2.5} plot_pca_by_group(data_obj, circle = TRUE, arrow = TRUE, legend_pos = "top") # also accepts a list: plot_pca_by_group(data_obj_list) plot_umap_by_group(data_obj, seed = 1234, legend_pos = "top") # also accepts a list: plot_umap_by_group(data_obj_list) ``` ### Pairwise differential expression The two DE functions for pairwise comparison between time points and groups are built based on the limma package [@limma]. `DE_between_time()`: - Compare pairs of time points within each group. - Output: Nested lists of DE statistics tables, including all features or only significant features. - Use `plot_DE_between_time()` to summarise number of DE features per pair of time points per group. `DE_between_group()`: - Compare pairs of groups at each time point. - Output: Nested lists of DE statistics tables, including all features or only significant features, and line plots summarising number of DE features (can be removed by setting `plot = FALSE`). Common parameters for both functions: - `assay`: index of assay to use (1 = original, 2 = time0-normalised if available) - `filter`: minimal number of non-NA replicates for the feature to be included in DE testing (e.g., `filter = 1` means requiring features to have at least 1 non-NA value at both time points in the specific group, or in both groups at the specific time point). - `trend`: passed to `limma::eBayes(trend=)`. Set to `TRUE` for RNA-seq count-derived data to model the mean-variance trend. Leave as `FALSE` (default) for proteomics, metabolomics, or other log-intensity data where the mean-variance relationship is typically flat. - Default thresholds for DE are `p.adj < 0.05` and `|log2FC| > 1`. Please note that the example data only has two replicates per time point. More replicates are recommended for robust pairwise DE analysis. DE_between_time(): ```{r de-between-time, fig.width = 8, fig.height = 2, warning = FALSE} DE_between_time_out <- DE_between_time(data_obj, assay = 1, filter = 1, trend = TRUE) ``` ```{r de-between-time-all} DE_between_time_out$all_list$IFNbeta$`t2-t0` %>% head() ``` ```{r de-between-time-de} DE_between_time_out$de_list$IFNbeta$`t2-t0` %>% head() ``` ```{r de-between-time-plot, fig.width = 7, fig.height = 2} plot_DE_between_time(data_obj, de_list = DE_between_time_out$de_list, fontsize = 8, value = FALSE, nrow = 1, heatmap_width = 3 ) ``` DE_between_group(): ```{r de-between-group, fig.width = 3, fig.height = 2, out.width = "70%"} DE_between_group_out <- DE_between_group(data_obj, assay = 2, filter = 1, trend = TRUE) ``` ```{r de-between-group-all} DE_between_group_out$all_list$`IFNgamma-untreated`$`24` %>% head() ``` ```{r de-between-group-de} DE_between_group_out$de_list$`IFNgamma-untreated` %>% head() ``` Output of `DE_between_group()`and `DE_between_time()` can be plotted with `plot_volcano()` to visualise DE features of selected group(s) and time point(s). ```{r volcano} plot_volcano(DE_between_group_out, group1 = "untreated", group2 = "IFNgamma", time = 24, logFC_thres = 0.5, adjP_thres = 0.05, label = TRUE) ``` #### Statistical notes Both DE functions use `limma::lmFit()` followed by `limma::eBayes()` with empirical Bayes variance moderation, which stabilises variance estimates when the number of replicates is small. The input data should be log-transformed and normalised before analysis. For **RNA-seq** data with log-transformed counts (e.g., `log2(count + 1)` as used in this tutorial), set `trend = TRUE` to let limma model the mean-variance relationship - low-abundance features typically have higher variance. For **proteomics** (log~2~ intensity) and many other data, the mean-variance trend is typically weak or absent, so `trend = FALSE` (default) is commonly appropriate. However, this should be verified (e.g. with `limma::plotSA()`). When a Subject column is provided via `subject_col` in `create_input()`, `DE_between_time()` automatically uses a paired design via `limma::duplicateCorrelation(block = Subject)`. For independent-sample designs (no Subject), all replicates are treated as independent. ### Feature properties and classification `calc_feature_property()` computes per-feature properties: - `P_trend`: calculated with `randtests::bartels.rank.test()`, testing for non-randomness of the time profile (i.e., whether the feature shows a significant overall trend across time points). - `Max_FC`: maximum fold change across time points, calculated as the difference between the feature's maximum and minimum abundance across time points. This captures the overall magnitude of change across the time course, regardless of the specific time points at which changes occur. - `Max_FC_time`: the difference between time points with maximum and minimum abundance, providing information about the direction (up or down) and duration of changing. - `Exp_threshold`: user-set value threshold for a feature to be considered expressed in a sample. This can be used to filter features based on expression level, e.g. setting `threshold = 0` means that only values > 0 (log2(raw count + 1) > 0, i.e. raw count > 0 in the tutorial dataset) are considered expressed. The default is NULL, which means all non-NA values are considered expressed. - `T_total`, `T_exp` and `Exp_ratio`: number of total time points, number of expressed time points with values > `Exp_threshold` (or non-NA values if `threshold = NULL`), and proportion of expressed time points, which can be used to filter features based on completeness or expression level. The property values can be used to classify features into different categories (e.g., non-random vs. random, differentially expressed vs. stable) for downstream analysis and interpretation. For example, features with `P_trend < 0.05` and `Max_FC >= 1` can be candidates for non-randomly overall-changing DE features. Note: - When replicates are present, the input should be the mean of replicates at each time point, output of `merge_replicates()`. - The function is set to require at least 3 values > threshold to calculate `P_trend`. ```{r property} data_obj_merged_list <- calc_feature_property(data_obj_merged_list, threshold = 0) property_tb <- summarise_feature_property(data_obj_merged_list) head(property_tb) ``` Features expressed in only a subset of groups can be extracted with `group_specific_features()` based on the output table of `summarise_feature_property()`. ```{r group-specific} group_specific_features(property_tb, groups = c("untreated"), genename = FALSE, GO = FALSE ) ``` ### Segmented regression analysis `run_Trendy` fits segmented linear models to the time course for each feature to estimate breakpoints with Trendy package [@Trendy], see [Trendy](https://bioconductor.org/packages/release/bioc/html/Trendy.html) for details. **Notes**: - The function requires complete input (no NA). When data contains missing values (e.g. proteomics), use `impute_groups()` to impute (default: group minimum; pass `fun = function(x) min(x) / 2` for half-minimum, `fun = median` for median imputation, etc.), or restrict to features with complete data. - If `feature` is not specified, the function will run on all features filtered by expression / missing rate, which requires running `calc_feature_property()` before `impute_groups()`. - Number of time points needed = [# segments] x [minimum number of samples in a segment]. For example, when # segments = 2 (maxK = 1) and minNumInSeg = 2, at least 4 time points are needed. ```{r impute} data_obj_merged_imp_list <- impute_groups(data_obj_merged_list) ``` A subset of features is used for demonstration as `run_Trendy()` can be time-consuming. ```{r trendy-run} set.seed(1234) random_features <- sample(rownames(data_obj_merged_imp_list[[1]]), 50) example_res_list <- run_Trendy(data_obj_merged_imp_list, feature = random_features, minExp = 0.5, maxK = 1, minNumInSeg = 2, meanCut = 0, NCores = 2 ) ``` Use `plot_segments()`to plot the fitted segments and breakpoints for selected features. ```{r trendy-plot, fig.height = 2.5, fig.width = 8} plot_segments(data_obj_merged_imp_list, example_res_list, feature = c("Zfp639", "Tk1"), # example features ylab = "Log2 (raw count + 1)" ) ``` Plot the distribution of breakpoints across time points with `plot_breakpoints()` in each group, and summarise the temporal patterns (combination of trends of each segment) of features with `summarise_Trendy()` and `extract_segment_trends()`. ```{r trendy-summary, fig.height = 2, fig.width = 4} plot_breakpoints(example_res_list) trendy_summary <- summarise_Trendy(example_res_list) trendy_summary %>% head() ``` ```{r trendy-list} trendy_list <- extract_segment_trends(trendy_summary) trendy_list$IFNbeta ``` ### Variance decomposition Variance of each feature is decomposed by linear mixed models (LMM) into contributions from `Group`, `Subject` (when present), `Time`, `Residual`, and optionally `Group:Time` (or `Subject:Time`) interaction, which captures group-specific temporal patterns. This helps to identify time-dependent features, group-dependent features, and "noisy" features with high residual variance. **Interpreting the components:** - High `Time`: time-dependent features, dynamic along time course consistently across groups. Good candidates for `run_Trendy()` or other time-focused analysis. - High `Group`: group-dependent features, stable along time but differentially expressed between groups. May be baseline biological markers. - High `Subject` (when present): features with baseline differences between individuals, biologically meaningful in patient studies. - High `Residual`: unexplained variation, consider excluding for downstream analysis. The function `decomp_variance()` is inspired by `PALMO::lmeVariance()`[@PALMO]. Group and Time are always included (auto-skipped if only 1 group or time point present). `Subject` is included when present in colData. Set `interaction = TRUE` to add `Group:Time` (or `Subject:Time` with Subject) as a variance component capturing group-specific temporal patterns. Sufficient replicates per combination are required for stable estimates. ```{r variance-decomposition, fig.height = 3} # filter genes for variance decomposition: # at least 50% values > 0 (raw count > 0) in at least 2 groups decomp_filter_genes <- group_specific_features(property_tb, filter_ratio = 0.5, group_pct = 2 / 4, GO = FALSE, genename = FALSE ) var_decomp <- decomp_variance(data_obj, features = decomp_filter_genes, assay = 1, core = 2 ) plot_variance(var_decomp, rank = "Time", top_n = 20) plot_variance(var_decomp, rank = "Group", top_n = 20) ``` ### WGCNA (co-expression modules) Detailed tutorials of weighted gene correlation network analysis ([WGCNA](https://cran.r-project.org/web/packages/WGCNA/index.html)) [@WGCNA_1; @WGCNA_2] can be found [online](https://github.com/edo98811/WGCNA_official_documentation/). The following is a brief demonstration of how to prepare input and run WGCNA with TiDEomics functions. **Filtering strategies**: - Features with too many missing values will be automatically removed with `WGCNA::goodGenes()` included in the `prepare_WGCNA()` function, and can also be pre-filtered with the calculated feature property (output of `calc_feature_property()` and `summarise_feature_property()`). - Residual variance can be used to exclude noisy features. - It is not recommended to filter by differential expression before WGCNA, see [WGCNA FAQ](https://edo98811.github.io/WGCNA_official_documentation/faq.html). ```{r wgcna-prepare} # Example filtering by residual variance < Q3 var_res_q3 <- quantile(var_decomp$Residual, 0.75, na.rm = TRUE) filter_wgcna <- var_decomp %>% dplyr::filter(Residual < var_res_q3) %>% dplyr::pull(Feature) data_obj_wgcna <- data_obj[filter_wgcna, ] ``` **Prepare data format and choose power**: `prepare_WGCNA()` prepares the input for WGCNA and helps to choose the soft-thresholding power. Check the scale-free topology fit indices to confirm that the chosen power is appropriate. Reasonable powers are less than 15 for unsigned or signed hybrid networks, and less than 30 for signed networks, to reach scale-free topology fit index > 0.8, and mean connectivity (mean.k) not too high (in the hundreds or above). See [WGCNA FAQ](https://edo98811.github.io/WGCNA_official_documentation/faq.html) for details. ```{r choose-power} wgcna_input <- prepare_WGCNA(data_obj_wgcna, assay = 2, powers = seq(1, 30), networkType = "signed", RsquaredCut = 0.8 ) wgcna_input$fitIndices picked_power <- wgcna_input$powerEstimate picked_power ``` **Run WGCNA**: Use `run_WGCNA()` with the output of `prepare_WGCNA()`, and visualise the resulting modules with `plot_WGCNA()`. `run_WGCNA()` is a wrapper of `WGCNA::blockwiseModules()`, which runs the WGCNA analysis and returns the module assignment for each feature. Parameters for `WGCNA::blockwiseModules()` can be set with `run_WGCNA()`, e.g., `corType` for correlation method. ```{r run-wgcna} net <- run_WGCNA(wgcna_input, power = picked_power, # corType = "pearson", # other option is "bicor" numericLabels = TRUE ) plot_WGCNA(net, fontsize = 8) ``` **Extract modules**: show module sizes ```{r extract-wgcna-modules} gene_module <- WGCNA_module(net, exclude_grey = TRUE) gene_module %>% dplyr::group_by(Module) %>% dplyr::summarise(n = dplyr::n()) ``` **Plot module profiles**: ```{r plot-wgcna-modules, fig.width = 6, fig.height = 6} plot_modules_v(gene_module, data_obj_merged, scale = TRUE, ylabel = "Z-score of log2 (raw count + 1)", height_ratio = 2 ) ``` ### Functional enrichment Functions in TiDEomics wrap `clusterProfiler::enrichGO()` and `clusterProfiler::gseGO()`[@clusterProfiler_1; @clusterProfiler_2; @clusterProfiler_3; @clusterProfiler_4] to run GO enrichment efficiently on gene sets and ranked gene list. For ranked gene list (e.g., ranked by Time or Group contribution from variance decomposition), use `enrichGO_rank()` and plot with `enrichplot::gseaplot2()`. ```{r go_rank, fig.width=5, fig.height=3} gse_group <- enrichGO_rank(var_decomp, gene_rank_by = "Group", OrgDb = org.Mm.eg.db, keyType = "SYMBOL", category = "BP", go_rank_by = "p.adjust") enrichplot::gseaplot2(gse_group[["BP"]], geneSetID = 1:3, base_size = 8) ``` For multiple defined gene sets (e.g. WGCNA modules, DE genes by feature classification), use `enrichGO_list()` and plot with `plot_GO()`. Optionally, use `simplify = TRUE` with `enrichGO_list()` to simplify the GO results by removing redundant terms (default: FALSE). ```{r go-example, fig.width = 6} background_wgcna <- colnames(wgcna_input$data) go_list <- enrichGO_list( gene_list = gene_module, OrgDb = org.Mm.eg.db, universe = background_wgcna, pvalueCutoff = 0.9, # get more results for demonstration qvalueCutoff = 0.9, simplify = FALSE, keyType = "SYMBOL" ) plot_GO(go_list$all, plot_dotplot = TRUE, plot_emapplot = FALSE, plot_cnetplot = FALSE, showCategory_dotplot = 3) ``` Enriched GO terms can be added to the module profile plot with `plot_modules_h()`, which is similar to `plot_modules_v()` above but horizontally aligned with annotation of the modules with their top enriched GO terms. The plotting function is adapted from ClusterGVis package [@ClusterGVis]. ```{r plot-wgcna-modules-go, fig.width = 9} plot_modules_h(gene_module, data_obj_merged, scale = TRUE, ylabel = "Z-score of log2 (raw count + 1)", enrich_list = go_list$all, enrich_category = "BP", heatmap_width = 6, heatmap_height = 8 ) ``` Enrichment of other gene sets (e.g., drug signatures, TF targets, pathways) can be performed with `enrichR_list()`, which wraps `enrichR::enrichr()`. It returns a named list of data.frames (one per database) with `Cluster` and `Description` columns, compatible with `plot_modules_h()`. Available databases can be checked with `enrichR::listEnrichrDbs()`. ### Universal: plot features of interest Selected features can be plotted with `plot_trend()`, which shows the mean and standard deviation of replicates at each time point. ```{r plot-feature} plot_trend(data_obj, assay = 1, features = c("Actb", "Cd69", "Fosb", "Il23a"), title = "Example features") plot_trend(data_obj, assay = 2, features = c("Actb", "Cd69", "Fosb", "Il23a"), title = "Example features with time 0 normalisation") # Or pre-calculate mean and sd with calc_mean_sd() # table_mean_sd_orig <- calc_mean_sd(data_obj)$orig # plot_trend(table_mean_sd_orig, features = c("Actb", "Cd69", "Fosb", "Il23a"), # title = "Example features") ``` Visualisation of features with high & low residual variance justify the filtering strategy for WGCNA input. ```{r plot-residual} plot_trend(data_obj, assay = 1, features = var_decomp %>% dplyr::arrange(Residual) %>% head(12) %>% dplyr::pull(Feature), title = "Features with lowest residual variance" ) plot_trend(data_obj, assay = 1, features = var_decomp %>% dplyr::arrange(dplyr::desc(Residual)) %>% head(12) %>% dplyr::pull(Feature), title = "Features with highest residual variance" ) ``` ## Common usage scenarios TiDEomics supports two experimental designs, auto-detected from the sample annotation: - **Cell culture / independent samples**: `create_input(data, sample_ann)` -- each sample is independent. LMM uses `(1|Group) + (1|Time)`. DE uses unpaired designs. - **Patient / repeated-measures**: `create_input(data, sample_ann, subject_col = "PatientID")` -- same subjects tracked across time points. LMM adds `(1|Subject)`. `DE_between_time()` uses paired design via `limma::duplicateCorrelation()`. `normalise_to_start(by_subject = TRUE)` subtracts each subject's own baseline. `merge_replicates()` and `plot_trend()` compute statistics across subjects. Users can ask different biological questions and use different functions in TiDEomics to answer them. For example: - Which features change over time in each sample group? -> `DE_between_time()`, `calc_feature_property()`, `run_Trendy()`. - Which features differ between groups at the same time point? -> `DE_between_group()` (For time 0, use `assay = 1`; for later time points, use `assay = 1` for original input data, or use `assay = 2` for change-from-baseline). - What is the temporal pattern for individual features? -> `run_Trendy()` + `summarise_Trendy()` + `extract_segment_trends()`. - Which features are noisy vs biologically driven? -> `decomp_variance()` + `plot_variance()`, noisiness can be represented by `Residual` variance. - What co-expression modules are present and what are their temporal profiles? -> `prepare_WGCNA()` + `run_WGCNA()` + `plot_modules_v()` or `plot_modules_h()`. - What biological processes are associated with the genes of interest (e.g. co-expression modules, differentially expressed genes in each sample group)? -> `enrichGO_list()` + `plot_GO()`. - What biological processes are associated with the genes ranked by their property (e.g. time or group contributed variance)? -> `enrichGO_rank()`. - How to normalise the data? -> `normalise_to_start()` to focus on changes from baseline; apply global normalisation (quantile, median) and batch correction when appropriate, before `create_input()`. - How to handle missing values? -> `impute_groups()` for `run_Trendy`. `plot_pca()` and `plot_umap()` automatically excluded features with missing values. `prepare_WGCNA()` filter out features with too many missing values. ## Session information ```{r session-info} sessionInfo() ``` ## References