| Title: | Comprehensive Lipidomics Data Analysis with Interactive Visualization |
|---|---|
| Description: | Provides a comprehensive toolkit for end-to-end lipidomics data analysis, including missing value imputation, batch effect correction, normalization, differential abundance analysis using limma and edgeR, gene set enrichment analysis, and extensive visualization capabilities. Lipid names are automatically classified by class, subclass, and fatty-acid saturation. Features both an interactive Shiny interface for bench biologists and fully scriptable R functions for bioinformaticians. Supports flexible custom lipid classification schemes and user-defined enrichment sets. |
| Authors: | Fayrouz Hammal [aut, cre] (ORCID: <https://orcid.org/0000-0002-7612-4953>) |
| Maintainer: | Fayrouz Hammal <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.0 |
| Built: | 2026-06-12 02:58:06 UTC |
| Source: | https://github.com/BiocStaging/LIPIDIFy |
Applies normalization methods in the order supplied, passing the output of each step as the input to the next.
apply_normalizations(data, methods)
data |
Numeric matrix with samples in rows and lipids in columns. |
methods |
Character vector of method names as returned by
|
Normalized numeric matrix of the same dimensions as data.
m <- matrix(rlnorm(60, 8, 1), nrow = 6, ncol = 10)
apply_normalizations(m, c("TIC", "Log2"))
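The chaining described for apply_normalizations can be pictured with base R's Reduce(). This is only a sketch: chain_steps and the two step functions are hypothetical stand-ins written for illustration, not the package's own TIC or Log2 implementations.

```r
# Hypothetical stand-in steps; the package's own methods may differ in detail.
steps <- list(
  TIC  = function(x) x / rowSums(x) * mean(rowSums(x)), # rescale rows to the mean row sum
  Log2 = function(x) log2(x + 1)                        # assumed pseudocount of 1
)

# Apply the named steps in order, feeding each result into the next step.
chain_steps <- function(data, methods) {
  Reduce(function(acc, name) steps[[name]](acc), methods, data)
}

set.seed(1)
m <- matrix(rlnorm(60, 8, 1), nrow = 6, ncol = 10)
out <- chain_steps(m, c("TIC", "Log2"))
print(dim(out))
```

Because each step consumes the previous step's output, the order of methods matters: c("Log2", "TIC") would generally give a different matrix from c("TIC", "Log2").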
Generates a complete R Markdown document as a character string.
plot_files is a simple named list of PNG file paths
(raw_plot, norm_plot, pipeline_plot, results_plot, enrichment_plot).
extra_plots is an optional list of additional plot entries
from the session history.
build_report_rmd_with_plots(
  title,
  author,
  sections,
  raw_data,
  normalized_data,
  diff_results,
  enrichment_results,
  output_format = "html",
  plot_files = list(),
  extra_plots = list()
)
title |
Report title. |
author |
Author name. |
sections |
Character vector of section keys to include. |
raw_data |
Raw data list. |
normalized_data |
Normalized data list. |
diff_results |
Differential analysis results list. |
enrichment_results |
Enrichment analysis results list. |
output_format |
Either |
plot_files |
Named list of plot PNG paths (raw_plot, norm_plot, pipeline_plot, results_plot, enrichment_plot). |
extra_plots |
Optional list of additional plot entries from
session history; each entry has |
A single character string containing the complete Rmd document.
Classifies a vector of lipid names into lipid group, type, and saturation category using regular-expression pattern matching.
classify_lipids(lipid_names)
lipid_names |
Character vector of lipid names. |
Data frame with columns Lipid, LipidGroup,
LipidType, and Saturation.
classify_lipids(c("PC 16:0_18:1", "TG 16:0_18:1_20:4", "Cer 16:0"))
Convert List Columns to Strings
convert_list_columns_to_strings(df)
df |
Data frame with potential list columns |
Data frame with list columns converted to strings
Removes known technical batch effects while preserving biological signal. Batch correction should be applied after normalisation.
correct_batch_effects(
  data_matrix,
  metadata,
  batch_column,
  group_column = "Sample Group",
  method = "limma"
)
data_matrix |
Numeric matrix with samples as rows and lipids as
columns (typically the output of |
metadata |
Data frame of sample metadata aligned with
|
batch_column |
Character. Name of the column in |
group_column |
Character. Name of the biological group column to
protect from removal (default |
method |
One of |
Two methods are supported:
"limma": Uses removeBatchEffect. Requires only the limma package (already a dependency). Suitable for most experimental designs.
"combat": Uses sva::ComBat with parametric empirical Bayes adjustment. Requires the sva Bioconductor package (BiocManager::install("sva")). Generally more robust when batch effects are large.
Batch-corrected numeric matrix of the same dimensions as
data_matrix.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
d$metadata$Batch <- rep(c("Batch1", "Batch2"), each = 10)
corrected <- correct_batch_effects(norm, d$metadata, batch_column = "Batch")
dim(corrected)
Creates default pairwise contrasts from group levels. If the levels contain special characters that are invalid for limma/edgeR, they should already be sanitized before calling this function.
create_default_contrasts(group_levels)
group_levels |
Character vector of group level names (e.g., c("Control","Treatment","Resistant")) |
Character vector of limma-style contrast strings (e.g., "Treatment - Control")
create_default_contrasts(c("A", "B", "C"))
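One way to picture the pairwise enumeration is with combn() from base R. The helper name below is hypothetical, and the direction of each contrast (later level minus earlier level) is an assumption inferred from the "Treatment - Control" example; the package may order its contrasts differently.

```r
# Hypothetical helper: build every pairwise contrast string from group levels.
pairwise_contrasts_sketch <- function(group_levels) {
  pairs <- combn(group_levels, 2, simplify = FALSE) # list of length-2 vectors
  # Assumed direction: second level of the pair minus the first.
  vapply(pairs, function(pair) paste(pair[2], "-", pair[1]), character(1))
}

contrasts <- pairwise_contrasts_sketch(c("A", "B", "C"))
print(contrasts)
```

With k group levels this yields k * (k - 1) / 2 contrast strings, so the list grows quickly as groups are added.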
Create an Enrichment Barplot
create_enrichment_barplot(
  enrichment_data,
  title = "Enrichment Analysis",
  max_pathways = 15
)
enrichment_data |
Data frame of fgsea results. |
title |
Plot title. |
max_pathways |
Maximum number of pathways (top by p-value). |
A ggplot2 object.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
cls <- classify_lipids(colnames(norm))
res <- perform_differential_analysis(norm, d$metadata, "Sample Group",
  contrasts_list = NULL, method = "limma"
)
enrich <- perform_enrichment_analysis(res$results, cls, min_set_size = 3)
p <- create_enrichment_barplot(enrich[[1]][["LipidGroup"]])
print(p)
Create an Enrichment Dotplot
create_enrichment_dotplot(
  enrichment_data,
  title = "Enrichment Analysis",
  max_pathways = 15
)
enrichment_data |
Data frame of fgsea results. |
title |
Plot title. |
max_pathways |
Maximum number of pathways (top by p-value). |
A ggplot2 object.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
cls <- classify_lipids(colnames(norm))
res <- perform_differential_analysis(norm, d$metadata, "Sample Group",
  contrasts_list = NULL, method = "limma"
)
enrich <- perform_enrichment_analysis(res$results, cls, min_set_size = 3)
p <- create_enrichment_dotplot(enrich[[1]][["LipidGroup"]])
print(p)
Create a Robust Heatmap of Top Variable Features
create_heatmap_robust(
  data_matrix,
  metadata,
  group_column = "Sample Group",
  top_n = 50,
  classification_data = NULL,
  title = "Heatmap"
)
data_matrix |
Numeric matrix (features as rows, samples as columns). |
metadata |
Metadata data frame (samples as rows). |
group_column |
Name of the group column in |
top_n |
Maximum number of features to display. |
classification_data |
Optional classification data frame for row
annotation (must have a |
title |
Heatmap title. |
A pheatmap object, or a ggplot2 error plot.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
create_heatmap_robust(t(norm), d$metadata, "Sample Group", top_n = 10)
Produces per-lipid barplots coloured by sample group. Samples are automatically sorted by group (then alphabetically within group) for a cleaner visual.
create_lipid_expression_barplot(
  data_matrix,
  metadata,
  selected_lipids,
  selected_samples = NULL,
  selected_groups = NULL,
  group_column = "Sample Group",
  data_type = "normalized"
)
data_matrix |
Numeric matrix (samples as rows, lipids as columns). |
metadata |
Metadata data frame. |
selected_lipids |
Character vector of lipid names to plot. |
selected_samples |
Optional character vector of sample names to retain. |
selected_groups |
Optional character vector of group names to retain. |
group_column |
Name of the group column in |
data_type |
Label for the y-axis subtitle ("raw" or "normalized"). |
A single ggplot2 object (one lipid) or a named list of
ggplot2 objects (multiple lipids).
d <- load_lipidomics_data_from_df(generate_example_data())
p <- create_lipid_expression_barplot(
  d$numeric_data, d$metadata,
  selected_lipids = colnames(d$numeric_data)[1],
  group_column = "Sample Group"
)
print(p)
Create Pathway Sets
create_pathway_sets(merged_data, classification_column)
merged_data |
Data frame with lipids and classifications |
classification_column |
Column name for classification |
Named list of pathway sets
The colour/fill legends are merged so that ellipses do not introduce duplicate legend keys. Sample labels are kept separate from group labels.
create_pca_plot_with_ellipses(
  pca_data,
  variance_explained,
  ellipse_type = "none",
  confidence_level = 0.95,
  title = "PCA Analysis",
  show_sample_labels = FALSE
)
pca_data |
Data frame with columns |
variance_explained |
Numeric vector of length |
ellipse_type |
One of |
confidence_level |
Numeric confidence level for |
title |
Plot title. |
show_sample_labels |
Logical. If |
A ggplot2 object.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
pca_res <- perform_pca(norm, d$metadata, "Sample Group")
p <- create_pca_plot_with_ellipses(pca_res$pca_data, pca_res$variance_explained)
print(p)
Quick QC Plot for a Normalized Data Matrix
create_pipeline_plot(
  data_matrix,
  title = "Normalization Pipeline",
  metadata = NULL,
  group_column = "Sample Group",
  plot_type = "boxplot"
)
data_matrix |
Numeric matrix or data frame (samples as rows, lipids as columns). |
title |
Plot title string. |
metadata |
Optional metadata data frame (same row order) for group colouring. |
group_column |
Name of the group column in |
plot_type |
One of |
A ggplot2 object.
m <- matrix(rlnorm(60, 8, 1), nrow = 6, ncol = 10)
p <- create_pipeline_plot(m, title = "Test Pipeline")
print(p)
Create a PLS-DA Plot with Optional Ellipses
create_plsda_plot_with_ellipses(
  plsda_data,
  ellipse_type = "none",
  confidence_level = 0.95,
  title = "PLS-DA Analysis",
  show_sample_labels = FALSE
)
plsda_data |
Data frame with columns |
ellipse_type |
One of |
confidence_level |
Numeric confidence level (default |
title |
Plot title. |
show_sample_labels |
Logical. Show sample name labels if |
A ggplot2 object.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
res <- perform_plsda(norm, d$metadata, "Sample Group")
p <- create_plsda_plot_with_ellipses(res$scores_data)
print(p)
Significant lipids (adj.P.Val < pval_threshold AND
|logFC| > logfc_threshold) are coloured; non-significant points
are grey. When classification_data is supplied, significant lipids
are coloured by the selected classification column.
create_volcano_plot_labeled(
  results,
  title = "Volcano Plot",
  logfc_threshold = 1,
  pval_threshold = 0.05,
  top_labels = 15,
  classification_data = NULL,
  color_by = NULL
)
results |
Data frame of differential analysis results (must have
columns |
title |
Plot title. |
logfc_threshold |
Absolute log-fold-change threshold. |
pval_threshold |
Adjusted p-value threshold. |
top_labels |
Number of top significant lipids to label. |
classification_data |
Optional classification data frame with a
|
color_by |
Column in |
A ggplot2 object.
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
res <- perform_differential_analysis(norm, d$metadata, "Sample Group",
  contrasts_list = NULL, method = "limma"
)
p <- create_volcano_plot_labeled(res$results[[1]])
print(p)
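The significance rule stated for the volcano plot (adj.P.Val below the p-value threshold and |logFC| above the fold-change threshold) can be sketched on a toy results table. The helper name flag_significant, the Significant column, and the toy values are hypothetical; only the logFC and adj.P.Val column names come from the description.

```r
# Hypothetical helper: mark rows that pass both thresholds.
flag_significant <- function(results, logfc_threshold = 1, pval_threshold = 0.05) {
  results$Significant <- results$adj.P.Val < pval_threshold &
    abs(results$logFC) > logfc_threshold
  results
}

# Made-up values purely for illustration.
toy <- data.frame(
  logFC = c(2.1, -1.5, 0.3, 1.8),
  adj.P.Val = c(0.001, 0.03, 0.0001, 0.2)
)
flagged <- flag_significant(toy)
print(flagged)
```

Both conditions must hold, so a lipid with a tiny adjusted p-value but a small fold change (third row) stays unflagged, as does one with a large fold change but a weak p-value (fourth row).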
Parses standard lipid name notation to count double bonds and classifies the species as SFA (0 double bonds), MUFA (1) or PUFA (>1).
determine_saturation(lipid_name)
lipid_name |
A single character string with the lipid name. |
One of "SFA", "MUFA", "PUFA", or
"Unclassified".
determine_saturation("PC 16:0_18:1") # "MUFA"
determine_saturation("PE 18:0_18:0") # "SFA"
determine_saturation("TG 16:0_18:1_20:4") # "PUFA"
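The double-bond counting can be sketched with a single regular expression. This is a simplified illustration under two assumptions: chains are written as carbons:double_bonds tokens, and double bonds are summed across all chains. The function name saturation_sketch is hypothetical and the package's actual patterns may handle more notations.

```r
# Hypothetical re-implementation for illustration only.
saturation_sketch <- function(lipid_name) {
  # Pull out every "carbons:double_bonds" token, e.g. "16:0" and "18:1".
  chains <- regmatches(lipid_name, gregexpr("[0-9]+:[0-9]+", lipid_name))[[1]]
  if (length(chains) == 0) {
    return("Unclassified")
  }
  # Keep the part after the colon and sum it over all chains (assumption).
  double_bonds <- sum(as.integer(sub(".*:", "", chains)))
  if (double_bonds == 0) "SFA" else if (double_bonds == 1) "MUFA" else "PUFA"
}

print(saturation_sketch("PC 16:0_18:1"))
print(saturation_sketch("Cholesterol"))
```

Names without any chain token fall through to "Unclassified", which mirrors the fourth return value listed for determine_saturation.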
Convenience helper that generates a small synthetic lipidomics dataset for examples and vignettes.
example_lipidomics_data()
A data.frame as returned by generate_example_data().
df <- example_lipidomics_data()
nrow(df)
Export Lipid Classification to CSV
export_classification(classification, file_path)
classification |
Data frame with lipid classifications. |
file_path |
Output file path. |
Invisibly returns TRUE on success.
cls <- classify_lipids(c("PC 16:0_18:1", "PE 18:0_20:4"))
tmp <- tempfile(fileext = ".csv")
export_classification(cls, tmp)
unlink(tmp)
Align Samples Between Data Matrix and Metadata
fix_sample_alignment(data_matrix, metadata)
data_matrix |
Numeric matrix (features as rows, samples as columns). |
metadata |
Metadata data frame. |
Named list with aligned data_matrix and metadata.
d <- load_lipidomics_data_from_df(generate_example_data())
aligned <- fix_sample_alignment(t(d$numeric_data), d$metadata)
names(aligned)
Creates a synthetic lipidomics dataset with realistic lipid names and group differences suitable for demonstrating differential analysis capabilities.
generate_example_data()
Data frame with simulated lipidomics data including 4 groups with 5 replicates each
example_data <- generate_example_data()
head(example_data[, 1:10])
Return Human-Readable Descriptions of Imputation Methods
get_imputation_descriptions()
Named character vector (name = method key, value = description).
descs <- get_imputation_descriptions()
cat(descs["half_min"])
Return Available Imputation Method Names
get_imputation_methods()
Character vector of imputation method names supported by
impute_missing_values.
get_imputation_methods()
Wrapper around classify_lipids for convenient scripting use.
get_lipid_classification(lipid_names)
lipid_names |
Character vector of lipid names. |
Data frame with columns Lipid, LipidGroup,
LipidType, and Saturation.
get_lipid_classification(c("PC 16:0_18:1", "PE 18:0_20:4"))
Used by the Shiny app to populate help text.
get_normalization_descriptions()
Named character vector (name = method key, value = description).
descs <- get_normalization_descriptions()
cat(descs["TIC"])
Return Available Normalization Method Names
get_normalization_methods()
Character vector of normalization method names supported by
apply_normalizations.
get_normalization_methods()
Replaces NA values using the chosen strategy. Imputation should be
applied before normalisation so that missing values do not bias
per-sample scaling factors.
impute_missing_values(data_matrix, method = "half_min", k = 5L, seed = 42L)
data_matrix |
Numeric matrix with samples as rows and lipids as columns. |
method |
Imputation method. One of |
k |
Integer. Number of nearest neighbours for |
seed |
Integer. Random seed for reproducibility when |
Imputed numeric matrix of the same dimensions as
data_matrix.
m <- matrix(c(1000, NA, 3000, NA, 500, 1500), nrow = 2)
impute_missing_values(m, method = "half_min")
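As a rough illustration of what a half-minimum strategy usually means, the sketch below replaces each NA with half of the smallest observed value for that lipid (column). This per-lipid reading is an assumption; the manual does not spell out how "half_min" is computed, and half_min_sketch is a hypothetical name.

```r
# Hypothetical sketch: per-lipid half-minimum imputation (assumed definition).
half_min_sketch <- function(x) {
  for (j in seq_len(ncol(x))) {
    missing <- is.na(x[, j])
    # Only impute when the column has both missing and observed values.
    if (any(missing) && !all(missing)) {
      x[missing, j] <- min(x[, j], na.rm = TRUE) / 2
    }
  }
  x
}

m <- matrix(c(1000, NA, 3000, NA, 500, 1500), nrow = 2)
imputed <- half_min_sketch(m)
print(imputed)
```

Columns that are entirely missing are left untouched in this sketch, since no minimum is available to halve.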
Starts an interactive Shiny dashboard for end-to-end lipidomics analysis, including data upload, lipid classification, normalization, differential analysis, enrichment analysis, and report generation.
launch_lipidomics_app(port = NULL)
port |
Integer. Port number for the Shiny server. |
Launches the Shiny application (does not return a value).
if (interactive()) {
  launch_lipidomics_app()
}
Load Custom Lipid Classification from a CSV File
load_custom_classification(file_path)
file_path |
Path to a CSV file with a |
Data frame with Lipid as the first column.
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    Lipid = c("PC 16:0_18:1", "PE 18:0"),
    Class = c("Phospholipid", "Phospholipid")
  ),
  tmp,
  row.names = FALSE
)
cls <- load_custom_classification(tmp)
unlink(tmp)
The CSV must have columns Lipid and Set_Name.
A lipid may appear in multiple rows to belong to multiple sets.
load_custom_enrichment_sets(file_path)
file_path |
Path to the CSV file. |
Named list of character vectors (one vector per set).
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    Lipid = c("PC 16:0_18:1", "PE 18:0"),
    Set_Name = c("Phospholipids", "Phospholipids")
  ),
  tmp,
  row.names = FALSE
)
sets <- load_custom_enrichment_sets(tmp)
unlink(tmp)
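The long-format layout (one row per lipid and set) maps naturally onto a named list via base R's split(). The sketch below works on an in-memory data frame rather than a file and is not the package's loader; the set names and memberships are invented for illustration.

```r
# Made-up table in the documented layout: columns Lipid and Set_Name.
set_table <- data.frame(
  Lipid = c("PC 16:0_18:1", "PE 18:0", "Cer 16:0", "PC 16:0_18:1"),
  Set_Name = c("Phospholipids", "Phospholipids", "Sphingolipids", "Membrane")
)

# One character vector of lipid names per set.
sets_sketch <- split(set_table$Lipid, set_table$Set_Name)
print(sets_sketch)
```

Here "PC 16:0_18:1" appears in two rows and therefore ends up in two sets, which is the multiple-membership case the CSV format allows.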
Reads a CSV file and separates metadata columns from numeric lipid abundance columns.
load_lipidomics_data(
  file_path,
  metadata_columns = c("Sample Name", "Sample Group", "Tumour ID", "Weight (mg)")
)
file_path |
Path to the lipidomics data file (CSV format). |
metadata_columns |
Character vector of expected metadata column names. |
A named list with components: data (original data frame),
metadata (metadata data frame), numeric_data (numeric matrix).
# Write a minimal CSV then load it
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    "Sample Name" = c("S1", "S2"),
    "Sample Group" = c("A", "B"),
    "PC 16:0" = c(1000, 2000),
    check.names = FALSE
  ),
  tmp,
  row.names = FALSE
)
loaded <- load_lipidomics_data(tmp)
dim(loaded$numeric_data)
unlink(tmp)
Processes a lipidomics data frame by separating metadata and numeric data columns.
load_lipidomics_data_from_df(
  data_df,
  metadata_columns = c("Sample Name", "Sample Group", "Tumour ID", "Weight (mg)")
)
data_df |
Data frame with lipidomics data |
metadata_columns |
Vector of metadata column names |
List containing data components (data, metadata, numeric_data)
data_df <- generate_example_data()
loaded_data <- load_lipidomics_data_from_df(data_df)
names(loaded_data)
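The separation of metadata from abundance columns can be pictured with plain column indexing. This is a sketch only: split_metadata_sketch is a hypothetical helper, and silently skipping expected metadata columns that are absent is an assumption about how such a loader might behave.

```r
# Hypothetical helper mirroring the documented return components.
split_metadata_sketch <- function(df, metadata_columns) {
  present <- intersect(metadata_columns, names(df)) # ignore absent columns (assumption)
  lipid_columns <- setdiff(names(df), present)
  list(
    data = df,
    metadata = df[, present, drop = FALSE],
    numeric_data = as.matrix(df[, lipid_columns, drop = FALSE])
  )
}

toy <- data.frame(
  "Sample Name" = c("S1", "S2"),
  "Sample Group" = c("A", "B"),
  "PC 16:0" = c(1000, 2000),
  "PE 18:0" = c(300, 450),
  check.names = FALSE
)
parts <- split_metadata_sketch(
  toy, c("Sample Name", "Sample Group", "Tumour ID", "Weight (mg)")
)
print(dim(parts$numeric_data))
```

Using check.names = FALSE keeps lipid names such as "PC 16:0" intact instead of letting data.frame() rewrite them into syntactic names.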
Convenience wrapper around apply_normalizations.
normalize_lipidomics_data(data, methods = c("TIC", "Log2"))
data |
Numeric matrix (samples as rows, lipids as columns). |
methods |
Character vector of normalization method names. |
Normalized numeric matrix.
m <- matrix(rlnorm(60, 8, 1), nrow = 6, ncol = 10)
normalize_lipidomics_data(m, c("TIC", "Log2"))
Log2-transforms the data (log2(x + 1)), then median-centres each sample relative to the global median. This is a simplified variance-stabilising step suitable for mass-spectrometry lipidomics data.
normalize_log2median(data)
normalize_vsn(data)
data |
Numeric matrix (samples as rows, lipids as columns). |
This method is not equivalent to the full VSN procedure of Huber et al. (2002), which uses maximum-likelihood estimation. If true VSN is required, use the vsn Bioconductor package directly.
Log2-median-centred numeric matrix.
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_log2median(m)
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_vsn(m) # deprecated alias
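The two steps described for this method (log2(x + 1), then median-centring each sample on the global median) can be sketched in a few lines of base R. Taking the global median over all log-transformed values is an assumption; the package might instead use the median of the sample medians. The name log2_median_sketch is hypothetical.

```r
# Hypothetical sketch of log2 transformation followed by median centring.
log2_median_sketch <- function(x) {
  logged <- log2(x + 1)
  sample_medians <- apply(logged, 1, median) # one median per sample (row)
  # Shift each row so its median lands on the global median (assumed definition).
  logged - sample_medians + median(logged)
}

m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
centred <- log2_median_sketch(m)
print(apply(centred, 1, median))
```

After centring, every sample has the same median on the log2 scale, while differences between lipids within a sample are preserved.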
Scales each sample so its mean equals the global mean across all samples.
normalize_mean(data)
data |
Numeric matrix (samples as rows, lipids as columns). |
Mean-normalized matrix.
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_mean(m)
Scales each sample so its median equals the global median across all samples.
normalize_median(data)
data |
Numeric matrix (samples as rows, lipids as columns). |
Median-normalized matrix.
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_median(m)
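Median scaling, like the mean scaling in the preceding entry, amounts to multiplying each sample by the ratio of a global statistic to that sample's own statistic. The sketch below is a generic illustration; scale_to_global is a hypothetical name, and computing the global statistic over all values in the matrix is an assumption.

```r
# Hypothetical generic scaler: `stat` is median by default, mean also works.
scale_to_global <- function(x, stat = median) {
  per_sample <- apply(x, 1, stat) # statistic of each sample (row)
  x * (stat(x) / per_sample)      # row i is multiplied by global / per_sample[i]
}

m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
by_median <- scale_to_global(m, median)
by_mean <- scale_to_global(m, mean)
print(apply(by_median, 1, median))
print(rowMeans(by_mean))
```

Because the correction is a single multiplicative factor per sample, ratios between lipids within a sample are unchanged.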
Probabilistic Quotient Normalization. Uses the per-feature median across samples as the reference spectrum.
normalize_pqn(data)
data |
Numeric matrix (samples as rows, lipids as columns). |
PQN-normalized matrix.
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_pqn(m)
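A minimal PQN sketch, assuming the reference described above (the per-lipid median across samples) and no preliminary total-intensity scaling, looks like this. The function name pqn_sketch is hypothetical and the package's implementation may include extra steps.

```r
# Hypothetical sketch of probabilistic quotient normalization.
pqn_sketch <- function(x) {
  reference <- apply(x, 2, median)         # per-lipid median across samples
  quotients <- sweep(x, 2, reference, "/") # each value relative to the reference
  dilution <- apply(quotients, 1, median)  # most probable quotient per sample
  sweep(x, 1, dilution, "/")               # divide each sample by its factor
}

# Toy case: the second sample is the first one at twice the concentration.
toy <- rbind(c(1, 2, 3), c(2, 4, 6))
print(pqn_sketch(toy))
```

In this toy case the two rows become identical after normalization, which is the behaviour expected when samples differ only by an overall dilution factor.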
Forces all samples to share an identical intensity distribution. After this step per-sample boxplots will look nearly identical – that is the intended and correct behaviour of quantile normalization.
normalize_quantile(data)
data |
Numeric matrix (samples as rows, lipids as columns). |
Quantile-normalized matrix of the same dimensions.
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_quantile(m)
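Why the boxplots end up nearly identical becomes clear from a bare-bones version of the procedure: every sample's k-th smallest value is replaced by the mean of all samples' k-th smallest values. The sketch assumes a complete matrix (no NA) with at least two lipids and breaks ties by position, which is a simplification; quantile_sketch is a hypothetical name.

```r
# Hypothetical sketch of quantile normalization with samples as rows.
quantile_sketch <- function(x) {
  sorted <- t(apply(x, 1, sort)) # sort intensities within each sample
  rank_means <- colMeans(sorted) # mean intensity at each rank across samples
  out <- t(apply(x, 1, function(row) {
    rank_means[rank(row, ties.method = "first")] # map each value to its rank mean
  }))
  dimnames(out) <- dimnames(x)
  out
}

m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
qn <- quantile_sketch(m)
print(qn)
```

Each row of the result contains exactly the same set of values, only arranged according to that sample's original ranking.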
Divides each sample by its total ion current (row sum) and rescales to the global mean TIC.
normalize_tic(data)
data |
Numeric matrix (samples as rows, lipids as columns). |
TIC-normalized matrix of the same dimensions.
m <- matrix(c(1000, 2000, 3000, 4000, 500, 1500), nrow = 2)
normalize_tic(m)
Perform Differential Analysis
perform_differential_analysis(
  data_matrix,
  metadata,
  group_column = "Sample Group",
  contrasts_list = NULL,
  method = "limma"
)
data_matrix |
Normalized data matrix (features as rows, samples as columns) |
metadata |
Metadata data frame |
group_column |
Column name in metadata containing group information |
contrasts_list |
List of contrasts to perform |
method |
Method to use: "limma" or "edger" |
List containing results for each contrast
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
res <- perform_differential_analysis(norm, d$metadata, "Sample Group",
  contrasts_list = NULL, method = "limma"
)
names(res)
Perform Differential Analysis with edgeR
perform_differential_analysis_edger(
  data_matrix,
  metadata,
  group_column = "Sample Group",
  contrasts_list = NULL
)
data_matrix |
Normalized data matrix (features as rows, samples as columns) |
metadata |
Metadata data frame |
group_column |
Column name in metadata containing group information |
contrasts_list |
List of contrasts to perform |
List containing edgeR results for each contrast
Perform Differential Analysis with limma
perform_differential_analysis_limma(
  data_matrix,
  metadata,
  group_column = "Sample Group",
  contrasts_list = NULL
)
data_matrix |
Normalized data matrix (features as rows, samples as columns) |
metadata |
Metadata data frame |
group_column |
Column name in metadata containing group information |
contrasts_list |
List of contrasts to perform |
List containing limma results for each contrast
Perform Enrichment Analysis
perform_enrichment_analysis(
  results_list,
  classification_data,
  min_set_size = 5,
  max_set_size = 500,
  custom_sets = NULL
)
results_list |
List of differential analysis results |
classification_data |
Lipid classification data frame |
min_set_size |
Minimum pathway set size |
max_set_size |
Maximum pathway set size |
custom_sets |
Optional named list of custom lipid sets |
List containing GSEA results
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
cls <- classify_lipids(colnames(norm))
res <- perform_differential_analysis(norm, d$metadata, "Sample Group",
  contrasts_list = NULL, method = "limma"
)
enrich <- perform_enrichment_analysis(res$results, cls, min_set_size = 3)
names(enrich)
Perform PCA Analysis
perform_pca(data_matrix, metadata, group_column = "Sample Group")
data_matrix |
Data matrix (samples as rows) |
metadata |
Metadata data frame |
group_column |
Group column name |
List containing PCA results and plot
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
pca_res <- perform_pca(norm, d$metadata, "Sample Group")
names(pca_res)
Perform PLS-DA Analysis
perform_plsda(data_matrix, metadata, group_column = "Sample Group", n_comp = 2)
data_matrix |
Data matrix (samples as rows) |
metadata |
Metadata data frame |
group_column |
Group column name |
n_comp |
Number of components |
List containing PLS-DA results and plot
d <- load_lipidomics_data_from_df(generate_example_data())
norm <- apply_normalizations(d$numeric_data, c("TIC", "Log2"))
res <- perform_plsda(norm, d$metadata, "Sample Group")
names(res)
Run FGSEA (original function kept for compatibility)
run_fgsea(pathway_sets, ranked_vector, min_size, max_size)
pathway_sets |
Named list of pathway sets |
ranked_vector |
Named numeric vector of ranked statistics |
min_size |
Minimum set size |
max_size |
Maximum set size |
FGSEA results data frame
Run FGSEA with Error Handling
run_fgsea_safe(pathway_sets, ranked_vector, min_size, max_size)
pathway_sets |
Named list of pathway sets |
ranked_vector |
Named numeric vector of ranked statistics |
min_size |
Minimum set size |
max_size |
Maximum set size |
FGSEA results data frame or NULL if error
Convenience function to verify saturation detection on a set of lipid names.
test_saturation_classification(test_lipids = NULL)
test_lipids |
Optional character vector of lipid names to test.
If |
A data frame with columns Lipid and Saturation,
printed to the console and returned invisibly.
test_saturation_classification()
Produces a simple per-sample boxplot, density, or histogram of raw lipidomics intensities.
visualize_raw_data(data_list, plot_type = "boxplot")
data_list |
List as returned by |
plot_type |
One of |
A ggplot2 object.
dl <- load_lipidomics_data_from_df(generate_example_data())
p <- visualize_raw_data(dl, "boxplot")
print(p)
Extended visualization that can show data from the sample perspective (one boxplot/violin/density per sample) or the lipid perspective (top variable lipids).
visualize_raw_data_improved(
  data_list,
  plot_type = "boxplot",
  view_mode = "sample",
  top_n = 30,
  metadata = NULL,
  group_column = "Sample Group"
)
data_list |
List with |
plot_type |
One of |
view_mode |
Either |
top_n |
Integer. Number of top-variable lipids to show in lipid mode. |
metadata |
Optional metadata data frame (same row order as
|
group_column |
Name of the group column in |
A ggplot2 object.
dl <- load_lipidomics_data_from_df(generate_example_data())
p <- visualize_raw_data_improved(dl, "boxplot", "sample")
print(p)