LIPIDIFy is a comprehensive R package designed for the analysis of lipidomics data. It provides tools for:
For biologists and those preferring a graphical interface:
This launches an interactive web application with tabs for each analysis step.
For bioinformaticians and reproducible analyses:
# Load data
data <- load_lipidomics_data("path/to/your/data.csv")
# Classify lipids
classification <- get_lipid_classification(colnames(data$numeric_data))
# Normalize
normalized <- normalize_lipidomics_data(data$numeric_data,
methods = c("TIC", "Log2"))
# Differential analysis
diff_results <- perform_differential_analysis(normalized, data$metadata,
method = "limma")
# Enrichment analysis
enrichment <- perform_enrichment_analysis(diff_results$results,
classification)Your input CSV file should have:
Sample Name (required)Sample Group (required)The package includes example data:
example_data <- generate_example_data()
head(example_data[, 1:8])
#> Sample Name Sample Group Tumour ID Weight (mg) PC 21:1_20:3
#> Sample_1 Sample_1 GroupA 1 44.13763 2997.316
#> Sample_2 Sample_2 GroupA 1 73.50917 4304.851
#> Sample_3 Sample_3 GroupA 2 39.81207 1058.812
#> Sample_4 Sample_4 GroupA 2 37.48295 1772.537
#> Sample_5 Sample_5 GroupA 3 49.59876 3576.774
#> Sample_6 Sample_6 GroupB 3 31.53722 1523.911
#> PC 19:0_16:0 PC 21:2_22:0 PC 22:2_24:1
#> Sample_1 7669.117 1832.0385 3599.9755
#> Sample_2 2019.737 825.7991 4123.0720
#> Sample_3 3242.445 4097.9638 694.2239
#> Sample_4 6193.583 3328.1584 3298.3352
#> Sample_5 2529.651 5679.0787 22619.5831
#> Sample_6 5554.072 18798.9605 2951.1282This generates synthetic data with 4 groups (GroupA-D) and realistic lipid names.
Lipids are automatically classified based on their names:
# Using example data
lipid_names <- colnames(example_data)[5:ncol(example_data)]
classification <- get_lipid_classification(lipid_names[1:10])
print(classification)
#> Lipid LipidGroup LipidType Saturation
#> 1 PC 21:1_20:3 Glycerophospholipids Phosphatidylcholine PUFA
#> 2 PC 19:0_16:0 Glycerophospholipids Phosphatidylcholine SFA
#> 3 PC 21:2_22:0 Glycerophospholipids Phosphatidylcholine PUFA
#> 4 PC 22:2_24:1 Glycerophospholipids Phosphatidylcholine PUFA
#> 5 PC 15:1_19:6 Glycerophospholipids Phosphatidylcholine PUFA
#> 6 PC 21:2_21:4 Glycerophospholipids Phosphatidylcholine PUFA
#> 7 PC 16:1_21:6 Glycerophospholipids Phosphatidylcholine PUFA
#> 8 PC 17:0_23:2 Glycerophospholipids Phosphatidylcholine PUFA
#> 9 PC 20:0_24:6 Glycerophospholipids Phosphatidylcholine PUFA
#> 10 PC 21:1_21:3 Glycerophospholipids Phosphatidylcholine PUFAClassification includes: - LipidGroup: Main category (Glycerophospholipids, Sphingolipids, etc.) - LipidType: Specific class (Phosphatidylcholine, Ceramide, etc.) - Saturation: Fatty acid saturation (SFA, MUFA, PUFA)
You can provide your own classification with flexible hierarchy:
# Sample-wise boxplot
plot <- visualize_raw_data_improved(list(numeric_data = example_data[, 5:ncol(example_data)]),
plot_type = "boxplot",
view_mode = "sample")
print(plot)
# Lipid-wise distribution for top variable lipids
plot2 <- visualize_raw_data_improved(list(numeric_data = example_data[, 5:ncol(example_data)]),
plot_type = "boxplot",
view_mode = "lipid",
top_n = 30)
print(plot2)Visualize specific lipids across samples:
# Prepare data
data_matrix <- as.matrix(example_data[, 5:ncol(example_data)])
rownames(data_matrix) <- example_data$`Sample Name`
metadata <- example_data[, 1:4]
# Create barplot for selected lipids
plots <- create_lipid_expression_barplot(
data_matrix = data_matrix,
metadata = metadata,
selected_lipids = c("PC 16:0_18:1", "PE 18:0_20:4"),
data_type = "raw"
)
# Display plots
if (is.list(plots)) {
print(plots[[1]])
print(plots[[2]])
} else {
print(plots)
}Missing values in lipidomics data arise when a lipid’s signal falls
below the instrument’s limit of detection. LIPIDIFy provides six
imputation strategies via impute_missing_values(). Imputing
after normalisation ensures that imputed values are on
the same scale as observed values.
# Check missingness in normalised data
normalized <- normalize_lipidomics_data(data$numeric_data, methods = c("TIC", "Log2"))
cat("Missing values:", sum(is.na(normalized)), "\n")
# Half-minimum imputation on the normalised matrix
imputed <- impute_missing_values(normalized, method = "half_min")
# KNN imputation (requires Bioconductor impute package)
# imputed <- impute_missing_values(normalized, method = "knn", k = 5)| Method | Best for |
|---|---|
half_min |
MS lipidomics (models LOD on normalised scale) – recommended default |
knn |
Data missing at random; requires impute Bioconductor
package |
zero |
Only when absence of signal truly means zero |
mean / median |
Small proportions of missing values |
Technical batch effects arise from differences in sample preparation
or instrument performance across analytical runs.
correct_batch_effects() removes these effects while
preserving biological signal.
# Assumes your metadata has a "Batch" column
corrected <- correct_batch_effects(
data_matrix = normalized,
metadata = data$metadata,
batch_column = "Batch",
group_column = "Sample Group",
method = "limma" # or "combat" (requires sva package)
)
# Verify with PCA: samples should cluster by biology, not batch
pca_post <- perform_pca(corrected, data$metadata, "Sample Group")
create_pca_plot_with_ellipses(pca_post$pca_data, pca_post$variance_explained,
ellipse_type = "confidence",
title = "PCA after Batch Correction")Multiple normalization methods are available:
# Get list of available methods
methods <- get_normalization_methods()
print(methods)
#> [1] "TIC" "PQN" "Quantile" "Log2Median" "Median"
#> [6] "Mean" "Log2" "Log10" "Sqrt" "None"
# Apply normalization pipeline
normalized <- normalize_lipidomics_data(
data = as.matrix(example_data[, 5:ncol(example_data)]),
methods = c("TIC", "Log2")
)
# Compare different pipelines
pipeline1 <- normalize_lipidomics_data(example_data[, 5:ncol(example_data)],
c("TIC", "Log2"))
pipeline2 <- normalize_lipidomics_data(example_data[, 5:ncol(example_data)],
c("PQN", "Log2"))PCA with optional confidence ellipses:
# Prepare data
metadata <- example_data[, 1:4]
data_matrix <- t(as.matrix(example_data[, 5:ncol(example_data)]))
# Perform PCA
pca_result <- perform_pca(data_matrix, metadata, group_column = "Sample Group")
# Plot without ellipses
plot1 <- create_pca_plot_with_ellipses(
pca_data = pca_result$pca_data,
variance_explained = pca_result$variance_explained,
ellipse_type = "none"
)
# Plot with confidence ellipses (95%)
plot2 <- create_pca_plot_with_ellipses(
pca_data = pca_result$pca_data,
variance_explained = pca_result$variance_explained,
ellipse_type = "confidence",
confidence_level = 0.95
)
# Plot with visual circles
plot3 <- create_pca_plot_with_ellipses(
pca_data = pca_result$pca_data,
variance_explained = pca_result$variance_explained,
ellipse_type = "visual"
)Ellipse Types: - none: Raw points only - confidence: Statistical confidence regions (default 95%) - visual: Simple visual circles to aid group identification
# Perform differential analysis with limma
diff_results_limma <- perform_differential_analysis(
data_matrix = normalized,
metadata = metadata,
group_column = "Sample Group",
contrasts_list = c("GroupB - GroupA", "GroupC - GroupA"),
method = "limma"
)
# View results for first contrast
head(diff_results_limma$results[[1]])# Perform differential analysis with EdgeR
diff_results_edger <- perform_differential_analysis(
data_matrix = normalized,
metadata = metadata,
group_column = "Sample Group",
contrasts_list = c("GroupB - GroupA", "GroupC - GroupA"),
method = "edger"
)
# View results
head(diff_results_edger$results[[1]])Both methods provide similar outputs: - logFC: Log fold
change - P.Value: Raw p-value - adj.P.Val:
Adjusted p-value (FDR) - AveExpr: Average expression
When to use which: - limma: Default choice for lipidomics, fast and robust - EdgeR: Alternative method, originally for count data but adaptable
Specify only the contrasts of interest:
# Create all pairwise contrasts
groups <- c("GroupA", "GroupB", "GroupC", "GroupD")
all_contrasts <- create_default_contrasts(groups)
print(all_contrasts)
#> [1] "GroupB - GroupA" "GroupC - GroupA" "GroupD - GroupA" "GroupC - GroupB"
#> [5] "GroupD - GroupB" "GroupD - GroupC"
# Or specify custom contrasts
custom_contrasts <- c("GroupB - GroupA",
"GroupC - GroupA",
"GroupD - GroupA")Using automatic lipid classification:
Define your own functional sets:
# Create custom sets CSV:
# Lipid,Set_Name
# PC 16:0_18:1,Membrane_Lipids
# PE 18:0_20:4,Membrane_Lipids
# TG 16:0_18:1_20:4,Storage_Lipids
# ...
custom_sets <- load_custom_enrichment_sets("my_sets.csv")
# Run enrichment with custom sets
enrichment_custom <- perform_enrichment_analysis(
results_list = diff_results_limma$results,
classification_data = classification,
custom_sets = custom_sets
)Here’s a complete example workflow:
# 1. Load data
data <- load_lipidomics_data("your_data.csv")
# 2. Classify lipids
classification <- get_lipid_classification(colnames(data$numeric_data))
export_classification(classification, "classification.csv")
# 3. Visualize raw data
raw_plot <- visualize_raw_data_improved(data, plot_type = "boxplot")
# 4. Normalize
normalized <- normalize_lipidomics_data(data$numeric_data,
methods = c("TIC", "Log2"))
# 5. PCA
pca_result <- perform_pca(t(normalized), data$metadata)
pca_plot <- create_pca_plot_with_ellipses(pca_result$pca_data,
pca_result$variance_explained,
ellipse_type = "confidence")
# 6. Differential analysis
diff_results <- perform_differential_analysis(
normalized, data$metadata,
contrasts_list = c("Treatment - Control"),
method = "limma"
)
# 7. Volcano plot
volcano <- create_volcano_plot_labeled(diff_results$results[[1]])
# 8. Enrichment
enrichment <- perform_enrichment_analysis(diff_results$results,
classification)
# 9. Enrichment visualization
enrich_plot <- create_enrichment_dotplot(enrichment[[1]]$LipidGroup)
# 10. Save results
write.csv(diff_results$results[[1]], "differential_results.csv")
write.csv(enrichment[[1]]$LipidGroup, "enrichment_results.csv")If you have batch information, incorporate it into your design:
Export results for use in other software:
# Check for outliers in PCA
pca_result <- perform_pca(t(normalized), data$metadata)
# Samples with PC1 > 3 SD might be outliers
outliers <- pca_result$pca_data[abs(pca_result$pca_data$PC1) >
3 * sd(pca_result$pca_data$PC1), ]
print(outliers)
# Check coefficient of variation
cv <- apply(data$numeric_data, 2, function(x) sd(x) / mean(x) * 100)
high_cv_lipids <- names(cv)[cv > 50]sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] LIPIDIFy_0.99.0 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.3 cli_3.6.6 knitr_1.51
#> [4] rlang_1.2.0 xfun_0.58 stringi_1.8.7
#> [7] otel_0.2.0 jsonlite_2.0.0 glue_1.8.1
#> [10] buildtools_1.0.0 htmltools_0.5.9 maketools_1.3.2
#> [13] sys_3.4.3 sass_0.4.10 rmarkdown_2.31
#> [16] evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0
#> [19] yaml_2.3.12 lifecycle_1.0.5 BiocManager_1.30.27
#> [22] stringr_1.6.0 compiler_4.6.0 digest_0.6.39
#> [25] R6_2.6.1 magrittr_2.0.5 bslib_0.11.0
#> [28] tools_4.6.0 cachem_1.1.0The package recognizes standard lipid nomenclature:
Format examples: - PC 16:0_18:1 (individual fatty acids)
- PC 34:1 (total carbon:double bonds) -
LPC 18:2 (single fatty acid) -
TG 16:0_18:1_20:4 (three fatty acids)