| Title: | Integrative Multi-Omics Analysis of Host Transcriptomics and Gut Microbiome Data |
|---|---|
| Description: | MultiOmicsBridge provides an end-to-end, reproducible computational framework for integrative analysis of paired host transcriptomics (bulk RNA-seq) and gut microbiome (16S rRNA or shotgun metagenomics) data. The package addresses the lack of a unified Bioconductor workflow for this pairing by implementing five modules: (1) data harmonization and normalization with CLR transformation for microbiome compositional data and TMM/voom for RNA-seq; (2) joint dimensionality reduction via sparse multi-block PLS-DA (DIABLO); (3) multi-omics biomarker discovery through cross-omics correlation networks and sparse feature loadings; (4) integrated diagnostic classification comparing host-only, microbiome-only, and joint Random Forest models with nested cross-validation; and (5) publication-quality visualization of integration results, biomarker networks, classifier comparisons, and feature flow diagrams. All functions operate natively on SummarizedExperiment and MultiAssayExperiment objects and return a structured MOBResult S4 object. The package is validated on inflammatory bowel disease multi-omics data and designed with complex disease contexts (tuberculosis, HIV, EED) in mind. |
| Authors: | Subhadip Jana [aut, cre, fnd] (ORCID: <https://orcid.org/0009-0003-7860-2853>) |
| Maintainer: | Subhadip Jana <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.0 |
| Built: | 2026-06-09 13:49:41 UTC |
| Source: | https://github.com/BiocStaging/MultiOmicsBridge |
MultiOmicsBridge provides an end-to-end, reproducible computational framework for integrative analysis of paired host transcriptomics (bulk RNA-seq) and gut microbiome (16S rRNA or shotgun metagenomics) data. The package bridges two complementary biological data layers that, when analyzed together, reveal insights neither can provide alone.
No single Bioconductor package provides all four critical capabilities for host-microbiome integration in a unified, microbiome-aware workflow:
Proper normalization for compositional microbiome data alongside RNA-seq count data.
Joint dimensionality reduction identifying shared variation patterns between the two data layers.
Multi-omics biomarker selection identifying which host genes and which microbial taxa jointly predict disease status.
An integrated diagnostic classifier demonstrating the added value of combining both data types over either alone.
MultiOmicsBridge provides all four in a single, opinionated pipeline.
loadHostData, loadMicrobiomeData, matchSamples: Import, normalize, and match paired omics data into a MultiAssayExperiment.
jointDimReduction: Sparse multi-block PLS-DA (DIABLO) identifying correlated features across data blocks.
biomarkerDiscovery: Sparse feature loadings and cross-omics correlation networks for multi-omics biomarker ranking.
diagnosticClassifier: Host-only, microbiome-only, and joint Random Forest classifiers with nested cross-validation.
plotIntegration, plotBiomarkerNetwork, plotClassifierComparison, plotSankey, generateReport: Visualization and reporting of integration results.
loadHostData: Import and voom-normalize bulk RNA-seq count data.
loadMicrobiomeData: Import and CLR-transform microbiome taxa table data.
matchSamples: Match paired samples across omics layers into a MultiAssayExperiment.
jointDimReduction: Run DIABLO joint dimensionality reduction.
biomarkerDiscovery: Identify ranked multi-omics biomarkers.
diagnosticClassifier: Train and compare single-omics and joint diagnostic classifiers.
MultiOmicsBridgeAnalysis: One-call wrapper for the complete analysis pipeline.
The package is designed as a generalized framework. It is validated on inflammatory bowel disease multi-omics data and designed with complex disease contexts (tuberculosis, HIV, EED) in mind. By providing a standardized workflow, MultiOmicsBridge lowers the barrier to multi-omics integration for researchers working across disease contexts.
Maintainer: Subhadip Jana [email protected] (ORCID) [funder]
Authors:
Subhadip Jana [email protected] (ORCID) [funder]
Rohart F et al. (2017). mixOmics: An R package for 'omics feature selection and multiple data integration. PLoS Comput Biol, 13(11), e1005752.
Reel PS et al. (2021). Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv, 49, 107739.
Franzosa EA et al. (2019). Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nature Microbiology, 4, 293-305.
Useful links:
Report bugs at https://github.com/SubhadipJana1409/MultiOmicsBridge/issues
Identifies and ranks host genes and microbial taxa that jointly predict the outcome of interest using two complementary evidence streams: (1) sparse feature loadings from the DIABLO joint dimensionality reduction model, and (2) a cross-omics Spearman correlation network linking host genes to microbial taxa. Features are ranked by their combined loading score and annotated with their maximum cross-omics correlation.
biomarkerDiscovery(
  mae,
  dr_result,
  n_biomarkers = 50L,
  host_assay = "voom",
  mb_assay = "CLR"
)
mae |
A |
dr_result |
A named |
n_biomarkers |
An |
host_assay |
A |
mb_assay |
A |
The biomarker ranking combines:
Loading score: the L2 norm of a feature's loadings across all DIABLO components. Genes or taxa with higher loading scores contribute more strongly to the latent integration axes.
Cross-omics correlation: for each selected host gene, the maximum absolute Spearman correlation with any selected microbial taxon (and vice versa). A high cross-omics correlation suggests host-microbe co-variation that may be biologically relevant.
Hub features, those with both high loading scores and high cross-omics correlations, are the strongest multi-omics biomarker candidates.
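The two evidence streams can be sketched in base R. This is a minimal illustration on simulated matrices, not the package's implementation; `host_load`, `host_expr`, and `mb_expr` are hypothetical stand-ins for DIABLO loadings and normalized assay data.

```r
set.seed(1)
# Hypothetical inputs: loadings (features x components) and
# normalized expression/abundance (features x samples).
host_load <- matrix(rnorm(5 * 2), nrow = 5,
                    dimnames = list(paste0("Gene", 1:5), c("comp1", "comp2")))
host_expr <- matrix(rnorm(5 * 12), nrow = 5,
                    dimnames = list(paste0("Gene", 1:5), paste0("S", 1:12)))
mb_expr <- matrix(rnorm(3 * 12), nrow = 3,
                  dimnames = list(paste0("Taxon", 1:3), paste0("S", 1:12)))

# (1) Loading score: L2 norm of each feature's loadings across components.
loading_score <- sqrt(rowSums(host_load^2))

# (2) Cross-omics correlation: genes x taxa Spearman matrix, then the
# maximum absolute correlation and the taxon that attains it.
cross_cor <- cor(t(host_expr), t(mb_expr), method = "spearman")
max_cross_cor <- apply(abs(cross_cor), 1, max)
top_partner <- colnames(cross_cor)[apply(abs(cross_cor), 1, which.max)]

ranking <- data.frame(feature = rownames(host_load),
                      loading_score = loading_score,
                      rank = rank(-loading_score),
                      max_cross_cor = max_cross_cor,
                      top_partner = top_partner)
ranking[order(ranking$rank), ]
```

The same computation with the roles of genes and taxa swapped would give the microbiome-side annotations.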
A DataFrame with one row per biomarker and columns:
feature: Feature name (gene or taxon ID).
omics_layer: Either "host" or "microbiome".
loading_score: L2 norm of DIABLO loadings across components.
rank: Within-layer ranking by loading score.
component: DIABLO component with highest absolute loading.
max_cross_cor: Maximum absolute Spearman correlation with a feature from the other omics layer.
top_partner: Name of the cross-omics feature with the highest absolute correlation.
jointDimReduction, plotBiomarkerNetwork,
MultiOmicsBridgeAnalysis
set.seed(42)
host_counts <- matrix(rpois(500 * 20, 100), nrow = 500, ncol = 20,
                      dimnames = list(paste0("Gene", 1:500), paste0("S", 1:20)))
host_counts[1:20, 11:20] <- host_counts[1:20, 11:20] * 5L
mb_counts <- matrix(rpois(60 * 20, 40), nrow = 60, ncol = 20,
                    dimnames = list(paste0("Taxon", 1:60), paste0("S", 1:20)))
host_se <- loadHostData(host_counts)
mb_se <- loadMicrobiomeData(mb_counts)
mae <- matchSamples(host_se, mb_se)
outcome <- rep(c("ctrl", "treat"), each = 10)
dr_res <- jointDimReduction(mae, outcome, n_components = 2,
                            n_features_host = 30, n_features_mb = 15)
bm <- biomarkerDiscovery(mae, dr_res, n_biomarkers = 20)
head(as.data.frame(bm))
Returns the ranked multi-omics biomarker DataFrame
from a MOBResult object.
biomarkers(x, ...)

## S4 method for signature 'MOBResult'
biomarkers(x, ...)
x |
A |
... |
Additional arguments (not used). |
A DataFrame with columns feature,
omics_layer, loading_score, rank, and
component.
library(S4Vectors)
bm <- DataFrame(feature = c("G1", "T1"),
                omics_layer = c("host", "microbiome"),
                loading_score = c(0.8, 0.6),
                rank = c(1L, 2L),
                component = c(1L, 1L))
obj <- MOBResult(matrix(rnorm(20), 10, 2), list(), bm, list())
biomarkers(obj)
Trains and evaluates three Random Forest diagnostic classifiers using
cross-validation: a host-only model, a microbiome-only model, and a
joint multi-omics model. By comparing AUC-ROC across all three
configurations, diagnosticClassifier quantifies the added
diagnostic value of combining both data types.
diagnosticClassifier(
  mae,
  outcome,
  biomarker_table = NULL,
  cv_folds = 5L,
  n_trees = 500L,
  seed = 42L,
  host_assay = "voom",
  mb_assay = "CLR"
)
mae |
A |
outcome |
A |
biomarker_table |
An optional |
cv_folds |
An |
n_trees |
An |
seed |
An |
host_assay |
A |
mb_assay |
A |
Each classifier is trained using ranger::ranger (a fast C++ Random Forest
implementation) with n_trees trees (500 by default) and stratified k-fold
cross-validation. Features are drawn from the top biomarkers identified by
biomarkerDiscovery, or from all available features if
biomarker_table is NULL.
The cross-validation procedure:
Split samples into cv_folds stratified folds (each
fold preserves the outcome class ratio).
For each fold, train on the remaining folds, predict on the held-out fold.
Compute AUC-ROC on held-out predictions.
Report mean +/- SD AUC across folds.
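The procedure above can be sketched in base R. This is an illustration only: logistic regression stands in for the Random Forest that the package trains with ranger, the data are simulated, and the helper names `stratified_folds` and `auc_roc` are hypothetical.

```r
set.seed(42)
# Simulated data: 40 samples, one informative predictor.
outcome <- factor(rep(c("ctrl", "treat"), each = 20))
x <- rnorm(40, mean = ifelse(outcome == "treat", 1, 0))

# Stratified folds: fold labels are shuffled within each class,
# so every fold keeps roughly the overall class ratio.
stratified_folds <- function(y, k) {
  folds <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

# AUC-ROC via the rank-sum (Mann-Whitney) identity.
auc_roc <- function(score, y, positive) {
  pos <- y == positive
  r <- rank(score)
  (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) / (sum(pos) * sum(!pos))
}

k <- 5
folds <- stratified_folds(outcome, k)
fold_auc <- vapply(seq_len(k), function(f) {
  train <- folds != f
  # Stand-in classifier (logistic regression), not the package's model.
  fit <- glm(y ~ x, family = binomial,
             data = data.frame(y = outcome[train], x = x[train]))
  pred <- predict(fit, newdata = data.frame(x = x[!train]), type = "response")
  auc_roc(pred, outcome[!train], positive = "treat")
}, numeric(1))

c(mean_auc = mean(fold_auc), sd_auc = sd(fold_auc))
```

Computing the AUC only on held-out predictions is what keeps the estimate from being inflated by samples the model has already seen.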
A named list with elements:
host_only: List with mean_auc, sd_auc, fold_auc, roc_data for the host-only classifier.
microbiome_only: Same structure for the microbiome-only classifier.
joint: Same structure for the joint classifier.
n_features: Named integer: host, microbiome, joint feature counts.
cv_folds: Number of CV folds used.
outcome_levels: Outcome levels.
biomarkerDiscovery, plotClassifierComparison,
MultiOmicsBridgeAnalysis
set.seed(42)
host_counts <- matrix(rpois(500 * 20, 100), nrow = 500, ncol = 20,
                      dimnames = list(paste0("Gene", 1:500), paste0("S", 1:20)))
host_counts[1:20, 11:20] <- host_counts[1:20, 11:20] * 5L
mb_counts <- matrix(rpois(60 * 20, 40), nrow = 60, ncol = 20,
                    dimnames = list(paste0("Taxon", 1:60), paste0("S", 1:20)))
host_se <- loadHostData(host_counts)
mb_se <- loadMicrobiomeData(mb_counts)
mae <- matchSamples(host_se, mb_se)
outcome <- rep(c("ctrl", "treat"), each = 10)
dr_res <- jointDimReduction(mae, outcome, n_components = 2,
                            n_features_host = 30, n_features_mb = 15)
bm <- biomarkerDiscovery(mae, dr_res, n_biomarkers = 20)
clf_res <- diagnosticClassifier(mae, outcome, biomarker_table = bm, cv_folds = 3)
clf_res$host_only$mean_auc
clf_res$joint$mean_auc
Returns the named list of per-layer feature loading
matrices from a MOBResult object.
featureLoadings(x, ...)

## S4 method for signature 'MOBResult'
featureLoadings(x, ...)
x |
A |
... |
Additional arguments (not used). |
A named list with elements host and
microbiome, each a matrix of genes/taxa x components.
library(S4Vectors)
fl <- list(host = matrix(rnorm(10), 5, 2),
           microbiome = matrix(rnorm(6), 3, 2))
obj <- MOBResult(matrix(rnorm(20), 10, 2), fl, DataFrame(), list())
featureLoadings(obj)
Prints a formatted text summary of a MOBResult object
to the console and optionally saves it to a plain-text file. The
report has five sections: data dimensions, integration
method, top biomarkers, cross-omics correlations, and classifier
performance comparison.
For a full interactive HTML report, users can render the package
vignette template (system.file("vignettes", package =
"MultiOmicsBridge")) with rmarkdown::render() using their
own MOBResult object.
generateReport(result, file = NULL, n_top = 10L)
result |
A |
file |
An optional |
n_top |
An |
Invisibly returns a named list of character vectors,
one per report section.
MultiOmicsBridgeAnalysis, MOBResult
library(S4Vectors)
scores <- matrix(rnorm(20), nrow = 10, ncol = 2,
                 dimnames = list(paste0("S", 1:10), c("Comp1", "Comp2")))
bm <- DataFrame(
  feature = c("Gene1", "Taxon1", "Gene2"),
  omics_layer = c("host", "microbiome", "host"),
  loading_score = c(0.9, 0.8, 0.7),
  rank = c(1L, 2L, 3L),
  component = c(1L, 1L, 1L)
)
cr <- list(
  host_only = list(mean_auc = 0.82, sd_auc = 0.05,
                   fold_auc = c(0.78, 0.84, 0.83)),
  microbiome_only = list(mean_auc = 0.75, sd_auc = 0.06,
                         fold_auc = c(0.70, 0.79, 0.76)),
  joint = list(mean_auc = 0.94, sd_auc = 0.03,
               fold_auc = c(0.92, 0.95, 0.94)),
  cv_folds = 3L,
  outcome_levels = c("ctrl", "treat")
)
obj <- MOBResult(scores, list(), bm, cr,
                 params = list(integration_method = "DIABLO",
                               n_components = 2L,
                               outcome_levels = c("ctrl", "treat"),
                               cv_folds = 3L))
generateReport(obj, n_top = 3)
Returns the matrix of integrated sample scores (samples x
latent components) from a MOBResult object.
integrationScores(x, ...)

## S4 method for signature 'MOBResult'
integrationScores(x, ...)
x |
A |
... |
Additional arguments (not used). |
A matrix with rows = samples and columns = latent
components.
library(S4Vectors)
scores <- matrix(rnorm(20), nrow = 10, ncol = 2,
                 dimnames = list(paste0("S", 1:10), c("Comp1", "Comp2")))
obj <- MOBResult(scores, list(), DataFrame(), list())
integrationScores(obj)
Performs joint dimensionality reduction across host transcriptomics
and gut microbiome data using DIABLO (Data Integration Analysis for
Biomarker discovery using Latent cOmponents), a sparse multi-block
PLS-DA implemented in mixOmics. DIABLO simultaneously identifies
correlated features across data blocks while discriminating between
outcome groups, enforcing sparsity so only the most informative
features contribute to each latent component.
jointDimReduction(
  mae,
  outcome,
  n_components = 2L,
  n_features_host = 50L,
  n_features_mb = 20L,
  design_off_diag = 0.1,
  host_assay = "voom",
  mb_assay = "CLR",
  min_variance = 0
)
mae |
A |
outcome |
A |
n_components |
An |
n_features_host |
An |
n_features_mb |
An |
design_off_diag |
A |
host_assay |
A |
mb_assay |
A |
min_variance |
A |
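The DIABLO model is fitted as:
$$\max_{a^{(1)}, \ldots, a^{(Q)}} \sum_{i \neq j} c_{ij} \, \mathrm{cov}\left(X^{(i)} a^{(i)}, X^{(j)} a^{(j)}\right)$$
subject to $\|a^{(q)}\|_2 = 1$ and $\|a^{(q)}\|_1 \le \lambda^{(q)}$ for each block $q = 1, \ldots, Q$, where
$X^{(q)} a^{(q)}$ are the sample scores (variates), $a^{(q)}$ are the
sparse feature weights (loadings), $c_{ij}$ are the entries of the design matrix, and $\lambda^{(q)}$ is the sparsity penalty for block $q$.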
The number of features retained per component is controlled by
n_features_host and n_features_mb. By default, a
design matrix connecting all blocks with a weak correlation weight
(design_off_diag = 0.1) is used, which prioritizes outcome
discrimination over cross-block correlation.
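A design matrix of this form can be built in base R as shown below. This is a sketch under the assumption that the two blocks are named "host" and "microbiome"; how jointDimReduction constructs the matrix internally is not shown in this documentation.

```r
# Two blocks, as in this package (block names are illustrative).
blocks <- c("host", "microbiome")
design_off_diag <- 0.1

# Off-diagonal entries weight the covariance between block variates;
# the diagonal is zero because a block is not linked to itself.
design <- matrix(design_off_diag,
                 nrow = length(blocks), ncol = length(blocks),
                 dimnames = list(blocks, blocks))
diag(design) <- 0
design
```

Values closer to 1 put more weight on correlation between blocks, while values closer to 0 leave the model freer to focus on separating the outcome groups.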
A named list with elements:
scores: A matrix (samples x components) of integrated sample scores (from the host block variate).
host_loadings: A matrix (genes x components) of sparse host feature loadings.
mb_loadings: A matrix (taxa x components) of sparse microbiome feature loadings.
explained_variance: A named numeric vector of explained variance per component.
diablo_object: The full DIABLO result object from mixOmics::block.splsda.
outcome: The outcome factor used.
biomarkerDiscovery, plotIntegration,
MultiOmicsBridgeAnalysis
set.seed(42)
host_counts <- matrix(rpois(500 * 20, 100), nrow = 500, ncol = 20,
                      dimnames = list(paste0("Gene", 1:500), paste0("S", 1:20)))
host_counts[1:20, 11:20] <- host_counts[1:20, 11:20] * 5L
mb_counts <- matrix(rpois(60 * 20, 40), nrow = 60, ncol = 20,
                    dimnames = list(paste0("Taxon", 1:60), paste0("S", 1:20)))
host_se <- loadHostData(host_counts)
mb_se <- loadMicrobiomeData(mb_counts)
mae <- matchSamples(host_se, mb_se)
outcome <- rep(c("ctrl", "treat"), each = 10)
dr_res <- jointDimReduction(mae, outcome = outcome, n_components = 2,
                            n_features_host = 30, n_features_mb = 15)
dim(dr_res$scores)
head(dr_res$explained_variance)
Imports a bulk RNA-seq count matrix and applies TMM normalization
followed by limma-voom precision weighting. The result is a
SummarizedExperiment with both the raw counts and
voom-transformed log2-CPM values stored as named assays.
loadHostData(counts, col_data = NULL, min_count = 1, assay_name = "voom")
counts |
A |
col_data |
An optional |
min_count |
A |
assay_name |
A |
The normalization pipeline:
Construct a DGEList from raw counts.
Apply TMM (trimmed mean of M-values) normalization via
edgeR::normLibSizes to remove compositional bias
between libraries.
Apply limma::voom to compute log2-CPM values with
precision weights that model the mean-variance trend. These
weights are used downstream by diagnosticClassifier
and jointDimReduction.
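As a rough illustration of what the log2-CPM values represent, the sketch below computes them in base R using the offsets that limma::voom applies (0.5 added to each count, 1 added to each effective library size). The normalization factors are a hypothetical vector of ones rather than TMM estimates, and the precision weights are not computed; in the package these steps are delegated to edgeR and limma.

```r
set.seed(42)
counts <- matrix(rpois(6 * 4, lambda = 150), nrow = 6,
                 dimnames = list(paste0("Gene", 1:6), paste0("S", 1:4)))

# Hypothetical normalization factors; TMM would estimate these from the data.
norm_factors <- rep(1, ncol(counts))
eff_lib_size <- colSums(counts) * norm_factors

# log2 counts-per-million with small offsets to avoid log(0).
# Transposing lets the per-sample library sizes recycle along rows.
log_cpm <- t(log2(t(counts + 0.5) / (eff_lib_size + 1) * 1e6))
round(log_cpm, 2)
```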
Genes with very low expression are not filtered here; gene filtering
is performed within jointDimReduction and
diagnosticClassifier based on expression in the matched
multi-omics dataset. Users may pre-filter genes if desired.
A SummarizedExperiment with two assays:
"counts": Raw integer count matrix.
"voom": Voom-transformed log2-CPM matrix with TMM library size normalization applied.
Sample metadata (if provided) is stored in colData.
loadMicrobiomeData, matchSamples,
MultiOmicsBridgeAnalysis
set.seed(42)
n_genes <- 200
n_samples <- 20
counts <- matrix(rpois(n_genes * n_samples, lambda = 150),
                 nrow = n_genes, ncol = n_samples)
rownames(counts) <- paste0("Gene", seq_len(n_genes))
colnames(counts) <- paste0("Sample", seq_len(n_samples))
col_data <- data.frame(
  condition = rep(c("ctrl", "treat"), each = 10),
  row.names = colnames(counts)
)
host_se <- loadHostData(counts, col_data = col_data)
host_se
SummarizedExperiment::assayNames(host_se)
Imports a microbiome taxa count table and applies either centered
log-ratio (CLR) or total sum scaling (TSS) normalization. The result
is a SummarizedExperiment with both raw counts and normalized
values stored as named assays.
loadMicrobiomeData(
  taxa_table,
  col_data = NULL,
  normalization = c("CLR", "TSS"),
  pseudocount = 0.5,
  min_prevalence = 0.1
)
taxa_table |
A |
col_data |
An optional |
normalization |
A |
pseudocount |
A positive |
min_prevalence |
A |
Microbiome data is compositional: only relative abundances are observed, not absolute counts. Treating compositional data with standard correlation or distance measures leads to spurious results (the Aitchison problem). MultiOmicsBridge applies one of two microbiome-appropriate normalizations:
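The centered log-ratio transformation:
$$\mathrm{clr}(x_i) = \log(x_i + \delta) - \frac{1}{D} \sum_{j=1}^{D} \log(x_j + \delta)$$
where $x_i$ is the count of taxon $i$ in a sample, $\delta$ is the pseudocount, $D$ is the number of taxa, and the sum is over all
taxa. CLR maps compositional data to real space and removes
the unit-sum constraint, enabling Euclidean geometry.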
Total sum scaling divides each sample by its library size, producing relative abundances (proportions). Simpler but retains the compositional constraint.
Zero counts are handled by adding a small pseudocount before log-transformation; the default pseudocount of 0.5 is a conservative choice appropriate for sparse 16S data.
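Both normalizations can be written in a few lines of base R. The sketch below is illustrative and assumes a taxa x samples count matrix; the helper names `clr_transform` and `tss_transform` are hypothetical and are not functions exported by the package.

```r
set.seed(42)
taxa <- matrix(rpois(5 * 4, lambda = 3), nrow = 5,
               dimnames = list(paste0("Taxon", 1:5), paste0("S", 1:4)))

# CLR: log of (count + pseudocount) minus the per-sample mean log.
clr_transform <- function(x, pseudocount = 0.5) {
  log_x <- log(x + pseudocount)
  sweep(log_x, 2, colMeans(log_x), "-")
}

# TSS: divide each sample (column) by its total count.
tss_transform <- function(x) {
  sweep(x, 2, colSums(x), "/")
}

clr <- clr_transform(taxa)
tss <- tss_transform(taxa)
colSums(clr)  # numerically zero for every sample
colSums(tss)  # 1 for every sample (the unit-sum constraint is retained)
```

The column sums make the difference concrete: CLR values are centered within each sample, whereas TSS proportions still sum to one.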
A SummarizedExperiment with two assays:
"counts": Raw integer count matrix (taxa x samples).
"CLR" or "TSS": Normalized values.
loadHostData, matchSamples,
MultiOmicsBridgeAnalysis
set.seed(42)
n_taxa <- 80
n_samples <- 20
taxa_table <- matrix(rpois(n_taxa * n_samples, lambda = 30),
                     nrow = n_taxa, ncol = n_samples)
rownames(taxa_table) <- paste0("Taxon", seq_len(n_taxa))
colnames(taxa_table) <- paste0("Sample", seq_len(n_samples))
mb_se <- loadMicrobiomeData(taxa_table, normalization = "CLR")
mb_se
SummarizedExperiment::assayNames(mb_se)
Identifies samples present in both host and microbiome
SummarizedExperiment objects, subsets both to the common
samples, and assembles a MultiAssayExperiment (MAE) that
serves as the primary input for downstream analysis functions.
matchSamples(
  host_se,
  mb_se,
  sample_col_host = NULL,
  sample_col_mb = NULL,
  min_paired = 5L
)
host_se |
A |
mb_se |
A |
sample_col_host |
A |
sample_col_mb |
A |
min_paired |
An |
In paired multi-omics studies, not all samples necessarily have both
data types due to sequencing failures, QC exclusions, or study design.
matchSamples transparently reports how many samples are retained
and warns if fewer than min_paired paired samples are found.
The output MultiAssayExperiment stores the host and microbiome
SummarizedExperiment objects under the names "host" and
"microbiome" respectively, with a unified colData drawn
from the host sample metadata.
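The core matching logic amounts to intersecting sample identifiers and subsetting both layers to the shared set. The base R sketch below illustrates this on plain matrices with made-up sample IDs; it does not build a MultiAssayExperiment and does not cover the sample_col_host and sample_col_mb options.

```r
host_ids <- paste0("Sample", 1:20)
mb_ids <- paste0("Sample", c(3:18, 21))  # some samples lack a partner

# Keep only IDs present in both layers, in host order.
paired <- intersect(host_ids, mb_ids)
message(length(paired), " of ", length(host_ids), " host and ",
        length(mb_ids), " microbiome samples are paired")

# Warn when too few paired samples remain.
min_paired <- 5L
if (length(paired) < min_paired) {
  warning("Fewer than ", min_paired, " paired samples found")
}

# Hypothetical data matrices, subset to the same samples in the same order.
host_mat <- matrix(0, nrow = 2, ncol = 20, dimnames = list(NULL, host_ids))
mb_mat <- matrix(0, nrow = 2, ncol = 17, dimnames = list(NULL, mb_ids))
host_mat <- host_mat[, paired, drop = FALSE]
mb_mat <- mb_mat[, paired, drop = FALSE]
```

Subsetting both matrices by the same `paired` vector guarantees identical column order, which downstream joint analyses rely on.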
A MultiAssayExperiment with two experiments:
"host": Subset of host_se for paired samples.
"microbiome": Subset of mb_se for paired samples.
The colData of the MAE is taken from host_se for the
paired samples.
loadHostData, loadMicrobiomeData,
MultiOmicsBridgeAnalysis
set.seed(42)
# Host data: 200 genes, 20 samples
host_counts <- matrix(rpois(200 * 20, 150), nrow = 200, ncol = 20,
                      dimnames = list(paste0("Gene", 1:200), paste0("Sample", 1:20)))
host_se <- loadHostData(host_counts)
# Microbiome data: 50 taxa, 18 samples (2 missing)
mb_counts <- matrix(rpois(50 * 18, 30), nrow = 50, ncol = 18,
                    dimnames = list(paste0("Taxon", 1:50), paste0("Sample", 1:18)))
mb_se <- loadMicrobiomeData(mb_counts)
mae <- matchSamples(host_se, mb_se, min_paired = 5)
mae
Create a new MOBResult object.
MOBResult(
  integratedScores,
  featureLoadings,
  biomarkerTable,
  classifierResults,
  params = list()
)
integratedScores |
A |
featureLoadings |
A named |
biomarkerTable |
A |
classifierResults |
A named |
params |
A |
A MOBResult object.
library(S4Vectors)
scores <- matrix(rnorm(20), nrow = 10, ncol = 2,
                 dimnames = list(paste0("S", 1:10), c("Comp1", "Comp2")))
loadings <- list(
  host = matrix(rnorm(10), nrow = 5, ncol = 2,
                dimnames = list(paste0("G", 1:5), c("Comp1", "Comp2"))),
  microbiome = matrix(rnorm(6), nrow = 3, ncol = 2,
                      dimnames = list(paste0("T", 1:3), c("Comp1", "Comp2")))
)
bm <- DataFrame(
  feature = c("G1", "T1"),
  omics_layer = c("host", "microbiome"),
  loading_score = c(0.8, 0.6),
  rank = c(1L, 2L),
  component = c(1L, 1L)
)
obj <- MOBResult(
  integratedScores = scores,
  featureLoadings = loadings,
  biomarkerTable = bm,
  classifierResults = list(),
  params = list(integration_method = "DIABLO")
)
obj
An S4 class storing the output of MultiOmicsBridgeAnalysis.
Slots hold integrated sample scores from joint dimensionality reduction,
per-layer feature loadings, a ranked multi-omics biomarker table,
diagnostic classifier performance metrics, and analysis parameters.
integratedScores: A matrix of integrated sample scores with rows = samples and columns = latent components. Produced by DIABLO.
featureLoadings: A named list with two elements, host (genes x components) and microbiome (taxa x components), containing the sparse feature loading matrices.
biomarkerTable: A DataFrame with one row per selected biomarker containing columns feature, omics_layer, loading_score, rank, and component.
classifierResults: A named list with elements host_only, microbiome_only, and joint, each containing cross-validated AUC-ROC and fold-level performance metrics.
params: A list of analysis parameters including integration_method, n_components, n_biomarkers, cv_folds, and outcome_levels.
A one-call wrapper that executes the complete MultiOmicsBridge analysis
pipeline: joint dimensionality reduction, multi-omics biomarker
discovery, and integrated diagnostic classification. The function
accepts a MultiAssayExperiment (from matchSamples)
and returns a MOBResult S4 object containing all results.
MultiOmicsBridgeAnalysis(
  mae,
  outcome,
  n_components = 2L,
  n_features_host = 50L,
  n_features_mb = 20L,
  n_biomarkers = 50L,
  cv_folds = 5L,
  host_assay = "voom",
  mb_assay = "CLR",
  design_off_diag = 0.1,
  seed = 42L,
  BPPARAM = SerialParam()
)
mae |
A |
outcome |
A |
n_components |
An |
n_features_host |
An |
n_features_mb |
An |
n_biomarkers |
An |
cv_folds |
An |
host_assay |
A |
mb_assay |
A |
design_off_diag |
A |
seed |
An |
BPPARAM |
A |
Internally, MultiOmicsBridgeAnalysis calls:
jointDimReduction: DIABLO sparse multi-block PLS-DA.
biomarkerDiscovery: loading-based ranking and cross-omics correlation annotation.
diagnosticClassifier: host-only, microbiome-only, and joint Random Forest classifiers with cross-validation.
Each step can also be called independently for fine-grained control.
A MOBResult object with:
integrationScores(result): Matrix of DIABLO sample scores (samples x components).
featureLoadings(result): Named list of host and microbiome loading matrices.
biomarkers(result): Ranked multi-omics biomarker DataFrame.
performance(result): Classifier performance list with AUC-ROC for host-only, microbiome-only, and joint models.
matchSamples, jointDimReduction,
biomarkerDiscovery, diagnosticClassifier,
plotIntegration, plotClassifierComparison
set.seed(42)
host_counts <- matrix(rpois(500 * 20, 100), nrow = 500, ncol = 20,
                      dimnames = list(paste0("Gene", 1:500), paste0("S", 1:20)))
host_counts[1:20, 11:20] <- host_counts[1:20, 11:20] * 5L
mb_counts <- matrix(rpois(60 * 20, 40), nrow = 60, ncol = 20,
                    dimnames = list(paste0("Taxon", 1:60), paste0("S", 1:20)))
host_se <- loadHostData(host_counts)
mb_se <- loadMicrobiomeData(mb_counts)
mae <- matchSamples(host_se, mb_se)
outcome <- rep(c("ctrl", "treat"), each = 10)
result <- MultiOmicsBridgeAnalysis(mae, outcome, n_components = 2,
                                   n_features_host = 20, n_features_mb = 10,
                                   n_biomarkers = 15, cv_folds = 3)
result
head(as.data.frame(biomarkers(result)))
Returns the cross-validated classifier performance metrics
from a MOBResult object.
performance(x, ...)

## S4 method for signature 'MOBResult'
performance(x, ...)
x |
A |
... |
Additional arguments (not used). |
A named list with elements host_only,
microbiome_only, and joint, each containing
mean_auc, sd_auc, and fold_auc.
library(S4Vectors)
cr <- list(host_only = list(mean_auc = 0.85),
           microbiome_only = list(mean_auc = 0.78),
           joint = list(mean_auc = 0.92))
obj <- MOBResult(matrix(rnorm(20), 10, 2), list(), DataFrame(), cr)
performance(obj)
Produces a clustered heatmap of Spearman correlations between the top selected host genes and microbial taxa. Rows represent host genes, columns represent microbial taxa, and cell colour encodes the Spearman correlation value. Strong positive or negative correlations between a host gene and a microbial taxon suggest a potential functional host-microbe interaction.
    plotBiomarkerNetwork(
      result,
      mae,
      n_host = 20L,
      n_mb = 15L,
      host_assay = "voom",
      mb_assay = "CLR",
      cor_thresh = 0,
      low_colour = "#F44336",
      mid_colour = "white",
      high_colour = "#2196F3"
    )
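| Argument | Description |
|---|---|
| result | A MOBResult object. |
| mae | A MultiAssayExperiment object. |
| n_host | An integer (default 20L). |
| n_mb | An integer (default 15L). |
| host_assay | A character string (default "voom"). |
| mb_assay | A character string (default "CLR"). |
| cor_thresh | A numeric value (default 0). |
| low_colour | A character string giving a colour (default "#F44336"). |
| mid_colour | A character string giving a colour (default "white"). |
| high_colour | A character string giving a colour (default "#2196F3"). |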
A ggplot2 object.
See also: MultiOmicsBridgeAnalysis, plotIntegration
    set.seed(42)
    host_counts <- matrix(rpois(500 * 20, 100), nrow = 500, ncol = 20,
                          dimnames = list(paste0("Gene", 1:500), paste0("S", 1:20)))
    host_counts[1:20, 11:20] <- host_counts[1:20, 11:20] * 5L
    mb_counts <- matrix(rpois(60 * 20, 40), nrow = 60, ncol = 20,
                        dimnames = list(paste0("Taxon", 1:60), paste0("S", 1:20)))
    host_se <- loadHostData(host_counts)
    mb_se <- loadMicrobiomeData(mb_counts)
    mae <- matchSamples(host_se, mb_se)
    outcome <- rep(c("ctrl", "treat"), each = 10)
    result <- MultiOmicsBridgeAnalysis(mae, outcome, n_components = 2,
                                       n_features_host = 20, n_features_mb = 10,
                                       n_biomarkers = 15, cv_folds = 3)
    plotBiomarkerNetwork(result, mae, n_host = 10, n_mb = 8)
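The host-by-taxon Spearman matrix that a heatmap like plotBiomarkerNetwork displays can be sketched in base R. This is an assumption-laden illustration rather than the package's code: the data are simulated, masking cells whose absolute correlation falls below cor_thresh is a guess at how the threshold is used, and the clustering uses hclust defaults.

```r
set.seed(1)
n_samples <- 20

# Simulated stand-ins for normalised host expression and CLR-transformed
# abundances (features in rows, samples in columns).
host <- matrix(rnorm(6 * n_samples), nrow = 6,
               dimnames = list(paste0("Gene", 1:6), paste0("S", 1:n_samples)))
mb <- matrix(rnorm(4 * n_samples), nrow = 4,
             dimnames = list(paste0("Taxon", 1:4), paste0("S", 1:n_samples)))

# Make Taxon1 a strictly increasing function of Gene1 so that one
# perfectly rank-correlated pair exists.
mb["Taxon1", ] <- exp(host["Gene1", ])

# cor() correlates columns, so transpose to samples x features.
# The result has host genes in rows and taxa in columns.
rho <- cor(t(host), t(mb), method = "spearman")

# Assumed use of the threshold: blank out weak correlations.
cor_thresh <- 0.5
rho_masked <- rho
rho_masked[abs(rho) < cor_thresh] <- NA

# Reorder rows and columns by hierarchical clustering.
row_order <- hclust(dist(rho))$order
col_order <- hclust(dist(t(rho)))$order
rho_clustered <- rho[row_order, col_order]
```

Because Spearman correlation depends only on ranks, the Gene1/Taxon1 pair has a correlation of 1 even though the relationship is non-linear.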
Produces overlaid ROC (Receiver Operating Characteristic) curves
comparing the three diagnostic classifier configurations (host
transcriptomics only, gut microbiome only, and the joint multi-omics
model) on the same axes. The AUC-ROC values are annotated on the
plot, so any gain from the joint model is immediately visible. Set
type = "bar" to draw a bar chart of mean cross-validated AUC values
instead.
    plotClassifierComparison(
      result,
      type = c("roc", "bar"),
      colours = c(host_only = "#4CAF50", microbiome_only = "#FF9800", joint = "#2196F3"),
      show_diagonal = TRUE,
      linewidth = 0.9
    )
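| Argument | Description |
|---|---|
| result | A MOBResult object. |
| type | A character string, either "roc" or "bar". |
| colours | A named character vector of colours, with names host_only, microbiome_only, and joint. |
| show_diagonal | Logical (default TRUE). |
| linewidth | A numeric value (default 0.9). |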
A ggplot2 object.
See also: MultiOmicsBridgeAnalysis,
diagnosticClassifier, performance
    set.seed(42)
    host_counts <- matrix(rpois(500 * 20, 100), nrow = 500, ncol = 20,
                          dimnames = list(paste0("Gene", 1:500), paste0("S", 1:20)))
    host_counts[1:20, 11:20] <- host_counts[1:20, 11:20] * 5L
    mb_counts <- matrix(rpois(60 * 20, 40), nrow = 60, ncol = 20,
                        dimnames = list(paste0("Taxon", 1:60), paste0("S", 1:20)))
    host_se <- loadHostData(host_counts)
    mb_se <- loadMicrobiomeData(mb_counts)
    mae <- matchSamples(host_se, mb_se)
    outcome <- rep(c("ctrl", "treat"), each = 10)
    result <- MultiOmicsBridgeAnalysis(mae, outcome, n_components = 2,
                                       n_features_host = 20, n_features_mb = 10,
                                       n_biomarkers = 15, cv_folds = 3)
    plotClassifierComparison(result, type = "bar")
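For readers who want to see what the ROC curves and AUC values in plotClassifierComparison represent, the following base R sketch computes both from hypothetical predicted probabilities. It is not taken from the package: auc_rank and roc_points are illustrative helpers, and roc_points assumes there are no tied scores.

```r
# Hypothetical predicted probabilities of the positive class ("treat").
labels <- rep(c("ctrl", "treat"), each = 5)
scores <- c(0.10, 0.35, 0.40, 0.20, 0.60, 0.55, 0.80, 0.70, 0.90, 0.45)

# AUC from the rank-sum (Mann-Whitney) identity: the proportion of
# positive/negative pairs in which the positive sample scores higher.
auc_rank <- function(scores, labels, positive) {
  pos <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # average ranks are used for ties
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# ROC coordinates obtained by lowering the decision threshold one sample
# at a time (valid as written only when all scores are distinct).
roc_points <- function(scores, labels, positive) {
  ord <- order(scores, decreasing = TRUE)
  pos <- labels[ord] == positive
  data.frame(fpr = c(0, cumsum(!pos) / sum(!pos)),
             tpr = c(0, cumsum(pos) / sum(pos)))
}

auc <- auc_rank(scores, labels, positive = "treat")
roc <- roc_points(scores, labels, positive = "treat")

# The trapezoidal area under the ROC points matches the rank-based AUC.
auc_trapezoid <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
```

Here 23 of the 25 positive/negative pairs are ordered correctly, so both estimates equal 0.92. The diagonal from (0, 0) to (1, 1) corresponds to an AUC of 0.5, the performance expected from random guessing.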
Produces a scatter plot of integrated sample scores in the space defined by two DIABLO latent components, with samples coloured by outcome group and optional feature loading vectors overlaid as arrows. This biplot makes it immediately clear how well the multi-omics integration separates the outcome groups and which features drive that separation.
    plotIntegration(
      result,
      comp = c(1L, 2L),
      outcome = NULL,
      show_loadings = TRUE,
      n_loading_arrows = 5L,
      point_size = 2.5,
      point_alpha = 0.8,
      colours = NULL
    )
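| Argument | Description |
|---|---|
| result | A MOBResult object. |
| comp | A length-2 integer vector giving the two components to plot (default c(1L, 2L)). |
| outcome | A vector of per-sample outcome labels (default NULL). |
| show_loadings | Logical (default TRUE). |
| n_loading_arrows | An integer (default 5L). |
| point_size | A numeric value (default 2.5). |
| point_alpha | A numeric value (default 0.8). |
| colours | A named vector of colours (default NULL). |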
A ggplot2 object.
See also: MultiOmicsBridgeAnalysis, MOBResult,
plotClassifierComparison
    library(S4Vectors)
    scores <- matrix(rnorm(40), nrow = 20, ncol = 2,
                     dimnames = list(paste0("S", 1:20), c("Comp1", "Comp2")))
    fl <- list(
      host = matrix(rnorm(20), nrow = 10, ncol = 2,
                    dimnames = list(paste0("G", 1:10), c("Comp1", "Comp2"))),
      microbiome = matrix(rnorm(10), nrow = 5, ncol = 2,
                          dimnames = list(paste0("T", 1:5), c("Comp1", "Comp2")))
    )
    bm <- DataFrame(feature = c("G1", "T1"),
                    omics_layer = c("host", "microbiome"),
                    loading_score = c(0.8, 0.6),
                    rank = c(1L, 2L),
                    component = c(1L, 1L))
    obj <- MOBResult(scores, fl, bm, list(),
                     params = list(outcome_levels = c("ctrl", "treat")))
    outcome <- rep(c("ctrl", "treat"), each = 10)
    plotIntegration(obj, outcome = outcome)
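One plausible way to choose and scale the loading arrows of a biplot such as plotIntegration is sketched below in base R. The selection rule (longest loading vectors across the two plotted components) and the scaling rule (longest arrow reaches the edge of the score cloud) are assumptions made for illustration; the package may select and scale arrows differently.

```r
set.seed(7)

# Simulated sample scores and per-layer feature loadings on two components.
scores <- matrix(rnorm(40), nrow = 20, ncol = 2,
                 dimnames = list(paste0("S", 1:20), c("Comp1", "Comp2")))
fl <- list(
  host = matrix(rnorm(20), nrow = 10, ncol = 2,
                dimnames = list(paste0("G", 1:10), c("Comp1", "Comp2"))),
  microbiome = matrix(rnorm(10), nrow = 5, ncol = 2,
                      dimnames = list(paste0("T", 1:5), c("Comp1", "Comp2")))
)

comp <- c(1L, 2L)
n_loading_arrows <- 5L

# Stack both omics layers and keep the two plotted components.
loadings <- rbind(fl$host, fl$microbiome)[, comp, drop = FALSE]

# Assumed selection rule: keep the features with the longest loading vectors.
arrow_len <- sqrt(rowSums(loadings^2))
top <- order(arrow_len, decreasing = TRUE)[seq_len(n_loading_arrows)]

# Assumed scaling rule: stretch arrows so the longest one spans the
# largest absolute sample score.
scale_factor <- max(abs(scores[, comp])) / max(arrow_len)

arrows <- data.frame(
  feature   = rownames(loadings)[top],
  xend      = loadings[top, 1] * scale_factor,
  yend      = loadings[top, 2] * scale_factor,
  row.names = NULL
)
```

Each row of arrows describes a segment from the origin to (xend, yend), which is the form expected by a segment layer such as ggplot2's geom_segment.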
Produces a Sankey-style flow diagram showing the pipeline from data
source (host or microbiome) through selected top biomarkers to
predicted outcome classes. The width of each connection is proportional
to the feature's loading score, making it easy to see which features
contribute most to separating the outcome groups. The diagram uses
base ggplot2 geometry (no additional Sankey packages required).
    plotSankey(result, n_features = 10L, colours = NULL, node_width = 0.15)
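| Argument | Description |
|---|---|
| result | A MOBResult object. |
| n_features | An integer (default 10L). |
| colours | A named vector of colours (default NULL). |
| node_width | A numeric value (default 0.15). |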
A ggplot2 object.
See also: MultiOmicsBridgeAnalysis,
plotIntegration, plotBiomarkerNetwork
    library(S4Vectors)
    set.seed(42)
    scores <- matrix(rnorm(20), nrow = 10, ncol = 2,
                     dimnames = list(paste0("S", 1:10), c("Comp1", "Comp2")))
    fl <- list(
      host = matrix(abs(rnorm(20)), nrow = 10, ncol = 2,
                    dimnames = list(paste0("Gene", 1:10), c("Comp1", "Comp2"))),
      microbiome = matrix(abs(rnorm(10)), nrow = 5, ncol = 2,
                          dimnames = list(paste0("Taxon", 1:5), c("Comp1", "Comp2")))
    )
    bm <- DataFrame(
      feature = c(paste0("Gene", 1:5), paste0("Taxon", 1:3)),
      omics_layer = c(rep("host", 5), rep("microbiome", 3)),
      loading_score = c(0.9, 0.7, 0.5, 0.4, 0.3, 0.8, 0.6, 0.2),
      rank = 1:8,
      component = rep(1L, 8)
    )
    obj <- MOBResult(scores, fl, bm, list(),
                     params = list(outcome_levels = c("ctrl", "treat")))
    plotSankey(obj, n_features = 5)
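The statement that each connection's width is proportional to the feature's loading score can be made concrete with a small base R sketch. It reuses the biomarker values from the example above in a plain data.frame. Selecting the top n_features across both layers and normalising the widths to sum to 1 are assumptions for illustration, not the package's documented behaviour.

```r
# Biomarker table with the same values as the example, as a plain data.frame.
bm <- data.frame(
  feature       = c(paste0("Gene", 1:5), paste0("Taxon", 1:3)),
  omics_layer   = c(rep("host", 5), rep("microbiome", 3)),
  loading_score = c(0.9, 0.7, 0.5, 0.4, 0.3, 0.8, 0.6, 0.2)
)
n_features <- 5

# Assumed selection rule: top features by absolute loading score,
# pooled across both omics layers.
top <- bm[order(abs(bm$loading_score), decreasing = TRUE), ][seq_len(n_features), ]

# Ribbon width proportional to the loading score, normalised to sum to 1.
top$width <- abs(top$loading_score) / sum(abs(top$loading_score))

# Stack ribbons vertically: each occupies the interval [ymin, ymax].
top$ymax <- cumsum(top$width)
top$ymin <- top$ymax - top$width

# Height of each source node is the total width of its outgoing ribbons.
layer_width <- tapply(top$width, top$omics_layer, sum)
```

With these values the five selected features are Gene1, Taxon1, Gene2, Taxon2, and Gene3, and the host node receives 60% of the total width. Intervals of this kind can be drawn with ordinary rectangle or polygon layers, which is consistent with the diagram needing no dedicated Sankey package.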
Prints a compact summary of a MOBResult object,
including the top biomarkers, integration method, and classifier
AUC-ROC values.
    ## S4 method for signature 'MOBResult'
    show(object)
| Argument | Description |
|---|---|
| object | A MOBResult object. |
Invisibly returns object.