IntegratedLearner

This vignette is a practical tutorial for binary, multiclass, continuous, and survival outcome workflows in IntegratedLearner.

The goal is to show a complete end-to-end pattern you can adapt to your own multi-omics study:

Build correctly formatted training/validation inputs.
Fit per-layer, stacked, and concatenated learners.
Interpret model outputs (AUC, accuracy, balanced accuracy, R2, survival discrimination curves, layer weights, and feature signals).

IntegratedLearner supports two integration paradigms:

Early fusion: concatenated features across layers.
Late fusion: layer-specific models combined by a meta-learner.
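
The two paradigms can be sketched in base R on simulated data. This is only an illustration of the idea, not the package's implementation: the layers `layer_a` and `layer_b`, the helper `oof_layer`, and the use of logistic regression for both the base and meta models are assumptions made for this example.

```r
set.seed(42)
n <- 200

# Two hypothetical layers measured on the same n samples
layer_a <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("a", 1:3)))
layer_b <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, paste0("b", 1:2)))
y <- rbinom(n, 1, plogis(layer_a[, 1] - layer_b[, 1]))

# Early fusion: one model on the column-bound (concatenated) features
early_df   <- data.frame(y = y, layer_a, layer_b)
early_fit  <- glm(y ~ ., data = early_df, family = binomial())
early_pred <- predict(early_fit, type = "response")

# Late fusion, step 1: out-of-fold (OOF) predictions from one model per layer
folds <- sample(rep(1:5, length.out = n))
oof_layer <- function(x) {
  pred <- numeric(n)
  for (k in 1:5) {
    tr    <- folds != k
    d_tr  <- data.frame(y = y[tr], x[tr, , drop = FALSE])
    d_te  <- data.frame(x[!tr, , drop = FALSE])
    fit_k <- glm(y ~ ., data = d_tr, family = binomial())
    pred[!tr] <- predict(fit_k, newdata = d_te, type = "response")
  }
  pred
}

# Late fusion, step 2: a meta-model combines the layer-level predictions
meta_df   <- data.frame(y = y, a = oof_layer(layer_a), b = oof_layer(layer_b))
meta_fit  <- glm(y ~ a + b, data = meta_df, family = binomial())
late_pred <- predict(meta_fit, type = "response")
```

The meta-model only ever sees one column per layer, which is why late fusion yields directly interpretable layer-level contributions.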

Optional feature selection workflow used in this vignette:

Filtering first (filter_method, filter_pct) on training features.
Screening second (run_screening = TRUE, screen_pct) in a fold-safe manner:
- selected on fold-training only,
- applied to fold-validation,
- repeated for each fold,
- repeated once on full training data for final model fit.
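
The fold-safe pattern above can be sketched as follows. The screener here ranks features by absolute correlation with the outcome, which is an assumption chosen to keep the example in base R; it stands in for whatever screener the package applies. `screen_top`, `fold_id`, and the simulated data are hypothetical names.

```r
set.seed(1)
n <- 60
p <- 40
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", seq_len(p))))
y <- X[, 1] + rnorm(n)

screen_pct <- 30
n_keep <- ceiling(p * screen_pct / 100)

# Hypothetical screener: keep the n_keep features most correlated with y
screen_top <- function(X, y, n_keep) {
  score <- abs(cor(X, y))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(n_keep)]
}

fold_id <- sample(rep(1:3, length.out = n))

by_fold <- lapply(1:3, function(k) {
  tr <- fold_id != k
  # Selection uses fold-training rows only
  keep <- screen_top(X[tr, , drop = FALSE], y[tr], n_keep)
  # The same columns are applied to fold-validation rows, never re-selected
  X_valid_k <- X[!tr, keep, drop = FALSE]
  list(keep = keep, valid_dim = dim(X_valid_k))
})

# One final selection on the full training data for the final model fit
final_keep <- screen_top(X, y, n_keep)
```

Because selection never sees the held-out rows of a fold, the cross-validated performance estimate is not inflated by the screening step.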

Load Packages

# Main package
library(IntegratedLearner)

# Tutorial dependencies
library(dplyr)
library(ggplot2)
library(SuperLearner)
library(caret)
library(cowplot)
library(bayesplot)
library(S4Vectors)
library(SummarizedExperiment)
library(MultiAssayExperiment)
library(survival)
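# bartMachine (Java-based) is needed only for the SL.BART examples.
# `use_sl_bart` is assumed to be a logical flag set earlier (for example in a setup chunk).
if (exists("use_sl_bart") && isTRUE(use_sl_bart)) {
  library(bartMachine)
}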

Input Data Contract

For the PCL_* interface used in this tutorial, each dataset is a list with:

feature_table: data frame with features in rows and samples in columns.
sample_metadata: data frame with samples in rows. Must include:
- one subject identifier column (default name: subjectID).
- one outcome column (default name: Y).
feature_metadata: data frame with features in rows. Must include:
- featureID: unique feature identifier.
- featureType: layer label (for example, species, metabolites).

Required alignments:

rownames(feature_table) == rownames(feature_metadata)
colnames(feature_table) == rownames(sample_metadata)

If you provide a validation set, it must use the same feature set and ordering as training.
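
A small pre-flight check along these lines can catch contract violations before fitting. `check_pcl` is a hypothetical helper written for this tutorial, not a function exported by IntegratedLearner, and the toy dataset is simulated.

```r
# Hypothetical helper: returns a character vector describing each violated
# requirement, or character(0) when the list satisfies the contract above.
check_pcl <- function(pcl, outcome_col = "Y", subject_id_col = "subjectID") {
  ft <- pcl$feature_table
  sm <- pcl$sample_metadata
  fm <- pcl$feature_metadata
  problems <- character(0)

  if (!identical(rownames(ft), rownames(fm))) {
    problems <- c(problems, "feature_table rows differ from feature_metadata rows")
  }
  if (!identical(colnames(ft), rownames(sm))) {
    problems <- c(problems, "feature_table columns differ from sample_metadata rows")
  }
  missing_sm <- setdiff(c(outcome_col, subject_id_col), colnames(sm))
  if (length(missing_sm) > 0) {
    problems <- c(problems,
                  paste("sample_metadata lacks:", paste(missing_sm, collapse = ", ")))
  }
  missing_fm <- setdiff(c("featureID", "featureType"), colnames(fm))
  if (length(missing_fm) > 0) {
    problems <- c(problems,
                  paste("feature_metadata lacks:", paste(missing_fm, collapse = ", ")))
  }
  if (anyDuplicated(fm$featureID) > 0) {
    problems <- c(problems, "featureID is not unique")
  }
  problems
}

# Toy dataset: 3 features (two layers) x 4 samples
samples  <- paste0("S", 1:4)
features <- c("sp1", "sp2", "met1")
toy <- list(
  feature_table = data.frame(
    matrix(rnorm(12), nrow = 3, dimnames = list(features, samples))
  ),
  sample_metadata = data.frame(
    subjectID = samples, Y = c(0, 1, 0, 1), row.names = samples
  ),
  feature_metadata = data.frame(
    featureID = features,
    featureType = c("species", "species", "metabolites"),
    row.names = features
  )
)

ok_result <- check_pcl(toy)

# Reversing the sample order breaks the column/row alignment
bad <- toy
bad$feature_table <- bad$feature_table[, rev(samples)]
bad_result <- check_pcl(bad)
```

The same `identical(rownames(...), rownames(...))` comparison can be applied between the training and validation feature tables to confirm matching feature sets and ordering.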

For survival workflows, include time and event columns in sample_metadata (with event coded as 0/1). You can also provide Y as a Surv(time, event) object.
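
As a minimal sketch of the survival metadata described above, the following builds a toy `sample_metadata` with `time` and `event` columns and the equivalent `Surv` object. The values are invented for illustration, and how the wrapper consumes each form is not shown here.

```r
library(survival)

# Toy metadata: follow-up time plus an event indicator coded 0/1
sample_metadata <- data.frame(
  subjectID = paste0("S", 1:6),
  time      = c(5, 12, 7, 20, 3, 15),
  event     = c(1, 0, 1, 0, 1, 1),
  row.names = paste0("S", 1:6)
)

# Guard against other codings (for example 1/2 or TRUE/FALSE)
stopifnot(all(sample_metadata$event %in% c(0, 1)))

# Equivalent outcome expressed as a Surv object
surv_y <- Surv(sample_metadata$time, sample_metadata$event)
n_events <- sum(surv_y[, "status"])
```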

You can keep your own column names and map them in the wrapper:

fit <- IntegratedLearner(
  PCL_train = pcl_train,
  outcome_col = "disease_status",
  subject_id_col = "participant_id",
  family = stats::binomial()
)

Automatic coercion in the wrapper:

family = gaussian(): the outcome is coerced to numeric (an error is raised if conversion fails).
family = binomial() with two classes: the classes are mapped internally to {0,1}.
family = binomial() with more than two classes: the run is treated as multiclass and the class labels are retained.

Alternative Input Mode: MAE (Complete Binary Example)

IntegratedLearner accepts MultiAssayExperiment inputs through MAE_train/MAE_valid. This is often the cleanest path when each omics layer is already represented as a SummarizedExperiment/TreeSummarizedExperiment.

library(curatedMetagenomicData)


# 1) Download two aligned layers from curatedMetagenomicData
tax_raw <- curatedMetagenomicData(
  "DavidLA_2015.relative_abundance",
  dryrun = FALSE
)[[1]]

path_raw <- curatedMetagenomicData(
  "DavidLA_2015.pathway_abundance",
  dryrun = FALSE
)[[1]]

tax_tse  <- as(tax_raw,  "TreeSummarizedExperiment")
path_tse <- as(path_raw, "TreeSummarizedExperiment")

# 2) Keep common samples in both layers
common_samples <- intersect(colnames(tax_tse), colnames(path_tse))
common_samples <- as.character(common_samples)

tax_tse  <- tax_tse[, common_samples]
path_tse <- path_tse[, common_samples]

# 3) Build binary outcome and subject IDs inside each experiment
Yvec <- ifelse(as.character(colData(tax_tse)$disease) == "healthy", 0L, 1L)

SummarizedExperiment::colData(tax_tse)$Y <- Yvec
SummarizedExperiment::colData(path_tse)$Y <- Yvec

SummarizedExperiment::colData(tax_tse)$subjectID <- common_samples
SummarizedExperiment::colData(path_tse)$subjectID <- common_samples

# 4) Build top-level MAE colData
cd <- S4Vectors::DataFrame(
  Y = as.integer(Yvec),
  subjectID = common_samples,
  row.names = common_samples
)

# 5) Build explicit sampleMap
smap <- S4Vectors::DataFrame(
  assay   = c(rep("taxonomy", length(common_samples)),
              rep("pathway",  length(common_samples))),
  primary = c(common_samples, common_samples),
  colname = c(common_samples, common_samples)
)

smap$assay   <- as.character(smap$assay)
smap$primary <- as.character(smap$primary)
smap$colname <- as.character(smap$colname)

# 6) Build MAE container
mae <- MultiAssayExperiment(
  experiments = ExperimentList(
    taxonomy = tax_tse,
    pathway  = path_tse
  ),
  colData = cd,
  sampleMap = smap
)

# 7) Stratified train/validation split
y <- MultiAssayExperiment::colData(mae)$Y
names(y) <- rownames(MultiAssayExperiment::colData(mae))

set.seed(1)

i0 <- which(y == 0)
i1 <- which(y == 1)

train0 <- sample(i0, floor(0.7 * length(i0)))
train1 <- sample(i1, floor(0.7 * length(i1)))

train_ids <- names(y)[sort(c(train0, train1))]
valid_ids <- setdiff(names(y), train_ids)

mae_train <- mae[, train_ids]
mae_valid <- mae[, valid_ids]

# 8) Fit IntegratedLearner in MAE mode
fit_mae_bin <- IntegratedLearner(
  MAE_train = mae_train,
  MAE_valid = mae_valid,
  experiment = c("taxonomy", "pathway"),
  assay.type = c("relative_abundance", "pathway_abundance"),
  folds = 2,
  base_learner = "SL.randomForest",
  meta_learner = "SL.nnls.auc",
  filter_method = "prevalence",
  filter_pct = 40,
  run_screening = TRUE,
  screen_pct = 30,
  family = binomial(),
  verbose = TRUE
)

# 9) Results
fit_mae_bin$AUC.train
fit_mae_bin$AUC.test
fit_mae_bin$weights

The returned object has the same structure as in PCL mode, so downstream interpretation (AUC.train, R2.train, weights, plot.learner) is unchanged.
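
The stratified split in step 7 of the example generalizes to any number of classes. The helper below is a sketch with a hypothetical name (`stratified_split`); it uses `sample.int()` on positions so that a class with a single member is handled correctly, which plain `sample(i, ...)` on a length-one index vector would not do.

```r
# Hypothetical helper: draw train_frac of each class for training
stratified_split <- function(y, train_frac = 0.7, seed = 1) {
  set.seed(seed)
  idx <- seq_along(y)
  train_idx <- unlist(lapply(split(idx, y), function(i) {
    i[sample.int(length(i), floor(train_frac * length(i)))]
  }))
  train_idx <- sort(unname(train_idx))
  list(train = train_idx, valid = setdiff(idx, train_idx))
}

# Imbalanced toy outcome: 30 controls, 10 cases
y_toy <- c(rep(0, 30), rep(1, 10))
sp <- stratified_split(y_toy)

table(y_toy[sp$train])  # class counts in the training split
table(y_toy[sp$valid])  # class counts in the validation split
```

The returned positions can be mapped to sample names and used to subset a MultiAssayExperiment by column, as in step 7.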

Parameter Reference (Conbin, Multiclass, and Survival)

Common Wrapper Parameters (`IntegratedLearner`)

Parameter	Default	Applies to	Description
`MAE_train`, `MAE_valid`	`NULL`	all	MAE-mode inputs (training and optional validation).
`PCL_train`, `PCL_valid`	`NULL`	all	PCL-mode inputs (training and optional validation).
`experiment`	`NULL`	MAE mode	Selected MAE experiment names/indices; defaults to all experiments.
`assay.type`	`NULL`	MAE mode	Assay names per selected experiment.
`outcome_col`	`"Y"`	all	Outcome column name in PCL `sample_metadata` / MAE `colData`.
`subject_id_col`	`"subjectID"`	all	Subject identifier column name in PCL `sample_metadata` / MAE `colData`.
`na.rm`	`FALSE`	all	Drop features with missing values after extraction/prep.
`folds`	`5`	all	Outer CV folds.
`seed`	`1234`	all	Reproducibility seed.
`base_learner`	`"SL.BART"`	all	Base learner. Use `SL.` IDs for continuous/binary, native multiclass IDs (for example `randomforest`, `xgboost`, `mbart`) for multiclass, and explicitly set a supported `surv.` ID for survival runs.
`filter_method`	`NULL`	all	Optional feature filtering method: `"prevalence"` or `"variance"`.
`filter_pct`	`NULL`	all	Optional retention percentage in `(0,100]` for filtering.
`run_screening`	`FALSE`	all	Enable supervised screening.
`screen_pct`	`NULL`	all	Retention percentage in `(0,100]` for screening. Required when screening is enabled.
`prevalence_pct`	`NULL`	all	Deprecated alias for prevalence filtering (`filter_method = "prevalence"`).
`drop_poor_performing_layers`	`FALSE`	continuous, binary, survival	If `TRUE`, removes layers with poor single-layer performance from early and late fusion only (AUC < 0.5 for binary, R² < 0.5 for continuous, C-index < 0.5 for survival). Single-layer results are still retained.
`verbose`	`FALSE`	all	Print progress.
`family`	`gaussian()`	all	Non-survival: `gaussian()`/`binomial()`. Multiclass is auto-detected when `family = binomial()` and outcome has more than two classes. Survival is auto-detected from metadata or family.
`...`	—	all	Passed to the selected backend (`IL_conbin` or `ILsurv`).

Conbin-Specific Parameters (Continuous/Binary Path)

Parameter	Default	Description
`base_screener`	`"All"`	Deprecated compatibility parameter. Prefer `run_screening` + `screen_pct`.
`meta_learner`	`"SL.nnls.auc"`	Stacked meta learner for late fusion.
`run_stacked`	`TRUE`	Enables late-fusion stacked model.
`run_concat`	`TRUE`	Enables early-fusion concatenated model.
`print_learner`	`TRUE`	Prints fit summary.
`refit.stack`	`FALSE`	Refit stacked learner on full data for final predictions.

Multiclass-Specific Parameters (Native Multiclass Path)

Parameter	Default	Description
`base_learner`	`"glmnet"`	Native multiclass learner per layer and for concatenated fit. Supported: `glmnet`, `randomforest`, `ranger`, `xgboost`, `mbart`, `multinom`.
`meta_learner`	`"glmnet"`	Native multiclass learner used for stacked fusion.
`base_screener`	`"All"`	Deprecated compatibility parameter. Prefer `run_screening` + `screen_pct`.
`run_stacked`	`TRUE`	Enables late-fusion stacked multiclass model.
`run_concat`	`TRUE`	Enables early-fusion concatenated multiclass model.
`folds`	`5`	Subject-level CV folds for OOF multiclass probabilities.
`run_screening`, `screen_pct`	`FALSE`, `NULL`	Fold-safe multiclass screening (glmnet-based) after optional filtering.

Survival-Specific Parameters (via `...`)

Parameter	Default	Description
`do_early_fusion`	`TRUE`	Train an early-fusion survival model on all features.
`weight_method`	`"IBS"`	Late-fusion weighting objective (`"IBS"` or `"COX"`).
`t_vec`, `t_vec_probs`	`NULL`, quantiles	Time grid used in COX-style weighting summaries.
`layer_score`	`"sum"`	Aggregation of cumulative hazard increments (`sum`, `mean`, `l2`).
`weight_lambda`	`0.02`	Regularization strength for COX weighting optimizer.
`weight_penalty`	`"l2_to_uniform"` or `"entropy"`	Penalty used while learning survival late-fusion weights.
`weight_cap`	`1.0`	Optional cap on individual layer weights.
`optim_maxit_cox`	`4000`	Max iterations for COX weighting optimization.
`optim_maxit_ibs`	`300`	Max iterations for IBS weighting optimization.
`ibs_shrink_to_uniform`	`0`	Shrink IBS weights toward uniform blend.

Supported Models and Fusion Modules

Supported Models

Path	Supported base models
Continuous/Binary (`IL_conbin`)	Any `SuperLearner`-compatible `SL.*` learner available in your R session. Package wrappers include: `SL.BART`, `SL.LASSO`, `SL.enet`, `SL.glmnet2`, `SL.horseshoe`, `SL.mxBART` (plus standard SuperLearner learners such as `SL.glm`, `SL.randomForest`, etc.).
Multiclass (`IL_multiclass`)	Native multiclass learner IDs: `glmnet`, `randomforest`, `ranger`, `xgboost`, `mbart`, `multinom`.
Survival (`ILsurv`)	Built-in survival learner IDs: `surv.coxph`, `surv.glmnet`, `surv.ranger`, `surv.ranger.extratrees`, `surv.ranger.maxstat`, `surv.ranger.C`, `surv.rfsrc`, `surv.coxboost`, `surv.gbm`, `surv.xgboost.cox`, `surv.xgboost.aft`, `surv.mboost`, `surv.bart`.

Supported Fusion Outputs

Path	Single-layer	Early fusion	Late fusion
Continuous/Binary	Yes	`run_concat = TRUE`	`run_stacked = TRUE` with `meta_learner`
Multiclass	Yes	`run_concat = TRUE`	`run_stacked = TRUE` with native multiclass `meta_learner`
Survival	Yes (`train_out$single`)	`do_early_fusion = TRUE`	Weighted layer blending (`weight_method = "IBS"` or `"COX"`)

Output Reference: What You Get and How to Access It

This section summarizes the outputs produced by each integration method and where to find weights/importance values.

Conbin Outputs (Binary/Continuous)

Method	What it returns	Where to access
Single-layer (per omics layer)	Layer-specific predictions and metrics	`fit$yhat.train[, layer_name]`, `fit$yhat.test[, layer_name]` (if validation), `fit$AUC.train` / `fit$AUC.test`, `fit$accuracy.train` / `fit$accuracy.test`, `fit$balanced_accuracy.train` / `fit$balanced_accuracy.test` (binomial), `fit$R2.train` / `fit$R2.test` (gaussian)
Early fusion (concatenated)	One model on all features concatenated	Enable with `run_concat = TRUE`; outputs in `fit$yhat.train[, "concatenated"]`, `fit$model_fits$model_concat`, `fit$SL_fits$SL_fit_concat`
Late fusion (stacked)	Meta-model over layer-level predictions	Enable with `run_stacked = TRUE`; outputs in `fit$yhat.train[, "stacked"]`, `fit$model_fits$model_stacked`, `fit$SL_fits$SL_fit_stacked`
Layer weights (stacked)	Relative contribution of each layer in late fusion	`fit$weights` (available when `meta_learner = "SL.nnls.auc"` and `run_stacked = TRUE`)
Binary metric table	Per-model AUC, accuracy, and balanced accuracy	`fit$metrics.train` and `fit$metrics.test` (if validation provided)

Multiclass Outputs

Method	What it returns	Where to access
Single-layer (per omics layer)	Layer-wise multiclass probability and class predictions	`fit$prob.train[[layer_name]]`, `fit$class.train[, layer_name]`, plus validation analogs `fit$prob.test[[layer_name]]`, `fit$class.test[, layer_name]`
Early fusion (concatenated)	One multiclass model on concatenated features	Enable with `run_concat = TRUE`; outputs in `fit$prob.train$concatenated`, `fit$class.train[, "concatenated"]`, `fit$model_fits$model_concat`
Late fusion (stacked)	Multiclass meta-model over OOF layer probabilities	Enable with `run_stacked = TRUE`; outputs in `fit$prob.train$stacked`, `fit$class.train[, "stacked"]`, `fit$model_fits$model_stacked`
Multiclass performance metrics	Accuracy, balanced accuracy, one-vs-rest AUC, and log-loss	`fit$metrics.train` and `fit$metrics.test` (if validation provided)
Feature-selection metadata	Filtering/screening settings used in fit	`fit$filter_method`, `fit$filter_pct`, `fit$prevalence_pct`, `fit$screening_used`, `fit$screen_method`, `fit$screen_pct`
Screened feature sets	Features retained by fold-safe screening	`fit$selected_features_by_layer`, `fit$selected_features_concat`

Survival Outputs (Single/Early/Late)

Method	Training outputs	Validation outputs
Single-layer	`fit$train_out$single$metrics`, `fit$train_out$single$train_risk`	`fit$valid_out$single$valid_cindex`, `fit$valid_out$single$valid_auc`, `fit$valid_out$single$valid_risk`
Early fusion	`fit$train_out$early$train_cindex`, `fit$train_out$early$train_auc`, `fit$train_out$early$train_risk`	`fit$valid_out$early$valid_cindex`, `fit$valid_out$early$valid_auc`, `fit$valid_out$early$valid_risk`
Late fusion	`fit$train_out$late$weights`, `fit$train_out$late$train_cindex`, `fit$train_out$late$train_auc`, `fit$train_out$late$train_risk`	`fit$valid_out$late$valid_cindex`, `fit$valid_out$late$valid_auc`, `fit$valid_out$late$valid_risk`
Survival plotting payload	`fit$surv_plot_data$train`	`fit$surv_plot_data$valid`

Importance Outputs (Conbin, Multiclass, and Survival)

Importance type	Where to access	Notes
Conbin signed global feature importance	`fit$feature_importance_signed`	Always returned for non-survival fits; named numeric vector sorted by effect magnitude/sign.
Conbin signed per-layer importance	`fit$feature_importance_signed_by_layer`	List split by `featureType`.
Multiclass signed global feature importance	`fit$feature_importance_global`	Global score aggregated across multiclass contrasts.
Multiclass signed importance by class	`fit$feature_importance_signed_by_class`	List with one signed vector per class.
Multiclass signed importance by layer and class	`fit$feature_importance_signed_by_layer_by_class`	Nested list by layer then class.
Survival early-fusion combined importance	`fit$train_out$early$combined_importance`	Available when `do_early_fusion = TRUE`.
Survival late-fusion combined importance	`fit$train_out$late$combined_importance`	Weighted signed importance; names are prefixed like `layer::feature`.
BART-specific layer importance (optional)	`bartMachine::investigate_var_importance(fit$model_fits$model_layers[[layer]], plot = FALSE)`	Only for BART-based conbin fits (`base_learner = "SL.BART"`).

Quick Access Snippets

# ---- Conbin: weights + top features ----
fit$weights
head(fit$feature_importance_signed, 20)
names(fit$feature_importance_signed_by_layer)
head(fit$feature_importance_signed_by_layer[[1]], 20)

# ---- Multiclass: metrics + class probabilities + importance ----
fit_mc$metrics.train
fit_mc$metrics.test
head(fit_mc$class.train)
head(fit_mc$class.test)
head(fit_mc$prob.train$stacked)
head(fit_mc$feature_importance_global, 20)
head(fit_mc$feature_importance_signed_by_class[[1]], 20)

# ---- Survival: late-fusion weights + top combined features ----
fit_surv$train_out$late$weights
head(fit_surv$train_out$late$combined_importance, 20)

# ---- Survival: inspect all fusion branches ----
fit_surv$train_out$single
fit_surv$train_out$early
fit_surv$train_out$late

Example 1: Binary Outcome (IBD Classification)

This section uses the PRISM dataset (Franzosa et al., 2019) for classifying IBD status. In these fixtures the binary target is in sample_metadata$Y (default outcome_col behavior).

Step 1: Load and Inspect Training and Validation Data

# Training data
load_il_dataset("PRISM", envir = environment())
pcl <- PRISM

feature_table <- pcl$feature_table
sample_metadata <- pcl$sample_metadata
feature_metadata <- pcl$feature_metadata
rm(pcl)

# Quick checks
head(feature_table[1:5, 1:5])
#>                                G35127      G35128      G35152       G36347
#> Granulicella_unclassified -0.05253649 -0.05127158 -0.06133085  0.004887447
#> Actinomyces_graevenitzii   1.04668500 -1.32629194 -1.51654615 -3.247989324
#> Actinomyces_johnsonii     -0.70327678 -0.41575776 -0.29326475 -0.314361595
#> Actinomyces_massiliensis  -0.56808952  0.14722099  0.05660884 -1.077235688
#> Actinomyces_naeslundii    -0.49546119 -0.15921604 -0.03146485 -0.354377267
#>                                 G36348
#> Granulicella_unclassified -0.006164066
#> Actinomyces_graevenitzii  -0.717183019
#> Actinomyces_johnsonii     -0.340485318
#> Actinomyces_massiliensis  -0.159240362
#> Actinomyces_naeslundii    -0.139758576
head(sample_metadata[1:5, ])
#>        Diagnosis dysbiosis_score Y subjectID
#> G35127        CD       0.9341207 1    G35127
#> G35128        CD       0.5962602 1    G35128
#> G35152        CD       0.9505732 1    G35152
#> G36347        CD       0.9966957 1    G36347
#> G36348        CD       0.8475403 1    G36348
head(feature_metadata[1:5, ])
#>                                           featureID featureType
#> Granulicella_unclassified Granulicella_unclassified     species
#> Actinomyces_graevenitzii   Actinomyces_graevenitzii     species
#> Actinomyces_johnsonii         Actinomyces_johnsonii     species
#> Actinomyces_massiliensis   Actinomyces_massiliensis     species
#> Actinomyces_naeslundii       Actinomyces_naeslundii     species

table(feature_metadata$featureType)
#> 
#> metabolites     species 
#>        1500         340
table(sample_metadata$Y)
#> 
#>   0   1 
#>  34 121

all(rownames(feature_table) == rownames(feature_metadata))
#> [1] TRUE
all(colnames(feature_table) == rownames(sample_metadata))
#> [1] TRUE

# Independent validation data
load_il_dataset("NLIBD", envir = environment())
pcl <- NLIBD
feature_table_valid <- pcl$feature_table
sample_metadata_valid <- pcl$sample_metadata
rm(pcl)

# Align validation features to training feature set/order (required by IntegratedLearner)
if (!identical(rownames(feature_table), rownames(feature_table_valid))) {
  missing_in_valid <- setdiff(rownames(feature_table), rownames(feature_table_valid))
  if (length(missing_in_valid) > 0) {
    stop("Validation set is missing training features, e.g.: ", paste(head(missing_in_valid, 5), collapse = ", "))
  }
  feature_table_valid <- feature_table_valid[rownames(feature_table), , drop = FALSE]
}

all(rownames(feature_table) == rownames(feature_table_valid))
#> [1] TRUE
all(colnames(feature_table_valid) == rownames(sample_metadata_valid))
#> [1] TRUE

Step 2: Build PCL Inputs

PCL_train <- list(
  feature_table = feature_table,
  sample_metadata = sample_metadata,
  feature_metadata = feature_metadata
)

PCL_valid <- list(
  feature_table = feature_table_valid,
  sample_metadata = sample_metadata_valid,
  feature_metadata = feature_metadata
)

Step 3: Fit the Model

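IntegratedLearner fits one model per layer (base_learner) and then combines the layer-level predictions with a meta-learner (meta_learner). With the default run_concat = TRUE, it also fits a single concatenated (early-fusion) model on all features.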

fit <- IntegratedLearner(
  PCL_train = PCL_train,
  PCL_valid = PCL_valid,
  folds = 2,
  base_learner = "SL.randomForest",
  meta_learner = "SL.nnls.auc",
  filter_method = "prevalence",
  filter_pct = 40,
  run_screening = TRUE,
  screen_pct = 30,
  verbose = TRUE,
  family = binomial()
)
#> Feature filter (prevalence ranking, top 40.00% per layer): kept 736/1840 features. Layer breakdown: species=136/340, metabolites=600/1500.
#> Running base model for layer 1...
#> Number of covariates in screen.il.glmnet is: 180
#> CV SL.randomForest_screen.il.glmnet
#> Number of covariates in screen.il.glmnet is: 180
#> CV SL.randomForest_screen.il.glmnet
#> Non-Negative least squares convergence: TRUE
#> full SL.randomForest_screen.il.glmnet
#> Running base model for layer 2...
#> Number of covariates in screen.il.glmnet is: 41
#> CV SL.randomForest_screen.il.glmnet
#> Number of covariates in screen.il.glmnet is: 41
#> CV SL.randomForest_screen.il.glmnet
#> Non-Negative least squares convergence: TRUE
#> full SL.randomForest_screen.il.glmnet
#> Running stacked model...
#> Number of covariates in All is: 2
#> CV SL.nnls.auc_All
#> Number of covariates in All is: 2
#> CV SL.nnls.auc_All
#> Non-Negative least squares convergence: TRUE
#> full SL.nnls.auc_All
#> Running concatenated model...
#> Number of covariates in screen.il.glmnet is: 221
#> CV SL.randomForest_screen.il.glmnet
#> Number of covariates in screen.il.glmnet is: 221
#> CV SL.randomForest_screen.il.glmnet
#> Non-Negative least squares convergence: TRUE
#> full SL.randomForest_screen.il.glmnet
#> Time for model fit : 0.094 minutes 
#> ========================================
#> Model fit for individual layers: SL.randomForest 
#> Model fit for stacked layer: SL.nnls.auc 
#> Model fit for concatenated layer: SL.randomForest 
#> ========================================
#> AUC metric for training data: 
#> Individual layers: 
#> metabolites     species 
#>       0.845       0.961 
#> ======================
#> Stacked model:0.963 
#> ======================
#> Concatenated model:0.966 
#> ======================
#> ========================================
#> AUC metric for test data: 
#> Individual layers: 
#> metabolites     species 
#>       0.742       0.566 
#> ======================
#> Stacked model:0.612 
#> ======================
#> Concatenated model:0.698 
#> ======================
#> ========================================
#> Weights for individual layers predictions in IntegratedLearner: 
#> metabolites     species 
#>       0.222       0.778 
#> ========================================

Step 4: Inspect and Interpret Outputs

Core outputs for binary tasks include:

fit$AUC.train and fit$AUC.test: AUC per layer and fusion model.
fit$accuracy.train and fit$accuracy.test: thresholded accuracy per layer and fusion model.
fit$balanced_accuracy.train and fit$balanced_accuracy.test: balanced accuracy per layer and fusion model.
fit$metrics.train and fit$metrics.test: compact metric tables with AUC, accuracy, and balanced accuracy.
fit$weights: layer contributions in the stacked model (when SL.nnls.auc is used).
fit$yhat.train and fit$yhat.test: predicted probabilities.

fit$AUC.train
#>  metabolites      species      stacked concatenated 
#>        0.845        0.961        0.963        0.966
fit$AUC.test
#>  metabolites      species      stacked concatenated 
#>        0.742        0.566        0.612        0.698
fit$accuracy.train
#>  metabolites      species      stacked concatenated 
#>    0.8322581    0.9096774    0.9161290    0.9032258
fit$balanced_accuracy.train
#>  metabolites      species      stacked concatenated 
#>    0.6387944    0.8575596    0.8616918    0.8534273
fit$metrics.test
#>          model   auc  accuracy balanced_accuracy
#> 1  metabolites 0.742 0.6461538         0.4883721
#> 2      species 0.566 0.6923077         0.5787526
#> 3      stacked 0.612 0.6923077         0.5565539
#> 4 concatenated 0.698 0.6615385         0.5000000
fit$weights
#> metabolites     species 
#>   0.2216494   0.7783506
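
To make the three binary metrics concrete, the sketch below recomputes them from predicted probabilities in base R. It assumes a 0.5 classification threshold and the rank-based (Mann-Whitney) AUC; the package's internal threshold and AUC routine may differ. `auc_rank` and `binary_metrics` are hypothetical helper names.

```r
# Rank-based AUC: probability that a random positive outranks a random negative
auc_rank <- function(y, p) {
  r  <- rank(p)  # average ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

binary_metrics <- function(y, p, threshold = 0.5) {
  pred <- as.integer(p >= threshold)
  sens <- mean(pred[y == 1] == 1)  # true positive rate
  spec <- mean(pred[y == 0] == 0)  # true negative rate
  c(
    auc = auc_rank(y, p),
    accuracy = mean(pred == y),
    balanced_accuracy = (sens + spec) / 2
  )
}

# Toy imbalanced example: 2 negatives, 4 positives
y_toy <- c(0, 0, 1, 1, 1, 1)
p_toy <- c(0.2, 0.6, 0.4, 0.7, 0.8, 0.9)
m <- binary_metrics(y_toy, p_toy)
m
```

Balanced accuracy averages sensitivity and specificity, so with imbalanced classes it can sit well below plain accuracy, as it does for several models in `fit$metrics.test` above.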

Plot ROC summaries for train and validation sets:

plot.obj <- IntegratedLearner:::plot.learner(fit)
plot.obj$plot

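In this run, the metabolites layer generalizes best to the NLIBD validation set (AUC 0.742), followed by the concatenated model (0.698), the stacked model (0.612), and the species layer (0.566). Training AUCs are much higher (0.845 to 0.966), and the stacked model places most of its weight on species (0.778), the layer with the weakest validation AUC. Use the same comparison on your own data to check which single layer is strongest and whether stacked fusion outperforms both the individual layers and simple concatenation.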

Example 2: Continuous Outcome (Gestational Age)

This section uses the pregnancy dataset (Ghaemi et al., 2019), where Y is continuous gestational age (default outcome_col behavior).

Step 1: Load and Inspect Data

load_il_dataset("pregnancy", envir = environment())
pcl <- pregnancy

feature_table <- pcl$feature_table
sample_metadata <- pcl$sample_metadata
feature_metadata <- pcl$feature_metadata
rm(pcl)

head(feature_table[1:5, 1:5])
#>        PTLG002_1 PTLG003_1  PTLG004_1 PTLG005_1 PTLG007_1
#> CEP135  28.21785  54.56723  53.776824  15.26909  11.04831
#> MIIP    10.10756  17.11006   4.336841   0.00000  19.88695
#> GNL3    45.25968  58.26670  56.378929  70.23780  64.08018
#> CEP70   79.09550  67.97782  93.675759 128.26033  66.28985
#> TIMP1  172.23675 121.62018 183.014677 247.35921 304.93329
head(sample_metadata[1:5, ])
#>            Y subjectID
#> PTLG002_1 11   PTLG002
#> PTLG003_1 11   PTLG003
#> PTLG004_1 11   PTLG004
#> PTLG005_1 11   PTLG005
#> PTLG007_1 11   PTLG007
head(feature_metadata[1:5, ])
#>        featureID featureType
#> CEP135    CEP135 CellfreeRNA
#> MIIP        MIIP CellfreeRNA
#> GNL3        GNL3 CellfreeRNA
#> CEP70      CEP70 CellfreeRNA
#> TIMP1      TIMP1 CellfreeRNA

table(feature_metadata$featureType)
#> 
#>     CellfreeRNA    ImmuneSystem    Metabolomics      Microbiome   PlasmaLuminex 
#>            9084             264             253             259              31 
#> PlasmaSomalogic    SerumLuminex 
#>             650              31
length(unique(sample_metadata$subjectID))
#> [1] 17

all(rownames(feature_table) == rownames(feature_metadata))
#> [1] TRUE
all(colnames(feature_table) == rownames(sample_metadata))
#> [1] TRUE

# Optional speed-up for local experimentation
# top_n <- 50
# subsetIDs <- c(1:top_n, (nrow(feature_table) - top_n + 1):nrow(feature_table))
# feature_table <- feature_table[subsetIDs, ]
# feature_metadata <- feature_metadata[subsetIDs, ]

Step 2: Build PCL Input

PCL_train <- list(
  feature_table = feature_table,
  sample_metadata = sample_metadata,
  feature_metadata = feature_metadata
)

Step 3: Fit Continuous Model

For this example, we use BART base learners (SL.BART).

If you hit:

java.lang.UnsupportedClassVersionError ... class file version 65.0 ... recognizes up to 61.0

your Java runtime is older than the Java version your installed bartMachine build was compiled for. Class file version 65.0 corresponds to Java 21, and 61.0 is the highest version a Java 17 runtime accepts. In that case, either:

upgrade the Java runtime to 21 or newer and restart R, or
use a non-Java learner (example fallback shown below).
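
The version numbers in the error message map to Java releases by a fixed offset of 44, which the sketch below encodes. `class_file_to_java` is a hypothetical helper; the `java -version` call at the end only reports the first `java` on the PATH, which may not be the runtime that R's Java bridge actually loads.

```r
# Java release = class file major version - 44 (e.g. 52 -> Java 8)
class_file_to_java <- function(class_version) {
  class_version - 44
}

needed    <- class_file_to_java(65)  # version the bartMachine classes require
supported <- class_file_to_java(61)  # highest version the runtime accepts

# Optional: show the Java found on the PATH, if any
if (nzchar(Sys.which("java"))) {
  print(system2("java", "-version", stdout = TRUE, stderr = TRUE))
}
```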

fit <- IntegratedLearner(
  PCL_train = PCL_train,
  folds = 2,
  base_learner = "SL.BART",
  meta_learner = "SL.nnls.auc",
  filter_method = "variance",
  filter_pct = 40,
  run_screening = TRUE,
  screen_pct = 30,
  family = gaussian()
)
#> Time for model fit : 0.5 minutes 
#> ========================================
#> Model fit for individual layers: SL.BART 
#> Model fit for stacked layer: SL.nnls.auc 
#> Model fit for concatenated layer: SL.BART 
#> ========================================
#> R^2 for training data: 
#> Individual layers: 
#>     CellfreeRNA    ImmuneSystem    Metabolomics      Microbiome   PlasmaLuminex 
#>     0.095426974     0.048755007     0.447133256     0.450236277     0.113616776 
#> PlasmaSomalogic    SerumLuminex 
#>     0.722966170     0.003965187 
#> ======================
#> Stacked model:0.7166626 
#> ======================
#> Concatenated model:0.1764758 
#> ======================
#> ========================================
#> Weights for individual layers predictions in IntegratedLearner: 
#>     CellfreeRNA    ImmuneSystem    Metabolomics      Microbiome   PlasmaLuminex 
#>           0.000           0.000           0.000           0.017           0.000 
#> PlasmaSomalogic    SerumLuminex 
#>           0.983           0.000 
#> ========================================

Fallback (non-Java) run:

fit <- IntegratedLearner(
  PCL_train = PCL_train,
  folds = 2,
  base_learner = "SL.randomForest",
  meta_learner = "SL.nnls.auc",
  filter_method = "variance",
  filter_pct = 40,
  run_screening = TRUE,
  screen_pct = 30,
  family = gaussian()
)

Step 4: Evaluate Predictive Accuracy

For continuous outcomes, IntegratedLearner reports R2.train (and R2.test if validation is provided).

fit$R2.train
#>     CellfreeRNA    ImmuneSystem    Metabolomics      Microbiome   PlasmaLuminex 
#>     0.095426974     0.048755007     0.447133256     0.450236277     0.113616776 
#> PlasmaSomalogic    SerumLuminex         stacked    concatenated 
#>     0.722966170     0.003965187     0.716662576     0.176475837

plot.obj <- IntegratedLearner:::plot.learner(fit)
plot.obj$plot

Step 5: Uncertainty and Feature-Level Interpretation (BART)

When using SL.BART, you can inspect posterior predictive distributions and derive weighted posterior summaries.

weights <- fit$weights

dataX <- fit$X_train_layers
dataY <- fit$Y_train

post.samples <- vector("list", length(weights))
names(post.samples) <- names(dataX)

for (i in seq_along(post.samples)) {
  post.samples[[i]] <- bartMachine::bart_machine_get_posterior(
    fit$model_fits$model_layers[[i]],
    dataX[[i]]
  )$y_hat_posterior_samples
}
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     SSH2, GRB2, AP1S2, HADHA, PARP14, FAM129A, NCK2, HLA.DRA, ARPC5, PRKACB, SIAH2, DAP, STRADB, RPL7AP6, CDC37, SEC14L1, EIF3K, PTEN, DOCK11, CDC42SE1, SYK, TIMP1, IGF2BP3, PSAP, LUC7L3, PARP1, HNRNPR, ABTB1, RNA5SP370, NFATC2, MBP, CBX3, KIF1C, HNRNPH3, LARP1, RNA5SP74, BAZ1A, POLDIP2, LIMD2, SHOC2, AURKAIP1, PRPF6, HLA.DPB1, WAS, QARS, JUND, ANP32E, HINT1, GSN, CLTC, KLF2, RAB10, TBCA, CEBPD, PHF3, CHCHD2, NFKB2, H3F3A, BAZ2A, IQGAP2, HOOK3, CTSB, LCN2, RNA5SP368, RBBP4, ITSN2, EIF3F, ZNF385A, UIMC1, LBH, TAX1BP1, DTX3L, NUP155, APC, DENND4A, RASA3, KLF6, AP3S1, SNHG5, ARL6IP5, N4BP2L2, ANXA11, RP11.475C16.1, UBQLN1, NCOA3, HIST1H2BD, NLRC5, NIN, UQCRC1, MYD88, COL1A2, HERC1, MCUR1, RPS27L, SASH3, NDST1, HBD, ZNF106, ZNF154, LPCAT3, CLCN3, ZNF747, ARID2, CTRC, LIG1, CCDC181, ADRA2B, RP11.319G6.1, SUMO2, RP11.89H19.1, ATP6V1F, DUX4L26, EME1, SP110, ABCA3, ATXN3, ABI1, LURAP1L, CTA.212A2.3, SYNPO2, PRPF40A, XPO1, CNTROB, PCM1, RP11.255B23.1, UQCC2, EEF1A1P6, FBRS, NEMF, GATAD2A, ZCCHC9, CALD1, AIM2, STK24, ATP5G2, RUNX1T1, RPL23AP74, TOMM7, PECAM1, NDUFS3, EIF4H, ELK4, TMCC1, ZNF271P, UBAP2L, APOL6, RPL39P3, ILF3, POLR3G, BTG1, EHD1, FAM107A, RPL10P3, GABARAP, CUL4A, CST3, UBXN6, SSH1, NOP56, MAP3K1, QKI, RNU1.13P, X5.Sep, TSEN15, ARHGAP25, TPT1P4, SCARNA13, TBL1XR1, STAT3, HECTD4, DDX11L10, TREML1, WHSC1L1, MAP3K5, IMPDH2, GUCY1B3, TRIM22, MORF4L1, RYBP, RNU1.89P, UBE2L6, PRRC2B, BAG1, AC074289.1, X2.Sep, HMGA1, AFF1, GLS, FRMD4B, SPX, ILK, CSNK1G1, MTCO3P12, FBXO9, ACRBP, CAPN15, CTNNBL1, RNA5SP325, DYSF, RAB2B, PPP1CA, TPI1, C11orf58, APLF, ERG, EIF5, SUPT16H, MAP4, DYNC1H1, ACVR2A, RN7SL493P, MAF1, PTK2B, AP2B1, CHD6, GMFG, SRSF6, CSNK1A1, GNA13, RALY, PIM1, SUSD1, RNU5B.1, SMC1A, COMMD6, BTK, ASH1L, ABCC3, JAK2, CANX, RPL23AP7, ISCA1, ANXA5, SIN3A, TMEM40, UHMK1, NET1, VAPB, RAC1, MLH3, XRCC6, PLEKHA2, AP2A1, EPS15, RPS11P5, BAZ1B, HDAC5, SLC44A2, RPL10AP6, SNORD89, EPSTI1, DCUN1D1, PDS5A, MLX, CAPN1, USP9X, USP34, DNM2, YPEL3, GNAQ, HIST1H4C, TCF25, 
#>     [... remaining feature names omitted; the warning text is cut off in the rendered output ...]
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     Gr_MAPKAPK2_LPS100, CD4.Tcells_mem_STAT3_Unstim, mDCs_STAT3_IFNa100, Tbet.CD8.Tcells_naive_STAT3_Unstim, Tbet.CD4.Tcells_mem_STAT3_IL100, pDCs_STAT3_IFNa100, Tbet.CD4.Tcells_mem_STAT3_IFNa100, mDCs_ERK_LPS100, CD4.Tcells_STAT3_Unstim, Tbet.CD8.Tcells_mem_STAT3_Unstim, cMCs_STAT3_IFNa100, mDCs_STAT3_Unstim, CD4.Tcells_naive_STAT3_IL100, M.MDSC_STAT3_IFNa100, CD4.Tcells_mem_STAT3_IL100, CD4.Tcells_STAT3_IL100, pDCs_STAT3_Unstim, CD7.NKcells_STAT3_Unstim, CD8.Tcells_mem_STAT3_Unstim, CD16.CD56.NKcells_STAT3_Unstim, intMCs_MAPKAPK2_Unstim, TCRgd.Tcells_STAT3_Unstim, CD8.Tcells_STAT3_Unstim, CD4.Tcells_mem_STAT3_IFNa100, Tbet.CD4.Tcells_naive_STAT5_IFNa100, M.MDSC_STAT3_Unstim, CD8.Tcells_naive_STAT3_Unstim, cMCs_STAT3_Unstim, intMCs_STAT3_Unstim, CD4.Tcells_naive_STAT3_Unstim, ncMCs_ERK_Unstim, M.MDSC_p38_LPS100, Bcells_STAT3_Unstim, mDCs_STAT1_IL100, CD4.Tcells_STAT3_IFNa100, CD8.Tcells_naive_STAT3_IFNa100, Tregs_STAT3_IL100, ncMCs_STAT3_Unstim, cMCs_STAT1_IL100, mDCs_p38_LPS100, CD45RA.Tregs_STAT3_IL100, CD45RA.Tregs_STAT3_Unstim.1, intMCs_p38_Unstim, Tregs_STAT3_Unstim, Tbet.CD8.Tcells_naive_STAT1_IFNa100, TCRgd.Tcells_STAT3_IFNa100, Tbet.CD8.Tcells_naive_STAT3_IFNa100, Bcells_CREB_Unstim, CD8.Tcells_STAT3_IL100, CD45RA.Tregs_STAT3_IL100.1, ncMCs_STAT3_IL100, M.MDSC_STAT1_IL100, cMCs_p38_LPS100, CD4.Tcells_naive_STAT3_IFNa100, CD8.Tcells_STAT1_IFNa100, CD8.Tcells_STAT3_IFNa100, intMCs_STAT1_IL100, M.MDSC_p38_Unstim, Tbet.CD8.Tcells_mem_STAT3_IFNa100, CD8.Tcells_mem_STAT3_IFNa100, M.MDSC_ERK_IL100, Tbet.CD4.Tcells_naive_STAT5_IL100, ncMCs_CREB_LPS100, CD8.Tcells_naive_STAT1_IL100, Tbet.CD4.Tcells_mem_STAT1_IFNa100, cMCs_ERK_IL100, CD45RA.Tregs_STAT3_IFNa100.1, intMCs_CREB_LPS100, ncMCs_ERK_IL100, TCRgd.Tcells_STAT1_IFNa100, Tbet.CD4.Tcells_naive_STAT5_Unstim, Tregs_STAT3_IFNa100, intMCs_NFkB_LPS100, cMCs_p38_Unstim
#>   These features will be ignored during prediction.
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     Hydroxyzileuton.Zileuton.sulfoxide, Risedronate.Risedronate, Valdecoxib.Valdecoxib, Potassium.asulam, Tiapride, [... long list of metabolite feature names omitted; the warning text is cut off in the rendered output ...]
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     VaginalSwab_Prevotella_7.2, Stool_Ezakiella, Stool_Prevotella_7.2, Stool_Prevotella_7.1, VaginalSwab_Haemophilus, Saliva_Prevotella_7, ToothGum_Prevotella_7, Saliva_Alloprevotella.2, ToothGum_Alloprevotella.2, Stool_Haemophilus.1, VaginalSwab_Alloprevotella.2, Saliva_Fusobacterium, ToothGum_Fusobacterium, Stool_Streptococcus.2, Stool_Alloprevotella.2, Saliva_Haemophilus.1, ToothGum_Haemophilus.1, VaginalSwab_Fusobacterium, Saliva_Campylobacter, ToothGum_Campylobacter, VaginalSwab_Campylobacter, Saliva_Prevotella_7.1, ToothGum_Prevotella_7.1, VaginalSwab_Prevotella_7.1, VaginalSwab_Prevotella_7, VaginalSwab_Prevotella_6, Stool_Streptococcus.3, Stool_Veillonella.1, Stool_Fusobacterium, ToothGum_Prevotella_6, Saliva_Prevotella_6, Stool_Leptotrichia, Saliva_Streptococcus.3, ToothGum_Streptococcus.3, VaginalSwab_Streptococcus.3, Stool_Prevotella_6, Saliva_Leptotrichia, ToothGum_Leptotrichia, Saliva_Prevotella.11, ToothGum_Prevotella.11, Stool_Leptotrichia.4, VaginalSwab_Prevotella.11, Stool_Campylobacter, VaginalSwab_Leptotrichia, ToothGum_Bacteroides.7, ToothGum_Prevotella.5, Saliva_Bacteroides.7, VaginalSwab_Bacteroides.7, Saliva_Lactobacillus.11, Saliva_Prevotella.5, Stool_Lactobacillus.11, VaginalSwab_Prevotella.5, VaginalSwab_Lactobacillus.11, ToothGum_Finegoldia, ToothGum_Lactobacillus.11, Saliva_Streptococcus.2, ToothGum_Streptococcus.2, Saliva_Finegoldia, VaginalSwab_Bacteroides.1, VaginalSwab_Haemophilus.1, VaginalSwab_Streptococcus.2, Stool_Bacteroides.7, Stool_Ureaplasma, ToothGum_Prevotella.2, Saliva_Prevotella.2, Stool_Gemella, Stool_NA.4, Saliva_Bacteroides.1, ToothGum_Bacteroides.1, Stool_Granulicatella, Stool_Fusobacterium.1, Saliva_Ureaplasma
#>   These features will be ignored during prediction.
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     plasma.LEPTIN, plasma.BDNF, plasma.ICAM1, plasma.RESISTIN, plasma.VCAM1, plasma.RANTES, plasma.CD40L, plasma.IL27, plasma.IL23
#>   These features will be ignored during prediction.
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     FN1.1, NAAA.1, LEP.1, MBL2.1, SELE.1, RTN4.1, CST2, FSTL1.1, TNC.1, HAMP.1, PPIF.1, CDH1.1, SPP1.1, IGF2R.1, SERPINF1.1, CLEC4M.1, CCL11, THBS2.1, SRC.1, F5.1, GAPDH.1, PLG.1, LMAN2.1, PRL.1, TPM4.1, APCS.1, FAM3D.1, TGFBR3.1, FGF19.1, IGFBP7.1, RET.1, C1QA.C1QB..C1QC, PGF.1, C3.5, TGFBI.1, ITIH4.1, KPNB1.1, CFB.1, FTH1.FTL, GPI.1, AFM.1, APOE.1, PI3.1, CFI.1, C3.4, INHBA.1, FABP3.1, ALDOA.1, EIF4H.1, PDGFRB.1, TNFRSF25.1, AURKA.1, NRCAM.1, SLITRK5.1, SERPINA4.1, CA1.1, SPARC.1, CHI3L1.1, FETUB.1, CCL5.1, CMPK1.1, BST1.1, SH2D1A.1, NPPB.2, KLKB1.1, CASP3.1, LCN2.1, DDR2.1, IL22.1, TGFBR2.1, EGFR.1, FGF2.1, C3.6, OCIAD1.1, CCL19.1, A2M.1, TNFRSF11A.1, SFTPD.1, ENO2.1, TFPI.1, IL2RA.1, CHKB.1, ENPP7.1, OLR1.1, SIRPA.1, IL1R1.1, APOM.1, PRSS22.1, MPO.1, GPD1.1, DCTPP1.1, IGFBP1.1, EPHB2.1, EFNB2.1, CST5.1, SNAP25.1, FLT4.1, HIST1H3A.1, TEC.1, KIT.1, MRC1.1, PRKCG.1, PPBP.2, PRKCA.1, SELP.1, CTSA.1, PDGFB.1, SPARCL1.1, ECM1.1, IL1R2.1, CTSD.1, ADSL.1, OMD.1, FLRT2.1, FTCD.1, LYPD3.1, TKT.1, NME2.1, IL2, HSP90AB1.1, CD36.1, MMP12.1, ECE1.1, ASAH2.1, PRKACA.1, IL36A.1, NTRK3.1, CD274.1, IDS.1, SERPINA10.1, CCL15, CDH3.1, PPA1.1, DKK1.1, CCL21.1, ASGR1.1, PKM2, AK1.1, NOTCH1.1, MDK.1, CD55.1, VTA1.1, INSR.1, IL6R.1, LAG3.1, LY9.1, APOB.1, CXCL16.1, CRK.1, AGT.1, PPY, CNDP1.1, CDH2.1, GOT1.1, SLPI.1, FSTL3.1, DIABLO.1, MMP13.1, ALCAM.1, IL18R1.1, CHL1.1, WISP1.1, RARRES2.1, LGALS3.1, PRTN3.1, CCL18, LRIG3.1, PLG.2, KLK8.1, RGMA.1, IL22RA2.1, CD109.1, RAC1.1, APP.1, N6AMT1.1, CDH5.1, MYBPC1.1, PIK3CG.1, NCAM1.1, BMP6.1, MET.1, PPP3R1.1
#>   These features will be ignored during prediction.
#> Warning in pre_process_new_data(clean_data, bart_machine): The following features were found in records for prediction which were not found in the original training data:
#>     serum.BDNF, serum.RESISTIN, serum.RANTES, serum.IL7, serum.CD40L, serum.ENA78, serum.MIP1B, serum.IL1A, serum.VEGF
#>   These features will be ignored during prediction.

weighted.post.samples <- Reduce("+", Map("*", post.samples, weights))
rownames(weighted.post.samples) <- rownames(dataX[[1]])
names(dataY) <- rownames(dataX[[1]])
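
The weighted combination above is an element-wise sum of per-layer posterior sample matrices, each scaled by its layer weight. The following is a minimal sketch on toy inputs, assuming every layer's matrix has the same dimensions (observations in rows, posterior draws in columns) and that the weights are listed in the same order as the layers. The names `toy_samples` and `toy_weights` are hypothetical and only serve the illustration.

```r
# Toy stand-ins for per-layer posterior draws: 3 observations x 4 draws each.
toy_samples <- list(
  layer_a = matrix(1, nrow = 3, ncol = 4),
  layer_b = matrix(2, nrow = 3, ncol = 4)
)
# Hypothetical layer weights, in the same order as toy_samples.
toy_weights <- c(0.25, 0.75)

# Map() scales each matrix by its weight; Reduce() adds the scaled matrices.
toy_weighted <- Reduce("+", Map("*", toy_samples, toy_weights))

# Every entry equals 0.25 * 1 + 0.75 * 2.
stopifnot(isTRUE(all.equal(unique(as.vector(toy_weighted)), 1.75)))
dim(toy_weighted)
```

If the list of matrices and the weight vector differ in length, `Map()` recycles the shorter one, so checking `length(post.samples) == length(weights)` beforehand is a cheap safeguard.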

Visualize the 68% (inner) and 95% (outer) credible intervals for each observation:

ord_names <- names(sort(rowMeans(weighted.post.samples), decreasing = TRUE))

mcmc_intervals(t(weighted.post.samples), prob = 0.68, prob_outer = 0.95) +
  scale_y_discrete(limits = ord_names) +
  geom_point(aes(x = dataY[ord_names], y = ord_names), shape = 1, size = 3, color = "black") +
  coord_flip() +
  theme_bw() +
  labs(
    x = "Gestational age (in months)",
    y = "Observations"
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.
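
To obtain the interval endpoints as numbers rather than as a plot, equal-tailed quantiles of the posterior draws give a close approximation. This is a sketch on simulated draws, not the vignette's data: `toy_draws` is a hypothetical observations-by-draws matrix laid out like `weighted.post.samples`, and the column labels are chosen for this example.

```r
set.seed(1)
# Simulated posterior draws: 5 observations (rows) x 1000 draws (columns).
# Row i is centred on 29 + i because the matrix is filled column by column.
toy_draws <- matrix(rnorm(5 * 1000, mean = rep(30:34, times = 1000)), nrow = 5)
rownames(toy_draws) <- paste0("obs", 1:5)

# Equal-tailed endpoints for the 95% and 68% intervals, plus the median.
probs <- c(0.025, 0.16, 0.5, 0.84, 0.975)
toy_intervals <- t(apply(toy_draws, 1, quantile, probs = probs))
colnames(toy_intervals) <- c("lower95", "lower68", "median", "upper68", "upper95")

round(toy_intervals, 2)
```

Comparing the observed outcome for each observation against these endpoints gives a simple empirical coverage check.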

Layer weights and feature-level inclusion proportions can also be examined for biological interpretation.

omicsEye_theme <- function() {
  angle <- 45
  hjust <- 1
  ggplot2::theme_bw() +
    ggplot2::theme(
      axis.text.x = ggplot2::element_text(size = 8, vjust = 1, hjust = hjust, angle = angle),
      axis.text.y = ggplot2::element_text(size = 8, hjust = 1),
      axis.title = ggplot2::element_text(size = 10),
      plot.title = ggplot2::element_text(size = 10),
      plot.subtitle = ggplot2::element_text(size = 8),
      legend.title = ggplot2::element_text(size = 6, face = "bold"),
      legend.text = ggplot2::element_text(size = 7),
      axis.line = ggplot2::element_line(colour = "black", linewidth = 0.25),
      axis.line.x = ggplot2::element_line(colour = "black", linewidth = 0.25),
      axis.line.y = ggplot2::element_line(colour = "black", linewidth = 0.25),
      panel.border = ggplot2::element_blank(),
      panel.grid.major = ggplot2::element_blank(),
      panel.grid.minor = ggplot2::element_blank()
    )
}

safe_var_importance <- function(model, layer_label) {
  tryCatch({
    qq <- bartMachine::investigate_var_importance(model, plot = FALSE)
    df <- cbind.data.frame(qq$avg_var_props, qq$sd_var_props)
    colnames(df) <- c("mean", "sd")
    df$type <- layer_label
    df
  }, error = function(e) {
    warning(sprintf("Skipping variable importance for %s: %s", layer_label, conditionMessage(e)))
    data.frame(mean = numeric(), sd = numeric(), type = character())
  })
}

vimp_stack <- cbind.data.frame(fit$weights)
colnames(vimp_stack) <- "mean"
vimp_stack$sd <- NA
vimp_stack$type <- "stack"

layer_names <- names(fit$model_fits$model_layers)
vimp_layers <- lapply(layer_names, function(layer_nm) {
  safe_var_importance(fit$model_fits$model_layers[[layer_nm]], layer_nm)
})
#> .....
#> .....
#> .....
#> .....
#> .....
#> .....
#> .....

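# Drop layers whose importance extraction failed (zero-row data frames).
# lengths() would count columns here, so test the number of rows instead.
vimp_layers <- vimp_layers[vapply(vimp_layers, nrow, integer(1)) > 0]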
vimp_top <- do.call(
  rbind,
  lapply(vimp_layers, function(df) head(df[order(-df$mean), , drop = FALSE], 20))
)

VIMP <- as.data.frame(rbind.data.frame(vimp_stack, vimp_top))
VIMP <- tibble::rownames_to_column(VIMP, "ID")

p4 <- VIMP %>%
  dplyr::filter(type == "stack") %>%
  dplyr::arrange(desc(mean)) %>%
  ggplot(aes(y = mean, x = reorder(ID, -mean))) +
  geom_bar(stat = "identity", fill = "darkseagreen") +
  theme_bw() +
  omicsEye_theme() +
  ylab("Layer Weights") +
  xlab("")

p5 <- VIMP %>%
  dplyr::filter(type != "stack") %>%
  dplyr::arrange(mean) %>%
  dplyr::mutate(ID = stringr::str_replace_all(ID, stringr::fixed("_"), " ")) %>%
  ggplot(aes(reorder(ID, -mean), mean, fill = type)) +
  facet_wrap(. ~ type, scales = "free") +
  geom_bar(stat = "identity", fill = "lightsalmon") +
  geom_errorbar(aes(ymin = ifelse(mean - sd > 0, mean - sd, 0), ymax = mean + sd),
                width = 0.2,
                position = position_dodge(0.9)) +
  theme_bw() +
  coord_flip() +
  omicsEye_theme() +
  theme(strip.background = element_blank()) +
  ylab("Inclusion proportion") +
  xlab("")

plot_grid(
  p4,
  ncol = 1,
  labels = c("Estimated IntegratedLearner Layer Weights"),
  label_size = 8,
  vjust = 0.1
) + theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"))

plot_grid(
  p5,
  ncol = 1,
  labels = c("Top Features by Layer (BART Inclusion Proportions)"),
  label_size = 8,
  vjust = 0.1
) + theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"))

Example 3: Multiclass Outcome (Franzosa MAE with External Validation)

This section shows a full multiclass MAE workflow using packaged local fixtures. Here we keep the original outcome column name (diseaseCat) and subject ID column (sample_id) and pass them through outcome_col and subject_id_col.

load_il_dataset("FranzosaE_2019_CuratedMetabolome", envir = environment())
load_il_dataset("FranzosaE_2019_CuratedMetadata", envir = environment())
load_il_dataset("FranzosaE_2019_CuratedSpeciesProfile", envir = environment())
load_il_dataset("FranzosaE_2019_Validation_CuratedMetabolome", envir = environment())
load_il_dataset("FranzosaE_2019_Validation_CuratedMetadata", envir = environment())
load_il_dataset("FranzosaE_2019_Validation_CuratedSpeciesProfile", envir = environment())

as_feature_matrix <- function(df, id_col = "X") {
  ids <- as.character(df[[id_col]])
  mat <- as.matrix(df[, setdiff(colnames(df), id_col), drop = FALSE])
  storage.mode(mat) <- "numeric"
  rownames(mat) <- ids
  t(mat)
}

prep_sample_metadata <- function(df, id_col = "X") {
  sm <- as.data.frame(df, stringsAsFactors = FALSE)
  sm$sample_id <- as.character(sm[[id_col]])
  rownames(sm) <- sm$sample_id
  sm
}

met_train <- as_feature_matrix(FranzosaE_2019_CuratedMetabolome)
met_valid <- as_feature_matrix(FranzosaE_2019_Validation_CuratedMetabolome)
species_train <- as_feature_matrix(FranzosaE_2019_CuratedSpeciesProfile)
species_valid <- as_feature_matrix(FranzosaE_2019_Validation_CuratedSpeciesProfile)

# Enforce exact train/validation feature alignment per layer.
met_shared <- intersect(rownames(met_train), rownames(met_valid))
species_shared <- intersect(rownames(species_train), rownames(species_valid))
met_train <- met_train[met_shared, , drop = FALSE]
met_valid <- met_valid[met_shared, , drop = FALSE]
species_train <- species_train[species_shared, , drop = FALSE]
species_valid <- species_valid[species_shared, , drop = FALSE]

sm_train <- prep_sample_metadata(FranzosaE_2019_CuratedMetadata)
sm_valid <- prep_sample_metadata(FranzosaE_2019_Validation_CuratedMetadata)

train_ids <- Reduce(intersect, list(colnames(met_train), colnames(species_train), rownames(sm_train)))
valid_ids <- Reduce(intersect, list(colnames(met_valid), colnames(species_valid), rownames(sm_valid)))

met_train <- met_train[, train_ids, drop = FALSE]
met_valid <- met_valid[, valid_ids, drop = FALSE]
species_train <- species_train[, train_ids, drop = FALSE]
species_valid <- species_valid[, valid_ids, drop = FALSE]
sm_train <- sm_train[train_ids, , drop = FALSE]
sm_valid <- sm_valid[valid_ids, , drop = FALSE]
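
Before building the MultiAssayExperiment objects, it can help to assert the alignment explicitly, so that a mismatch fails early rather than surfacing inside model fitting. The sketch below runs on toy matrices. `check_alignment()` is a hypothetical helper written for this example and is not part of IntegratedLearner.

```r
# Hypothetical helper: stops if features or samples are out of alignment.
check_alignment <- function(train, valid, train_meta, valid_meta) {
  stopifnot(
    identical(rownames(train), rownames(valid)),       # same features, same order
    identical(colnames(train), rownames(train_meta)),  # training samples match metadata
    identical(colnames(valid), rownames(valid_meta))   # validation samples match metadata
  )
  invisible(TRUE)
}

# Toy layer: 3 features x 2 samples for training and for validation.
toy_train <- matrix(0, 3, 2, dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))
toy_valid <- matrix(0, 3, 2, dimnames = list(c("f1", "f2", "f3"), c("v1", "v2")))
toy_train_meta <- data.frame(y = c("a", "b"), row.names = c("s1", "s2"))
toy_valid_meta <- data.frame(y = c("a", "b"), row.names = c("v1", "v2"))

check_alignment(toy_train, toy_valid, toy_train_meta, toy_valid_meta)
```

The same call could be repeated per layer, for example once for the metabolome matrices and once for the species matrices.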

class_levels <- sort(unique(as.character(sm_train$diseaseCat)))
sm_train$diseaseCat <- factor(sm_train$diseaseCat, levels = class_levels)
sm_valid$diseaseCat <- factor(sm_valid$diseaseCat, levels = class_levels)
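
Reusing the training levels for the validation factor keeps class coding consistent, but any validation label that is absent from training silently becomes NA. A quick check, sketched on toy labels (the class names below are illustrative only):

```r
toy_train_labels <- c("Control", "CD", "UC", "CD")
toy_valid_labels <- c("UC", "Control", "Other")

toy_levels <- sort(unique(toy_train_labels))
toy_valid_factor <- factor(toy_valid_labels, levels = toy_levels)

# Report validation labels that have no matching training level.
unseen <- setdiff(unique(toy_valid_labels), toy_levels)
if (length(unseen) > 0) {
  message("Validation labels not present in training: ", paste(unseen, collapse = ", "))
}

# Number of validation samples that were converted to NA.
sum(is.na(toy_valid_factor))
```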

cd_train <- S4Vectors::DataFrame(
  sample_id = sm_train$sample_id,
  diseaseCat = sm_train$diseaseCat,
  row.names = sm_train$sample_id
)

cd_valid <- S4Vectors::DataFrame(
  sample_id = sm_valid$sample_id,
  diseaseCat = sm_valid$diseaseCat,
  row.names = sm_valid$sample_id
)

MAE_train <- MultiAssayExperiment(
  experiments = ExperimentList(
    metabolome = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = met_train),
      colData = cd_train
    ),
    species = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = species_train),
      colData = cd_train
    )
  ),
  colData = cd_train
)

MAE_valid <- MultiAssayExperiment(
  experiments = ExperimentList(
    metabolome = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = met_valid),
      colData = cd_valid
    ),
    species = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = species_valid),
      colData = cd_valid
    )
  ),
  colData = cd_valid
)

fit <- IntegratedLearner::IntegratedLearner(
  MAE_train = MAE_train,
  MAE_valid = MAE_valid,
  experiment = c("metabolome", "species"),
  assay.type = c("abundance", "abundance"),
  outcome_col = "diseaseCat",
  subject_id_col = "sample_id",
  family = stats::binomial(),
  base_learner = "glmnet",
  meta_learner = "glmnet",
  run_stacked = TRUE,
  run_concat = TRUE,
  filter_method = "variance",
  filter_pct = 50,
  run_screening = TRUE,
  screen_pct = 25,
  folds = 2,
  verbose = TRUE
)
#> Feature filter (caret variance ranking, top 50.00% per layer): kept 461/922 features. Layer breakdown: metabolome=173/346, species=288/576.
#> Running multiclass base model for layer 1...
#> Warning: from glmnet C++ code (error code -81); Convergence for 81th lambda
#> value not reached after maxit=100000 iterations; solutions for larger lambdas
#> returned
#> Warning: from glmnet C++ code (error code -87); Convergence for 87th lambda
#> value not reached after maxit=100000 iterations; solutions for larger lambdas
#> returned
#> Warning: from glmnet C++ code (error code -76); Convergence for 76th lambda
#> value not reached after maxit=100000 iterations; solutions for larger lambdas
#> returned
#> Warning: from glmnet C++ code (error code -91); Convergence for 91th lambda
#> value not reached after maxit=100000 iterations; solutions for larger lambdas
#> returned
#> Warning: from glmnet C++ code (error code -92); Convergence for 92th lambda
#> value not reached after maxit=100000 iterations; solutions for larger lambdas
#> returned
#> Running multiclass base model for layer 2...
#> Running multiclass stacked model...
#> Running multiclass concatenated model...
#> Time for model fit : 0.146 minutes 
#> ========================================
#> Multiclass model fit with 3 classes
#> Base learner: glmnet 
#> Stacked learner: glmnet 
#> Concatenated learner: glmnet 
#> ========================================
#> Multiclass metrics for training data:
#>          model  accuracy balanced_accuracy       auc   logloss
#> 1   metabolome 0.5419355         0.5550314 0.7172645 0.9844078
#> 2      species 0.4903226         0.4360895 0.6770613 1.2123063
#> 3      stacked 0.5161290         0.4984277 0.6617548 1.0605069
#> 4 concatenated 0.5419355         0.5235849 0.7040199 0.9929088
#> ========================================
#> Multiclass metrics for test data:
#>          model  accuracy balanced_accuracy       auc   logloss
#> 1   metabolome 0.6461538         0.6455204 0.8078598 0.8402359
#> 2      species 0.3846154         0.3923584 0.6646290 1.0300602
#> 3      stacked 0.6615385         0.6546113 0.7820692 1.1737686
#> 4 concatenated 0.5846154         0.5816864 0.7581535 0.9188124
#> ========================================

Useful multiclass outputs:

fit$metrics.train
fit$metrics.test
fit$class.train
fit$class.test
fit$prob.train
fit$prob.test
fit$feature_importance_signed_by_class
fit$filter_method, fit$filter_pct
fit$screening_used, fit$screen_pct

The multiclass metric tables report accuracy, balanced accuracy, one-vs-rest AUC, and log-loss. The plotting helper returns a single one-vs-rest ROC figure for each fitted model, with all class curves overlaid.
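
To make the four metrics concrete, here is a base-R sketch that computes them from a toy matrix of predicted class probabilities. The definitions are the common textbook ones (balanced accuracy as the mean per-class recall, one-vs-rest AUC macro-averaged over classes). Whether the package uses exactly these averaging choices is an assumption, and all `toy_*` names are hypothetical.

```r
# Toy predicted class probabilities (rows sum to 1) and true labels.
toy_classes <- c("A", "B", "C")
toy_truth <- factor(c("A", "A", "B", "B", "C", "C"), levels = toy_classes)
toy_prob <- matrix(
  c(0.7, 0.2, 0.1,
    0.4, 0.5, 0.1,
    0.2, 0.6, 0.2,
    0.1, 0.7, 0.2,
    0.3, 0.3, 0.4,
    0.1, 0.2, 0.7),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, toy_classes)
)
toy_pred <- factor(toy_classes[max.col(toy_prob, ties.method = "first")],
                   levels = toy_classes)

toy_accuracy <- mean(toy_pred == toy_truth)

# Balanced accuracy: unweighted mean of per-class recall.
toy_recall <- vapply(toy_classes,
                     function(k) mean(toy_pred[toy_truth == k] == k),
                     numeric(1))
toy_balanced_accuracy <- mean(toy_recall)

# Log-loss: mean negative log-probability assigned to the true class.
toy_p_true <- toy_prob[cbind(seq_along(toy_truth), as.integer(toy_truth))]
toy_logloss <- -mean(log(toy_p_true))

# One-vs-rest AUC per class via the rank-sum identity.
toy_ovr_auc <- vapply(toy_classes, function(k) {
  pos <- toy_truth == k
  r <- rank(toy_prob[, k])
  (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) / (sum(pos) * sum(!pos))
}, numeric(1))

c(accuracy = toy_accuracy, balanced_accuracy = toy_balanced_accuracy,
  auc = mean(toy_ovr_auc), logloss = toy_logloss)
```

Accuracy and balanced accuracy coincide here because the toy classes are equally sized. With imbalanced classes, as in the metric tables above, the two can differ.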

plot.obj.mc <- IntegratedLearner:::plot.learner(fit)
plot.obj.mc$plot

Example 4: Survival Outcome (Time-to-event)

For survival tasks, IntegratedLearner dispatches to ILsurv when survival metadata are detected. The expected fields are:

time: follow-up time (non-negative).
event: event indicator (0/1).
optional Y: Surv(time, event) convenience column.
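
A minimal sketch of sample metadata carrying these fields, with basic sanity checks. The data are made up for illustration, and the checks are suggestions for users rather than the package's own validation logic.

```r
library(survival)

# Toy survival metadata with the fields described above.
toy_surv_fields <- data.frame(
  subjectID = c("p1", "p2", "p3", "p4"),
  time = c(12.5, 30, 7, 45),
  event = c(1, 0, 1, 0)
)

# Sanity checks before model fitting.
stopifnot(
  is.numeric(toy_surv_fields$time),
  all(toy_surv_fields$time >= 0),
  all(toy_surv_fields$event %in% c(0, 1))
)

# Optional convenience column holding the Surv object.
toy_surv_fields$Y <- I(Surv(toy_surv_fields$time, toy_surv_fields$event))
print(toy_surv_fields$Y)
```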

This path uses the package-native survival backend (no mlr3 dependency required).

For plotting, the survival backend stores:

a time-dependent AUC table evaluated over a denser event-time grid (rather than only a few summary quantiles), and
Kaplan-Meier curve payloads built from predicted risk groups for the best fused survival model.
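
The idea behind risk-group Kaplan-Meier curves can be illustrated outside the package: split subjects at the median predicted risk and fit one curve per group. The sketch below uses simulated data and `survival::survfit()`. It is not the package's internal implementation, the median split is an assumption made for this example, and `toy_risk` stands in for a fitted model's risk score.

```r
library(survival)

set.seed(42)
n <- 100
toy_risk <- rnorm(n)                               # hypothetical predicted risk scores
true_time <- rexp(n, rate = 0.1 * exp(toy_risk))   # higher risk, shorter survival
cens_time <- rexp(n, rate = 0.03)                  # independent censoring times
toy_time <- pmin(true_time, cens_time)             # observed follow-up time
toy_event <- as.integer(true_time <= cens_time)    # 1 = event observed, 0 = censored

# Two risk groups defined by a median split of the predicted risk.
toy_group <- factor(ifelse(toy_risk > median(toy_risk), "high", "low"),
                    levels = c("low", "high"))

# One Kaplan-Meier curve per risk group.
km_fit <- survfit(Surv(toy_time, toy_event) ~ toy_group)
print(km_fit)

# Log-rank test for separation between the two groups.
survdiff(Surv(toy_time, toy_event) ~ toy_group)
```

Calling `plot(km_fit)` draws the two curves. Clear separation between them suggests that the risk score discriminates between shorter and longer survival.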

This section provides a complete MAE workflow, followed by an equivalent PCL sketch.

load_il_dataset("gene_all", envir = environment())
load_il_dataset("mir_all", envir = environment())

to_feature_matrix <- function(df, id_col = "patient_id", n_keep = 120L) {
  # Columns holding identifiers, outcomes, or clinical covariates rather than omics features.
  drop_cols <- c(id_col, "OS", "OS.time", "age", "race_white", "stage_i", "stage_ii")
  d <- as.data.frame(df, stringsAsFactors = FALSE)
  rownames(d) <- as.character(d[[id_col]])
  feature_cols <- setdiff(colnames(d), drop_cols)
  feature_cols <- feature_cols[seq_len(min(length(feature_cols), n_keep))]
  mat <- t(as.matrix(d[, feature_cols, drop = FALSE]))
  storage.mode(mat) <- "numeric"
  mat
}

gene_all <- gene_all[order(gene_all$patient_id), , drop = FALSE]
mir_all <- mir_all[order(mir_all$patient_id), , drop = FALSE]

common_ids <- intersect(as.character(gene_all$patient_id), as.character(mir_all$patient_id))
gene_all <- gene_all[match(common_ids, gene_all$patient_id), , drop = FALSE]
mir_all <- mir_all[match(common_ids, mir_all$patient_id), , drop = FALSE]

gene_mat <- to_feature_matrix(gene_all, n_keep = 120L)
mirna_mat <- to_feature_matrix(mir_all, n_keep = 100L)

tcga_metadata <- data.frame(
  patient_id = as.character(gene_all$patient_id),
  time = as.numeric(gene_all$OS.time),
  event = as.numeric(gene_all$OS),
  stringsAsFactors = FALSE
)
rownames(tcga_metadata) <- tcga_metadata$patient_id

common_ids <- Reduce(intersect, list(colnames(gene_mat), colnames(mirna_mat), rownames(tcga_metadata)))
gene_mat <- gene_mat[, common_ids, drop = FALSE]
mirna_mat <- mirna_mat[, common_ids, drop = FALSE]
tcga_metadata <- tcga_metadata[common_ids, , drop = FALSE]

tcga_metadata$outcome_surv <- I(survival::Surv(tcga_metadata$time, tcga_metadata$event))

set.seed(123)
event_ids <- rownames(tcga_metadata)[tcga_metadata$event == 1]
censor_ids <- rownames(tcga_metadata)[tcga_metadata$event == 0]
train_ids <- c(
  sample(event_ids, max(1L, floor(0.7 * length(event_ids)))),
  sample(censor_ids, max(1L, floor(0.7 * length(censor_ids))))
)
train_ids <- sort(unique(train_ids))
valid_ids <- setdiff(rownames(tcga_metadata), train_ids)

cd_train <- S4Vectors::DataFrame(tcga_metadata[train_ids, c("patient_id", "time", "event"), drop = FALSE])
cd_train$outcome_surv <- I(survival::Surv(cd_train$time, cd_train$event))
rownames(cd_train) <- cd_train$patient_id

cd_valid <- S4Vectors::DataFrame(tcga_metadata[valid_ids, c("patient_id", "time", "event"), drop = FALSE])
cd_valid$outcome_surv <- I(survival::Surv(cd_valid$time, cd_valid$event))
rownames(cd_valid) <- cd_valid$patient_id

mae_train <- MultiAssayExperiment(
  experiments = ExperimentList(
    gene = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = gene_mat[, train_ids, drop = FALSE]),
      colData = cd_train
    ),
    mirna = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = mirna_mat[, train_ids, drop = FALSE]),
      colData = cd_train
    )
  ),
  colData = cd_train
)

mae_valid <- MultiAssayExperiment(
  experiments = ExperimentList(
    gene = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = gene_mat[, valid_ids, drop = FALSE]),
      colData = cd_valid
    ),
    mirna = SummarizedExperiment::SummarizedExperiment(
      assays = list(abundance = mirna_mat[, valid_ids, drop = FALSE]),
      colData = cd_valid
    )
  ),
  colData = cd_valid
)

feature_metadata_surv <- data.frame(
  featureID = c(rownames(gene_mat), rownames(mirna_mat)),
  featureType = c(rep("gene", nrow(gene_mat)), rep("mirna", nrow(mirna_mat))),
  stringsAsFactors = FALSE
)
rownames(feature_metadata_surv) <- feature_metadata_surv$featureID

PCL_train <- list(
  feature_table = as.data.frame(rbind(
    gene_mat[, train_ids, drop = FALSE],
    mirna_mat[, train_ids, drop = FALSE]
  )),
  sample_metadata = as.data.frame(cd_train),
  feature_metadata = feature_metadata_surv
)

PCL_valid <- list(
  feature_table = as.data.frame(rbind(
    gene_mat[, valid_ids, drop = FALSE],
    mirna_mat[, valid_ids, drop = FALSE]
  )),
  sample_metadata = as.data.frame(cd_valid),
  feature_metadata = feature_metadata_surv
)

fit_surv_mae <- IntegratedLearner(
  MAE_train = mae_train,
  MAE_valid = mae_valid,
  experiment = c("gene", "mirna"),
  assay.type = c("abundance", "abundance"),
  outcome_col = "outcome_surv",
  subject_id_col = "patient_id",
  folds = 2,
  base_learner = "surv.coxph",
  filter_method = "variance",
  filter_pct = 40,
  run_screening = TRUE,
  screen_pct = 25,
  weight_method = "COX",           # alternative: "IBS"
  verbose = TRUE
)
#> Feature filter (caret variance ranking, top 40.00% per layer): kept 88/220 features. Layer breakdown: gene=48/120, mirna=40/100.
#> ILsurv starting
#>   base_learner: surv.coxph
#>   weight_method: COX
#>   folds: 2 | seed: 1234
#>   samples: 223 | features: 88
#>   layers: gene, mirna
#>   screening: cox (25.00%)
#> [gene] fitting OOF + full model (48 features)
#> [gene] done
#> [mirna] fitting OOF + full model (40 features)
#> [mirna] done
#> Computing single-layer training metrics
#>   [single:gene] cindex=0.5156
#>   [single:mirna] cindex=0.5117
#> Running early fusion
#> Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> Ran out of iterations and did not converge
#> Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> one or more coefficients may be infinite
#>   [early] cindex=0.5732
#> Preparing survival-matrix weighting inputs from layer risks
#> Learning late-fusion weights
#>   [late] weights: gene = 0.3299, mirna = 0.6701
#>   [late] cindex=0.5035
#> Running validation
#>   [valid single:gene] cindex=0.7192
#>   [valid single:mirna] cindex=0.6485
#>   [valid late] cindex=0.7111
#>   [valid early] cindex=0.6364
#> ILsurv completed
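The log reports a concordance index (cindex) for each layer and for the fused models, along with late-fusion weights. The sketch below illustrates both ideas on simulated data: it standardizes two layer risk scores, combines them with the weights printed in the log, and computes Harrell's C with survival::concordance. The linear weighted sum is an assumption made for illustration and is not necessarily how ILsurv combines layers internally; all variable names and simulated values are hypothetical.

```r
library(survival)

set.seed(42)
n <- 100
# Simulated per-layer risk scores (hypothetical stand-ins for layer predictions).
risk_gene  <- rnorm(n)
risk_mirna <- 0.5 * risk_gene + rnorm(n)
event_time  <- rexp(n, rate = 0.002 * exp(0.7 * risk_gene + 0.3 * risk_mirna))
censor_time <- rexp(n, rate = 0.001)
time  <- pmin(event_time, censor_time)
event <- as.integer(event_time <= censor_time)

# Weights copied from the log above; they are non-negative and sum to 1.
w <- c(gene = 0.3299, mirna = 0.6701)

# Standardize each layer, then take the weighted sum as the fused risk.
layer_risk <- scale(cbind(gene = risk_gene, mirna = risk_mirna))
fused_risk <- as.numeric(layer_risk %*% w)

# Harrell's C; reverse = TRUE because higher risk should mean shorter survival.
cindex <- function(risk) {
  concordance(Surv(time, event) ~ risk, reverse = TRUE)$concordance
}
sapply(list(gene = risk_gene, mirna = risk_mirna, late = fused_risk), cindex)
```

A C-index of 0.5 corresponds to chance-level ranking and 1 to perfect ranking, which is the scale on which the training and validation values in the log should be read.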

Interpret Survival Outputs

# Single-layer training metrics
# fit_surv_mae$train_out$single$metrics

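# Late fusion (weighted integration). Printing the weights also prints the
# method_details attribute, including the per-sample scaling matrix M.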
fit_surv_mae$train_out$late$weights
#>      gene     mirna 
#> 0.3299115 0.6700885 
#> attr(,"method_details")
#> attr(,"method_details")$weight_method
#> [1] "COX"
#> 
#> attr(,"method_details")$time_grid
#>  [1]  159.6000  223.1310  304.0345  371.8276  393.4414  427.3517  461.2621
#>  [8]  517.7586  568.1655  611.4966  639.7586  690.3241  744.2414  807.5862
#> [15]  911.7724  996.2276 1061.0483 1140.5172 1208.3724 1327.5931 1478.2069
#> [22] 1602.0138 1683.6552 1850.4138 2039.5517 2195.8966 2395.3034 2723.6207
#> [29] 3067.4897 3250.6000
#> 
#> attr(,"method_details")$t_vec
#> [1]  223.1310  517.7586  911.7724 1683.6552 3067.4897
#> 
#> attr(,"method_details")$layer_score
#> [1] "sum"
#> 
#> attr(,"method_details")$scaling
#> attr(,"method_details")$scaling$M
#>                gene       mirna
#>   [1,] -0.061903150 -0.67711347
#>   [2,]  1.817927060  1.79215698
#>   [3,]  1.464365758  0.67435980
#>   [4,]  2.338852543  1.49106782
#>   [5,] -1.011058140 -1.71667484
#>   [6,] -0.016276534  0.27021412
#>   [7,] -0.691130856 -1.28699794
#>   [8,] -0.943135955 -0.94306926
#>   [9,]  1.471506946  0.88715162
#>  [10,] -1.559323792 -0.14385050
#>  [11,] -1.293672327  0.19815950
#>  [12,]  0.037993082  0.46238809
#>  [13,] -0.679767931  0.31106705
#>  [14,]  0.390147247 -0.59536489
#>  [15,]  1.206573893  0.67266422
#>  [16,] -1.122201041 -0.44489890
#>  [17,]  0.803589179  0.52890995
#>  [18,] -0.505092040  0.38006779
#>  [19,] -2.042254388 -0.87888937
#>  [20,]  2.405672330  0.77552750
#>  [21,]  0.336934126 -0.16715668
#>  [22,]  0.852256459 -0.98856259
#>  [23,]  0.289610002  1.26128961
#>  [24,] -0.368215682 -1.46755197
#>  [25,] -1.130301640 -0.29720910
#>  [26,] -0.936557144 -0.78923178
#>  [27,]  0.807078148 -1.13709919
#>  [28,]  0.459051734  1.52164521
#>  [29,]  1.062583684  1.52309794
#>  [30,] -0.670687028 -1.01528491
#>  [31,]  0.307169691 -0.59980737
#>  [32,] -0.768101386  1.37451409
#>  [33,]  1.236535230 -0.64507880
#>  [34,] -1.803860401  1.41048658
#>  [35,]  0.356839544  2.54849159
#>  [36,] -1.216894944 -0.08565249
#>  [37,]  1.181545655  0.19375221
#>  [38,]  0.564906227  1.47882103
#>  [39,]  2.049510782  0.37892986
#>  [40,]  2.405672330 -0.07710970
#>  [41,]  0.064536805 -0.11869875
#>  [42,] -0.133905826  0.95415594
#>  [43,]  0.374376666  0.98291367
#>  [44,] -0.081183693  0.20380360
#>  [45,] -0.799146157 -1.09423647
#>  [46,]  1.362369594 -0.30437221
#>  [47,] -0.004577517 -0.80635502
#>  [48,]  0.345180452 -2.16146595
#>  [49,]  0.548572912 -2.16146595
#>  [50,] -0.602444456 -0.96876616
#>  [51,] -0.945911621 -0.66486528
#>  [52,] -0.431949381 -0.08665403
#>  [53,]  0.370009461  0.20497040
#>  [54,]  0.928866723  1.87452930
#>  [55,] -2.474691655  0.39823693
#>  [56,] -0.337915981  0.54782023
#>  [57,] -0.281114257  0.78976951
#>  [58,]  0.873994947  0.08581114
#>  [59,] -2.029896290 -1.35831082
#>  [60,]  0.231537912 -0.07568244
#>  [61,] -0.293851906 -0.20057596
#>  [62,]  0.174912610 -0.15023112
#>  [63,]  0.563211075 -1.29274298
#>  [64,]  0.034777086 -0.95337957
#>  [65,]  0.938416571  1.02361940
#>  [66,]  0.848338622 -0.85962391
#>  [67,] -0.184236987 -2.11734085
#>  [68,] -0.513065219  2.54849159
#>  [69,] -1.184301742  0.20096053
#>  [70,]  0.663637717  0.88219268
#>  [71,]  0.323202930 -0.05497647
#>  [72,]  2.276160823 -0.18763794
#>  [73,] -0.892415713 -0.74367617
#>  [74,] -1.075498934 -2.05990240
#>  [75,] -0.263286461 -0.17435304
#>  [76,] -0.808222883 -0.43168089
#>  [77,]  0.820187458 -0.14057704
#>  [78,]  0.954586545 -0.90145411
#>  [79,]  1.003171913  0.79955597
#>  [80,]  0.324542186 -0.81918417
#>  [81,]  0.977915172  1.25519391
#>  [82,] -0.265014466 -0.95715191
#>  [83,]  0.846648241  1.63173306
#>  [84,] -0.730298412 -0.81060145
#>  [85,] -1.527766875 -0.22708177
#>  [86,] -0.472527602  0.55349830
#>  [87,]  1.674783531  0.73624556
#>  [88,]  0.701270854 -0.21437640
#>  [89,]  0.269357660  0.33220860
#>  [90,]  1.546084766  0.89509632
#>  [91,] -0.594946120 -0.26407967
#>  [92,]  1.310221432  1.67759400
#>  [93,]  0.253875613  0.42822800
#>  [94,]  0.022646629  0.61510849
#>  [95,] -0.678533093 -0.37030261
#>  [96,]  0.807170349  0.14388573
#>  [97,] -0.516756370  0.67940636
#>  [98,] -0.371691186  0.80715970
#>  [99,] -0.524860961  0.09988816
#> [100,] -0.754696923  0.01435240
#> [101,]  0.324248459  0.68681350
#> [102,] -1.016896242  0.50997886
#> [103,]  0.415352500  2.28286289
#> [104,]  0.301530693  1.11469031
#> [105,]  0.266898650  0.04636093
#> [106,]  0.150969487  0.88951664
#> [107,]  2.145998493  0.58185989
#> [108,]  0.877029686  0.57477762
#> [109,] -0.232177145  0.04568081
#> [110,] -0.373847775  0.84136746
#> [111,]  0.131689695 -0.99754042
#> [112,] -1.194284211  1.65359371
#> [113,]  0.917354834 -1.88276440
#> [114,]  1.032725050  0.56286491
#> [115,]  0.249291589 -0.12578429
#> [116,]  0.138355208  1.01111328
#> [117,] -0.034675046 -0.51085788
#> [118,] -0.382250726 -0.60708719
#> [119,] -0.222545820 -1.68159650
#> [120,]  0.964408175 -2.06828509
#> [121,]  0.403289685  0.89654141
#> [122,] -1.242203498  0.28314562
#> [123,]  0.331880052 -0.30214983
#> [124,]  1.160675024  1.82701883
#> [125,] -0.915937274 -0.33811665
#> [126,]  0.358454911  1.02902673
#> [127,] -0.229573184 -0.92913211
#> [128,] -0.936593155 -0.58786870
#> [129,] -0.814534320 -0.24968369
#> [130,]  1.332190612  1.63202877
#> [131,]  1.468705004  0.93411398
#> [132,] -0.281497537  1.20533867
#> [133,]  0.331495162  0.84257132
#> [134,] -0.085960733  0.22732169
#> [135,] -0.698231221  0.12401615
#> [136,] -0.425799598  0.01713611
#> [137,]  0.042100525  0.28039650
#> [138,] -0.192253413  1.82512684
#> [139,] -0.068975483  0.11599669
#> [140,] -1.173024809  0.68523580
#> [141,] -0.126634824 -1.69209943
#> [142,] -0.121908885 -0.76560902
#> [143,] -2.474691655 -1.15366252
#> [144,] -0.688958515 -0.47707902
#> [145,]  1.088529157 -0.87793418
#> [146,] -1.375036896 -1.10700405
#> [147,]  0.939826674 -0.43823283
#> [148,]  0.183592231 -0.93859815
#> [149,] -0.362102403 -0.13780573
#> [150,]  0.043010980  0.15373902
#> [151,]  0.475825224 -0.64128763
#> [152,]  0.976499144 -0.01970685
#> [153,] -0.269240778 -0.52072532
#> [154,] -1.721295167 -1.47027033
#> [155,]  0.570986803 -0.66370084
#> [156,]  0.310275491 -0.12851467
#> [157,] -0.857297669 -0.85906151
#> [158,] -2.323654031 -0.08704727
#> [159,]  0.160463152  0.69844157
#> [160,]  0.055667112 -1.31827010
#> [161,]  0.451231401 -0.54617382
#> [162,] -0.402858420 -2.16146595
#> [163,]  0.160682667 -0.04512278
#> [164,] -0.383500807  0.19992557
#> [165,] -1.497193523 -1.13840690
#> [166,] -0.136634293 -0.55604649
#> [167,] -0.193290398 -0.37175472
#> [168,]  0.971101468 -0.38310938
#> [169,] -1.286736111 -0.27638305
#> [170,]  1.599709594  0.34963378
#> [171,]  1.100731755 -0.50421847
#> [172,] -1.665960897  1.79617512
#> [173,]  0.138050189 -0.03312519
#> [174,] -0.564753840 -1.14882311
#> [175,] -2.150627329 -1.51593406
#> [176,]  2.405672330  2.26399259
#> [177,]  0.869763770  1.39532453
#> [178,]  1.230642001  0.01392923
#> [179,]  0.575679511 -1.94794684
#> [180,] -0.564117717  0.36134638
#> [181,] -1.220203387 -0.92765380
#> [182,] -1.549232857 -2.09952793
#> [183,]  0.250678014 -0.22986985
#> [184,] -1.703456834 -0.83478519
#> [185,] -0.297965506 -0.75641902
#> [186,]  0.139254808 -0.62330089
#> [187,]  1.559627650  1.28740832
#> [188,] -0.306178731  0.76211826
#> [189,]  1.867177661  2.54849159
#> [190,]  0.907232278 -1.06899834
#> [191,] -0.310332882  0.21397553
#> [192,]  1.590051786  1.23533143
#> [193,] -0.529183449  0.87666813
#> [194,]  1.065103803 -0.10753727
#> [195,]  0.244020172 -0.76279460
#> [196,]  1.120404993 -0.13003287
#> [197,] -0.492263383  0.26134412
#> [198,]  0.372637409  1.92479375
#> [199,] -0.449046558  1.13735695
#> [200,] -2.314147992  0.94142872
#> [201,] -1.660311183  0.20870505
#> [202,]  1.040995328  0.37977165
#> [203,] -0.413298356 -1.89256168
#> [204,] -0.990759358 -0.19299692
#> [205,]  1.063025389 -0.53263472
#> [206,] -0.304698790  1.76543175
#> [207,]  0.218184752  1.17791866
#> [208,]  1.208298415 -0.01402709
#> [209,] -0.566983608 -0.02631378
#> [210,] -1.304088144  0.30061036
#> [211,] -1.448870191 -0.43391776
#> [212,]  0.174450002 -0.60797081
#> [213,]  0.808243592 -0.52741979
#> [214,] -0.501833061 -0.59420657
#> [215,] -0.903764025 -1.19668436
#> [216,]  0.349568925  0.29958237
#> [217,] -0.185602397  0.60077560
#> [218,] -0.262920711  1.23194978
#> [219,] -2.474691655 -0.93042972
#> [220,] -0.418134212 -0.63702668
#> [221,] -0.370441734 -0.29511164
#> [222,] -0.939095836 -1.03206859
#> [223,]  0.287977220 -0.26517367
#> 
#> attr(,"method_details")$scaling$center
#>      gene     mirna 
#> 0.4563780 0.4574176 
#> 
#> attr(,"method_details")$scaling$scale
#>       gene      mirna 
#> 0.02964485 0.09377227 
#> 
#> 
#> attr(,"method_details")$weight_lambda
#> [1] 0.02
#> 
#> attr(,"method_details")$weight_penalty
#> [1] "l2_to_uniform"
#> 
#> attr(,"method_details")$weight_cap
#> [1] 1
fit_surv_mae$train_out$late$train_cindex
#> [1] 0.5035102
fit_surv_mae$train_out$late$train_auc
#>            time       AUC
#> t=141.2   141.2 0.3393038
#> t=169.2   169.2 0.4308691
#> t=197     197.0 0.3763094
#> t=238.2   238.2 0.4610534
#> t=311.4   311.4 0.4794404
#> t=353.4   353.4 0.5408010
#> t=534.6   534.6 0.4930835
#> t=612     612.0 0.5904624
#> t=653.4   653.4 0.5029730
#> t=848.2   848.2 0.4640157
#> t=1052.8 1052.8 0.4487822
#> t=1289.6 1289.6 0.4185638
#> t=1430   1430.0 0.4504876
#> t=1452.8 1452.8 0.4859213
#> t=1527.2 1527.2 0.4499681
#> t=1607.6 1607.6 0.4354954
#> t=1678.8 1678.8 0.4648625
#> t=1699   1699.0 0.4602310
#> t=1805.8 1805.8 0.4887562
#> t=2034.6 2034.6 0.5412678
#> t=2115   2115.0 0.5150369
#> t=2179   2179.0 0.5580437
#> t=2207   2207.0 0.5899874
#> t=2628.6 2628.6 0.6403522
#> t=3222.6 3222.6 0.7631859

# Early fusion summary
# fit_surv_mae$train_out$early

# Validation metrics
fit_surv_mae$valid_out$late$valid_cindex
#> [1] 0.7111111
fit_surv_mae$valid_out$late$valid_auc
#>        time       AUC
#> t=548   548 0.9420290
#> t=754   754 0.7309908
#> t=976   976 0.6777911
#> t=1174 1174 0.7719847
#> t=1411 1411 0.7973105
#> t=1556 1556 0.6588996
#> t=1642 1642 0.7026588
#> t=1673 1673 0.6051685
#> t=2009 2009 0.6331028
#> t=2207 2207 0.7407960
#> t=2636 2636 0.7909917
#> t=2763 2763 0.7581064
#> t=3472 3472 0.7847797
fit_surv_mae$valid_out$single$valid_cindex
#> $gene
#> [1] 0.7191919
#> 
#> $mirna
#> [1] 0.6484848

The train_auc and valid_auc objects are data frames with time and AUC columns, so they can be plotted as true time-dependent discrimination curves.
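As a quick illustration, a data frame of this shape can be plotted directly with base graphics. The values below are the first five rows of the valid_auc output shown above, retyped by hand so the block runs on its own; with a fitted object you would use fit_surv_mae$valid_out$late$valid_auc instead.

```r
# Values copied from the valid_auc output above (first five rows only).
valid_auc <- data.frame(
  time = c(548, 754, 976, 1174, 1411),
  AUC  = c(0.9420290, 0.7309908, 0.6777911, 0.7719847, 0.7973105)
)

plot(valid_auc$time, valid_auc$AUC, type = "b", ylim = c(0, 1),
     xlab = "Time", ylab = "Time-dependent AUC")
abline(h = 0.5, lty = 2)  # reference line for chance-level discrimination
```

The dashed line at 0.5 makes it easy to see at which follow-up times the model discriminates better than chance.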

plot.obj.surv <- IntegratedLearner:::plot.learner(fit_surv_mae)
plot.obj.surv$plot

Session Information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
#>  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
#>  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
#>  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
#>  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
#> [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    splines   stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] bartMachine_1.4.2           survival_3.8-6             
#>  [3] MultiAssayExperiment_1.39.0 SummarizedExperiment_1.43.0
#>  [5] Biobase_2.73.1              GenomicRanges_1.65.0       
#>  [7] Seqinfo_1.3.0               IRanges_2.47.2             
#>  [9] MatrixGenerics_1.25.0       matrixStats_1.5.0          
#> [11] S4Vectors_0.51.3            BiocGenerics_0.59.7        
#> [13] generics_0.1.4              bayesplot_1.15.0           
#> [15] cowplot_1.2.0               caret_7.0-1                
#> [17] lattice_0.22-9              SuperLearner_2.0-40        
#> [19] gam_1.22-7                  foreach_1.5.2              
#> [21] nnls_1.6                    ggplot2_4.0.3              
#> [23] dplyr_1.2.1                 IntegratedLearner_0.99.0   
#> [25] rmarkdown_2.31             
#> 
#> loaded via a namespace (and not attached):
#>   [1] Rdpack_2.6.6          pROC_1.19.0.1         rlang_1.2.0          
#>   [4] magrittr_2.0.5        otel_0.2.0            compiler_4.6.0       
#>   [7] vctrs_0.7.3           reshape2_1.4.5        quadprog_1.5-8       
#>  [10] stringr_1.6.0         shape_1.4.6.1         pkgconfig_2.0.3      
#>  [13] fastmap_1.2.0         XVector_0.53.0        backports_1.5.1      
#>  [16] labeling_0.4.3        prodlim_2026.03.11    nloptr_2.2.1         
#>  [19] itertools_0.1-3       purrr_1.2.2           glmnet_5.0           
#>  [22] xfun_0.58             randomForest_4.7-1.2  cachem_1.1.0         
#>  [25] jsonlite_2.0.0        recipes_1.3.3         DelayedArray_0.39.3  
#>  [28] timereg_2.0.7         parallel_4.6.0        R6_2.6.1             
#>  [31] bslib_0.11.0          stringi_1.8.7         RColorBrewer_1.1-3   
#>  [34] ranger_0.18.0         parallelly_1.47.0     rpart_4.1.27         
#>  [37] numDeriv_2016.8-1.1   lubridate_1.9.5       jquerylib_0.1.4      
#>  [40] Rcpp_1.1.1-1.1        iterators_1.0.14      knitr_1.51           
#>  [43] future.apply_1.20.2   BiocBaseUtils_1.15.1  Matrix_1.7-5         
#>  [46] nnet_7.3-20           timechange_0.4.0      tidyselect_1.2.1     
#>  [49] abind_1.4-8           yaml_2.3.12           timeDate_4052.112    
#>  [52] codetools_0.2-20      listenv_0.10.1        doRNG_1.8.6.3        
#>  [55] tibble_3.3.1          plyr_1.8.9            withr_3.0.2          
#>  [58] S7_0.2.2              posterior_1.7.0       ROCR_1.0-12          
#>  [61] evaluate_1.0.5        future_1.70.0         rJava_1.0-18         
#>  [64] pillar_1.11.1         tensorA_0.36.2.1      rngtools_1.5.2       
#>  [67] checkmate_2.3.4       distributional_0.7.0  scales_1.4.0         
#>  [70] globals_0.19.1        class_7.3-23          glue_1.8.1           
#>  [73] maketools_1.3.2       tools_4.6.0           sys_3.4.3            
#>  [76] data.table_1.18.4     ModelMetrics_1.2.2.2  gower_1.0.2          
#>  [79] mvtnorm_1.4-1         buildtools_1.0.0      grid_4.6.0           
#>  [82] pec_2025.06.24        tidyr_1.3.2           missForest_1.6.1     
#>  [85] rbibutils_2.4.1       ipred_0.9-15          nlme_3.1-169         
#>  [88] bartMachineJARs_1.2.2 cli_3.6.6             S4Arrays_1.13.0      
#>  [91] lava_1.9.1            gtable_0.3.6          sass_0.4.10          
#>  [94] digest_0.6.39         SparseArray_1.13.2    farver_2.1.2         
#>  [97] htmltools_0.5.9       lifecycle_1.0.5       hardhat_1.4.3        
#> [100] timeROC_0.4.1         MASS_7.3-65

References

Franzosa EA et al. (2019). Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nature Microbiology 4(2):293-305.

Ghaemi MS et al. (2019). Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy. Bioinformatics 35(1):95-103.

Citation

Mallick et al. (2024). An integrated Bayesian framework for multi-omics prediction and classification. Statistics in Medicine 43(5):983-1002.