| Title: | Integrated Multi-Omics Learning for Survival and Other Outcomes |
|---|---|
| Description: | Provides a unified interface for multi-omics prediction using early, late, and intermediate fusion for continuous, binary, multiclass, and survival outcomes. It supports both MultiAssayExperiment and PCL-style inputs, performs input validation and feature/sample harmonization across layers, and provides model fitting, prediction, plotting, and variable-importance utilities. |
| Authors: | Nalin Arora [aut, cre, cph] (ORCID: <https://orcid.org/0009-0009-1340-688X>), Anupreet Porwal [aut], Himel Mallick [aut] |
| Maintainer: | Nalin Arora <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.0 |
| Built: | 2026-06-09 13:48:32 UTC |
| Source: | https://github.com/BiocStaging/IntegratedLearner |
Meta-level objective function: NNLS for Gaussian outcomes; rank loss for binary outcomes
auc.obj(b, X, Y)
b |
Weights vector |
X |
Design matrix (data frame) |
Y |
Outcome variable |
1 - AUC
X <- data.frame(m1 = c(0.1, 0.4, 0.8, 0.9), m2 = c(0.2, 0.3, 0.7, 0.95))
Y <- c(0, 0, 1, 1)
auc.obj(b = c(0.5, 0.5), X = X, Y = Y)
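As a rough illustration of the quantity being minimized, the sketch below computes 1 - AUC for a weighted combination of the columns of X using the rank (Mann-Whitney) formula. The helper name `auc_loss_sketch` is hypothetical and this is not the package's implementation; it assumes that the score is the linear combination `X %*% b` and that Y is coded 0/1.

```r
# Hypothetical helper: 1 - AUC of the weighted score X %*% b (not the package's code).
auc_loss_sketch <- function(b, X, Y) {
  score <- as.numeric(as.matrix(X) %*% b)  # combined prediction per sample
  r <- rank(score)                         # mid-ranks handle ties
  n1 <- sum(Y == 1)
  n0 <- sum(Y == 0)
  auc <- (sum(r[Y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  1 - auc
}

X <- data.frame(m1 = c(0.1, 0.4, 0.8, 0.9), m2 = c(0.2, 0.3, 0.7, 0.95))
Y <- c(0, 0, 1, 1)
loss <- auc_loss_sketch(b = c(0.5, 0.5), X = X, Y = Y)
loss  # 0 here, because the combined score separates the two classes perfectly
```

A loss of 0 corresponds to a perfect ranking and 0.5 to a ranking no better than chance.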
Metabolomics feature table for the FranzosaE_2019 training cohort.
data(FranzosaE_2019_CuratedMetabolome)
A data frame with samples in rows and metabolite features in columns.
Sample-level metadata for the FranzosaE_2019 training cohort.
data(FranzosaE_2019_CuratedMetadata)
A data frame with one row per sample and clinical/study covariates in columns.
Species-level relative abundance table for the FranzosaE_2019 training cohort.
data(FranzosaE_2019_CuratedSpeciesProfile)
A data frame with samples in rows and microbial species features in columns.
Metabolomics feature table for the FranzosaE_2019 external validation cohort.
data(FranzosaE_2019_Validation_CuratedMetabolome)
A data frame with samples in rows and metabolite features in columns.
Sample-level metadata for the FranzosaE_2019 external validation cohort.
data(FranzosaE_2019_Validation_CuratedMetadata)
A data frame with one row per sample and clinical/study covariates in columns.
Species-level relative abundance table for the FranzosaE_2019 external validation cohort.
data(FranzosaE_2019_Validation_CuratedSpeciesProfile)
A data frame with samples in rows and microbial species features in columns.
Gene-expression and associated covariate/outcome table for TCGA BRCA examples.
data(gene_all)
A data frame with one row per patient and covariates plus gene features in columns.
TCGA-derived example data bundled for package tests/tutorials.
Performs integrated machine learning to predict a binary or continuous outcome based on two or more omics layers (views). This function implements the core IntegratedLearner engine for non-survival outcomes.
IL_conbin(
  feature_table,
  sample_metadata,
  feature_metadata,
  feature_table_valid = NULL,
  sample_metadata_valid = NULL,
  folds = 5,
  seed = 1234,
  base_learner = "SL.BART",
  base_screener = "All",
  run_screening = FALSE,
  screen_pct = NULL,
  filter_method = NULL,
  filter_pct = NULL,
  prevalence_pct = NULL,
  meta_learner = "SL.nnls.auc",
  run_concat = TRUE,
  run_stacked = TRUE,
  drop_poor_performing_layers = FALSE,
  verbose = FALSE,
  print_learner = TRUE,
  refit.stack = FALSE,
  family = stats::gaussian(),
  ...
)
feature_table |
An R data frame containing multi-layer features (in rows)
and samples (in columns).
Column names of |
sample_metadata |
An R data frame of metadata variables (in columns).
Must have a column named |
feature_metadata |
An R data frame of feature-specific metadata across
views (in columns) and features (in rows). Must have a column named
|
feature_table_valid |
Feature table from validation set for which
prediction is desired. Must have the exact same structure as
|
sample_metadata_valid |
Sample-specific metadata table from independent
validation set when available. Must have the exact same structure as
|
folds |
How many folds in the V-fold nested cross-validation? Default is 5. |
seed |
Specify the seed for reproducibility. Default is 1234. |
base_learner |
Base learner for late fusion and early fusion. Default:
|
base_screener |
Deprecated for this backend; currently ignored and kept only for backward compatibility. |
run_screening |
Logical; if |
screen_pct |
Percentage of features to retain during screening
( |
filter_method |
Optional feature-filter method before model fitting.
Supported values are |
filter_pct |
Optional retention percentage (in |
prevalence_pct |
Optional retention percentage (in |
meta_learner |
Meta-learner for late fusion (stacked generalization).
Defaults to |
run_concat |
Should early fusion be run? Default is TRUE. |
run_stacked |
Should stacked model (late fusion) be run? Default is TRUE. |
drop_poor_performing_layers |
Logical; if |
verbose |
logical; TRUE for printing SuperLearner progress. Default FALSE. |
print_learner |
logical; Should a detailed summary be printed? Default TRUE. |
refit.stack |
logical; For late fusion, refit predictions on the entire data are returned if specified. Default FALSE. |
family |
Allows |
... |
Additional arguments (currently unused). |
IL_conbin takes a training set
(feature_table, sample_metadata, feature_metadata) and, optionally,
a corresponding validation set, and returns predicted values based on the
validation set. It also performs V-fold nested cross-validation to estimate
the prediction accuracy of various fusion algorithms.
Two integration paradigms are supported: early and late. The software includes multiple ML models based on the
SuperLearner R package as well as several data
exploration capabilities and visualization modules in a unified estimation
framework.
Although IL_conbin() is typically called internally by
IntegratedLearner() after extracting multi-view feature tables from
MultiAssayExperiment objects, advanced users may call
IL_conbin() directly when they already have multi-layer feature tables
and metadata in matrix/data.frame form.
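To make the difference between the two paradigms concrete, here is a minimal base-R sketch of early and late fusion on simulated data. It is not the package's engine: ordinary least squares stands in for both the SuperLearner base learners and the meta-learner, and all object names (`layer1`, `oof`, `meta`, `early`) are hypothetical.

```r
set.seed(1)
n <- 40
# Two toy layers with samples in rows (the package's feature_table is features x samples)
layer1 <- matrix(rnorm(n * 3), nrow = n)
layer2 <- matrix(rnorm(n * 2), nrow = n)
y <- layer1[, 1] + 0.5 * layer2[, 1] + rnorm(n, sd = 0.3)
layers <- list(Layer1 = layer1, Layer2 = layer2)

# Assign each sample to one of 5 folds
folds <- sample(rep(1:5, length.out = n))

# One linear model per layer; keep only out-of-fold predictions
oof <- sapply(layers, function(x) {
  pred <- numeric(n)
  for (k in 1:5) {
    test <- folds == k
    fit <- lm.fit(cbind(1, x[!test, , drop = FALSE]), y[!test])
    pred[test] <- cbind(1, x[test, , drop = FALSE]) %*% fit$coefficients
  }
  pred
})

# Late fusion: regress the outcome on the layer-wise out-of-fold predictions
meta <- lm(y ~ oof)

# Early fusion: a single model on the concatenated features
early <- lm(y ~ cbind(layer1, layer2))
```

Using out-of-fold predictions as meta-learner inputs keeps the stacking step from rewarding layers that merely overfit the training samples.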
A list-like IntegratedLearner object containing fitted layer-specific, stacked, and concatenated models, cross-validated performance (AUC or R²), and predictions for training and validation sets.
Himel Mallick, [email protected]
IntegratedLearner, IL_survival()
is.function(IL_conbin)
if (FALSE) {
  set.seed(1)
  n <- 20
  feature_table <- rbind(
    matrix(rnorm(3 * n), nrow = 3, dimnames = list(paste0("L1_F", 1:3), paste0("S", 1:n))),
    matrix(rnorm(2 * n), nrow = 2, dimnames = list(paste0("L2_F", 1:2), paste0("S", 1:n)))
  )
  sample_metadata <- data.frame(
    subjectID = paste0("ID", 1:n),
    Y = rnorm(n),
    row.names = colnames(feature_table)
  )
  feature_metadata <- data.frame(
    featureID = rownames(feature_table),
    featureType = c(rep("Layer1", 3), rep("Layer2", 2)),
    row.names = rownames(feature_table)
  )
  fit <- IL_conbin(
    feature_table = feature_table,
    sample_metadata = sample_metadata,
    feature_metadata = feature_metadata,
    folds = 3,
    base_learner = "SL.mean",
    run_stacked = FALSE,
    run_concat = FALSE,
    print_learner = FALSE,
    family = stats::gaussian()
  )
  names(fit)
}
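The inputs above stack all layers into one features-by-samples table and record each feature's layer in `featureType`. The following sketch shows one way such a table can be split back into per-layer, samples-by-features matrices. It uses only base R; the names `ids_by_layer` and `layers` are hypothetical and the package may organize this step differently.

```r
set.seed(1)
n <- 6
feature_table <- rbind(
  matrix(rnorm(3 * n), nrow = 3, dimnames = list(paste0("L1_F", 1:3), paste0("S", 1:n))),
  matrix(rnorm(2 * n), nrow = 2, dimnames = list(paste0("L2_F", 1:2), paste0("S", 1:n)))
)
feature_metadata <- data.frame(
  featureID = rownames(feature_table),
  featureType = c(rep("Layer1", 3), rep("Layer2", 2)),
  row.names = rownames(feature_table)
)

# Group feature IDs by layer, then transpose each block to samples x features
ids_by_layer <- split(feature_metadata$featureID, feature_metadata$featureType)
layers <- lapply(ids_by_layer, function(ids) t(feature_table[ids, , drop = FALSE]))

# One column of dimensions (samples, features) per layer
sapply(layers, dim)
```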
Native multiclass backend used by IntegratedLearner() when
family = binomial() and the outcome has more than two classes.
This backend preserves the existing API while using multiclass-native
probability modeling and multiclass stacking.
IL_multiclass(
  feature_table,
  sample_metadata,
  feature_metadata,
  feature_table_valid = NULL,
  sample_metadata_valid = NULL,
  folds = 5,
  seed = 1234,
  base_learner = "glmnet",
  base_screener = "All",
  run_screening = FALSE,
  screen_pct = NULL,
  filter_method = NULL,
  filter_pct = NULL,
  prevalence_pct = NULL,
  meta_learner = "glmnet",
  run_concat = TRUE,
  run_stacked = TRUE,
  verbose = FALSE,
  print_learner = TRUE,
  family = stats::binomial(),
  eps = 1e-15,
  ...
)
feature_table |
An R data frame containing multi-layer features (in rows)
and samples (in columns).
Column names of |
sample_metadata |
An R data frame of metadata variables (in columns).
Must have a column named |
feature_metadata |
An R data frame of feature-specific metadata across
views (in columns) and features (in rows). Must have a column named
|
feature_table_valid |
Feature table from validation set for which
prediction is desired. Must have the exact same structure as
|
sample_metadata_valid |
Sample-specific metadata table from independent
validation set when available. Must have the exact same structure as
|
folds |
How many folds in the V-fold nested cross-validation? Default is 5. |
seed |
Specify the seed for reproducibility. Default is 1234. |
base_learner |
Base learner for late fusion and early fusion. Default:
|
base_screener |
Deprecated for this backend; currently ignored and kept only for backward compatibility. |
run_screening |
Logical; if |
screen_pct |
Percentage of features to retain during screening
( |
filter_method |
Optional feature-filter method before model fitting.
Supported values are |
filter_pct |
Optional retention percentage (in |
prevalence_pct |
Optional retention percentage (in |
meta_learner |
Meta-learner for late fusion (stacked generalization).
Defaults to |
run_concat |
Should early fusion be run? Default is TRUE. |
run_stacked |
Should stacked model (late fusion) be run? Default is TRUE. |
verbose |
logical; TRUE for printing SuperLearner progress. Default FALSE. |
print_learner |
logical; Should a detailed summary be printed? Default TRUE. |
family |
Allows |
eps |
Small positive constant used to stabilize probabilities. |
... |
Additional arguments (currently unused). |
A fitted multiclass IntegratedLearner object.
is.function(IL_multiclass)
if (FALSE) {
  set.seed(1)
  n <- 24
  feature_table <- rbind(
    matrix(rnorm(3 * n), nrow = 3, dimnames = list(paste0("L1_F", 1:3), paste0("S", 1:n))),
    matrix(rnorm(2 * n), nrow = 2, dimnames = list(paste0("L2_F", 1:2), paste0("S", 1:n)))
  )
  y <- rep(c("A", "B", "C"), length.out = n)
  sample_metadata <- data.frame(
    subjectID = paste0("ID", 1:n),
    Y = y,
    row.names = colnames(feature_table)
  )
  feature_metadata <- data.frame(
    featureID = rownames(feature_table),
    featureType = c(rep("Layer1", 3), rep("Layer2", 2)),
    row.names = rownames(feature_table)
  )
  fit <- IL_multiclass(
    feature_table = feature_table,
    sample_metadata = sample_metadata,
    feature_metadata = feature_metadata,
    folds = 3,
    run_stacked = FALSE,
    run_concat = FALSE,
    print_learner = FALSE
  )
  fit$family
}
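The `eps` argument is described as a small positive constant used to stabilize probabilities. A common way to do this, shown here purely as an assumption about what such stabilization can look like, is to bound class probabilities away from 0 and 1 and renormalize each row so that a log-loss calculation stays finite. The helper `stabilize_probs` is hypothetical and not part of the package.

```r
# Hypothetical helper: bound probabilities away from 0 and 1, then renormalize rows.
stabilize_probs <- function(p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  p / rowSums(p)
}

# Three samples, three classes; the first row is a hard 0/1 prediction
p <- rbind(c(1, 0, 0), c(0.2, 0.5, 0.3), c(0.1, 0.1, 0.8))
y <- c(2, 2, 3)  # true class index per sample; sample 1 was assigned probability 0
q <- stabilize_probs(p)

# Multiclass log loss: infinite on the raw probabilities, finite after stabilization
logloss_raw <- -mean(log(p[cbind(seq_along(y), y)]))
logloss <- -mean(log(q[cbind(seq_along(y), y)]))
```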
BioC-friendly survival backend used by IntegratedLearner() for
time-to-event outcomes. This preserves the historical ILsurv()
interface while using model fitting and fusion routines that do not depend on
mlr3proba/mlr3extralearners.
ILsurv(
  feature_table,
  sample_metadata,
  feature_metadata,
  valid_feature_table = NULL,
  valid_sample_metadata = NULL,
  base_learner = "surv.coxph",
  folds = 5,
  seed = 123,
  run_screening = FALSE,
  screen_pct = NULL,
  drop_poor_performing_layers = FALSE,
  verbose = FALSE,
  do_early_fusion = TRUE,
  weight_method = c("IBS", "COX"),
  t_vec = NULL,
  t_vec_probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
  layer_score = c("sum", "mean", "l2"),
  eps = 1e-12,
  weight_lambda = 0.02,
  weight_penalty = c("l2_to_uniform", "entropy"),
  weight_cap = 1,
  optim_maxit_cox = 4000,
  optim_maxit_ibs = 300,
  ibs_shrink_to_uniform = 0,
  ...
)

IL_survival(
  feature_table,
  sample_metadata,
  feature_metadata,
  valid_feature_table = NULL,
  valid_sample_metadata = NULL,
  base_learner = "surv.coxph",
  folds = 5,
  seed = 123,
  run_screening = FALSE,
  screen_pct = NULL,
  drop_poor_performing_layers = FALSE,
  verbose = FALSE,
  do_early_fusion = TRUE,
  weight_method = c("IBS", "COX"),
  t_vec = NULL,
  t_vec_probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
  layer_score = c("sum", "mean", "l2"),
  eps = 1e-12,
  weight_lambda = 0.02,
  weight_penalty = c("l2_to_uniform", "entropy"),
  weight_cap = 1,
  optim_maxit_cox = 4000,
  optim_maxit_ibs = 300,
  ibs_shrink_to_uniform = 0,
  ...
)
feature_table |
An R data frame containing multi-layer features (in rows)
and samples (in columns).
Column names of |
sample_metadata |
An R data frame of metadata variables (in columns).
Must have a column named |
feature_metadata |
An R data frame of feature-specific metadata across
views (in columns) and features (in rows). Must have a column named
|
valid_feature_table |
Validation feature table (features x samples). |
valid_sample_metadata |
Validation sample metadata containing |
base_learner |
Survival base learner. |
folds |
How many folds in the V-fold nested cross-validation? Default is 5. |
seed |
Specify the seed for reproducibility. Default is 123. |
run_screening |
Logical; if |
screen_pct |
Percentage of features to retain during screening
( |
drop_poor_performing_layers |
Logical; if |
verbose |
logical; TRUE for printing SuperLearner progress. Default FALSE. |
do_early_fusion |
Logical; run early fusion model. |
weight_method |
Late-fusion weighting method. Supported:
|
t_vec |
Optional explicit time points for COX-based late-fusion feature construction. |
t_vec_probs |
Quantiles used to derive |
layer_score |
Layer summary on cumhaz increments: |
eps |
Numerical lower bound for survival probabilities. |
weight_lambda |
COX-weight optimization regularization strength. |
weight_penalty |
COX-weight optimization regularizer:
|
weight_cap |
Optional cap on any one layer weight (COX method). |
optim_maxit_cox |
Maximum iterations for COX-weight optimization. |
optim_maxit_ibs |
Maximum iterations for IBS-weight optimization. |
ibs_shrink_to_uniform |
Optional convex shrinkage of IBS weights toward uniform. |
... |
Additional base-learner hyperparameters passed to
|
List with train_out and valid_out in the same nested
format as previous survival implementations.
identical(IL_survival, ILsurv)
if (FALSE) {
  set.seed(1)
  n <- 20
  feature_table <- rbind(
    matrix(rnorm(3 * n), nrow = 3, dimnames = list(paste0("L1_F", 1:3), paste0("S", 1:n))),
    matrix(rnorm(2 * n), nrow = 2, dimnames = list(paste0("L2_F", 1:2), paste0("S", 1:n)))
  )
  sample_metadata <- data.frame(
    subjectID = paste0("ID", 1:n),
    time = rexp(n, rate = 0.1),
    event = rbinom(n, 1, 0.6),
    row.names = colnames(feature_table)
  )
  sample_metadata$Y <- survival::Surv(sample_metadata$time, sample_metadata$event)
  feature_metadata <- data.frame(
    featureID = rownames(feature_table),
    featureType = c(rep("Layer1", 3), rep("Layer2", 2)),
    row.names = rownames(feature_table)
  )
  fit <- ILsurv(
    feature_table = feature_table,
    sample_metadata = sample_metadata,
    feature_metadata = feature_metadata,
    folds = 3
  )
  names(fit)
}
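The `ibs_shrink_to_uniform` argument is described as an optional convex shrinkage of IBS weights toward uniform. The sketch below shows what a convex combination of this kind looks like in general. The helper `shrink_to_uniform` and its `shrink` parameter are hypothetical, and the exact formula used by the package is an assumption here.

```r
# Hypothetical helper: convex combination of layer weights and uniform weights.
shrink_to_uniform <- function(w, shrink = 0) {
  stopifnot(shrink >= 0, shrink <= 1)
  w <- w / sum(w)                           # make sure the input sums to one
  uniform <- rep(1 / length(w), length(w))  # equal weight for every layer
  (1 - shrink) * w + shrink * uniform
}

w <- c(Layer1 = 0.7, Layer2 = 0.2, Layer3 = 0.1)
w_none <- shrink_to_uniform(w, shrink = 0)    # weights unchanged
w_half <- shrink_to_uniform(w, shrink = 0.5)  # halfway toward 1/3 each
w_full <- shrink_to_uniform(w, shrink = 1)    # fully uniform
```

Because both inputs sum to one, every intermediate setting also yields weights that sum to one.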
Performs integrated machine learning to predict a binary, continuous, or
time-to-event outcome based on two or more omics layers (views).
The IntegratedLearner function takes a training
MultiAssayExperiment and, optionally, a validation
MultiAssayExperiment, extracts multi-layer feature tables, and returns
predicted values based on the validation set.
It also performs V-fold nested cross-validation to estimate the prediction
accuracy of various fusion algorithms.
Two integration paradigms are supported: early and late.
The software includes multiple ML models based on the
SuperLearner R package as well as several data
exploration capabilities and visualization modules in a unified estimation
framework.
IntegratedLearner(
  MAE_train = NULL,
  MAE_valid = NULL,
  PCL_train = NULL,
  PCL_valid = NULL,
  experiment = NULL,
  assay.type = NULL,
  outcome_col = "Y",
  subject_id_col = "subjectID",
  na.rm = FALSE,
  folds = 5,
  seed = 1234,
  base_learner = "SL.BART",
  base_screener = "All",
  run_screening = FALSE,
  screen_pct = NULL,
  filter_method = NULL,
  filter_pct = NULL,
  prevalence_pct = NULL,
  meta_learner = "SL.nnls.auc",
  run_concat = TRUE,
  run_stacked = TRUE,
  drop_poor_performing_layers = FALSE,
  verbose = FALSE,
  print_learner = TRUE,
  refit.stack = FALSE,
  family = stats::gaussian(),
  ...
)
MAE_train |
A |
MAE_valid |
Optional |
PCL_train |
Optional list of per-layer feature matrices (legacy PCL mode for backward compatibility). |
PCL_valid |
Optional validation list of per-layer feature matrices. |
experiment |
Optional character or integer vector specifying which
experiments (layers) to extract from |
assay.type |
Optional character vector of assay names, one per
experiment. If |
outcome_col |
Outcome column name in MAE/PCL sample metadata. Defaults
to |
subject_id_col |
Subject identifier column name in MAE/PCL sample
metadata. Defaults to |
na.rm |
Logical; if |
folds |
How many folds in the V-fold nested cross-validation? Default is 5. |
seed |
Specify the arbitrary seed value for reproducibility. Default is 1234. |
base_learner |
Base learner for late fusion and early fusion.
Check out the
SuperLearner package page
for all available options. Default is |
base_screener |
Deprecated for |
run_screening |
Logical; if |
screen_pct |
Percentage of features to retain during screening
( |
filter_method |
Optional feature-filter method for non-survival runs.
Supported values: |
filter_pct |
Optional retention percentage (in |
prevalence_pct |
Optional retention percentage (in |
meta_learner |
Meta-learner for late fusion (stacked generalization).
Defaults to |
run_concat |
Should early fusion be run? Default is TRUE.
Uses the specified |
run_stacked |
Should stacked model (late fusion) be run? Default is TRUE. |
drop_poor_performing_layers |
Logical; if |
verbose |
logical; TRUE for |
print_learner |
logical; Should a detailed summary be printed? Default is TRUE. |
refit.stack |
logical; For late fusion, post-refit predictions on the entire data are returned if specified. Default is FALSE. |
family |
Currently allows |
... |
Additional arguments passed to the underlying engines. |
Internally, IntegratedLearner() converts the MultiAssayExperiment
into tabular multi-view matrices and then dispatches to a backend:
IL_conbin for continuous and binary outcomes, IL_multiclass when
family = binomial() and the outcome has more than two classes, or
IL_survival for time-to-event outcomes.
A list-like IntegratedLearner object containing the trained model fits (layer-specific, stacked, and/or concatenated models), cross-validated performance estimates, and predicted values for training and, if supplied, validation data.
Himel Mallick, [email protected]
is.function(IntegratedLearner)
if (FALSE) {
  set.seed(1)
  n <- 20
  feature_table <- rbind(
    matrix(rnorm(3 * n), nrow = 3, dimnames = list(paste0("L1_F", 1:3), paste0("S", 1:n))),
    matrix(rnorm(2 * n), nrow = 2, dimnames = list(paste0("L2_F", 1:2), paste0("S", 1:n)))
  )
  sample_metadata <- data.frame(
    subjectID = paste0("ID", 1:n),
    Y = rnorm(n),
    row.names = colnames(feature_table)
  )
  feature_metadata <- data.frame(
    featureID = rownames(feature_table),
    featureType = c(rep("Layer1", 3), rep("Layer2", 2)),
    row.names = rownames(feature_table)
  )
  pcl <- list(
    feature_table = feature_table,
    sample_metadata = sample_metadata,
    feature_metadata = feature_metadata
  )
  fit <- IntegratedLearner(
    PCL_train = pcl,
    folds = 3,
    base_learner = "SL.mean",
    run_stacked = FALSE,
    run_concat = FALSE,
    print_learner = FALSE,
    family = stats::gaussian()
  )
  names(fit)
}
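Before passing a PCL-style list to the wrapper, it can help to confirm that its three components line up. The sketch below is a hypothetical pre-flight check, not the package's own validation: `check_pcl` is an invented name, and the alignment rules (sample names shared between the columns of `feature_table` and the rows of `sample_metadata`, feature names shared between the rows of `feature_table` and the rows of `feature_metadata`) are inferred from how the examples construct their inputs.

```r
# Hypothetical helper: basic structural checks on a PCL-style list.
check_pcl <- function(pcl, outcome_col = "Y", subject_id_col = "subjectID") {
  stopifnot(
    all(c("feature_table", "sample_metadata", "feature_metadata") %in% names(pcl)),
    all(c(outcome_col, subject_id_col) %in% colnames(pcl$sample_metadata)),
    all(c("featureID", "featureType") %in% colnames(pcl$feature_metadata)),
    # samples: columns of feature_table match rows of sample_metadata
    identical(colnames(pcl$feature_table), rownames(pcl$sample_metadata)),
    # features: rows of feature_table match rows of feature_metadata
    identical(rownames(pcl$feature_table), rownames(pcl$feature_metadata))
  )
  invisible(TRUE)
}

set.seed(1)
n <- 5
feature_table <- matrix(
  rnorm(4 * n), nrow = 4,
  dimnames = list(paste0("F", 1:4), paste0("S", 1:n))
)
pcl <- list(
  feature_table = feature_table,
  sample_metadata = data.frame(
    subjectID = paste0("ID", 1:n), Y = rnorm(n),
    row.names = colnames(feature_table)
  ),
  feature_metadata = data.frame(
    featureID = rownames(feature_table),
    featureType = rep(c("Layer1", "Layer2"), each = 2),
    row.names = rownames(feature_table)
  )
)
ok <- check_pcl(pcl)
```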
microRNA and associated covariate/outcome table for TCGA BRCA examples.
data(mir_all)
A data frame with one row per patient and covariates plus microRNA features in columns.
TCGA-derived example data bundled for package tests/tutorials.
External validation cohort example data in PCL format for binary outcome modeling.
data(NLIBD)
A list with components:
data frame of features (rows) by samples (columns).
data frame of sample-level metadata with Y
and subjectID.
data frame of feature-level metadata with
featureID and featureType.
Packaged example data for tutorials and tests.
NNLS function to optimize weights of several base learners
NNLS(x, y, wt)
x |
Matrix of predictors, one column per base learner |
y |
Outcome vector |
wt |
Vector of observation weights |
Solution of the quadratic programming problem
x <- cbind(c(0.1, 0.4, 0.6, 0.9), c(0.2, 0.5, 0.7, 0.8))
y <- c(0.15, 0.45, 0.65, 0.85)
fit <- NNLS(x = x, y = y, wt = rep(1, nrow(x)))
fit$solution
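For readers who want to see the underlying problem, the sketch below solves weighted non-negative least squares, that is, minimizing the weighted squared error subject to non-negative coefficients, with base R's box-constrained `optim()`. This is a stand-in technique for illustration only: the package returns the solution of a quadratic program, and the helper name `nnls_sketch` is hypothetical.

```r
# Hypothetical stand-in: weighted non-negative least squares via box-constrained optim().
nnls_sketch <- function(x, y, wt = rep(1, nrow(x))) {
  # Weighted residual sum of squares and its gradient
  obj <- function(b) sum(wt * (y - x %*% b)^2)
  grad <- function(b) as.numeric(-2 * t(x) %*% (wt * (y - x %*% b)))
  fit <- optim(
    par = rep(0, ncol(x)), fn = obj, gr = grad,
    method = "L-BFGS-B", lower = 0  # lower = 0 enforces non-negativity
  )
  fit$par
}

x <- cbind(c(0.1, 0.4, 0.6, 0.9), c(0.2, 0.5, 0.7, 0.8))
y <- c(0.15, 0.45, 0.65, 0.85)
w <- nnls_sketch(x, y)
w
```

In this toy example y is the exact average of the two columns of x, so the estimated weights should come out close to 0.5 each.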
Plots outcome-appropriate performance summaries for the
training set and, if available, the validation set produced by an
IntegratedLearner fit. Depending on the outcome family, this may
include ROC curves, R-squared bar plots, multiclass one-vs-rest ROC
curves, or survival AUC / Kaplan-Meier panels.
## S3 method for class 'learner'
plot(
  x,
  y = NULL,
  label_size = 8,
  label_x = 0.3,
  vjust = 0.1,
  rowwise_plot = TRUE,
  ...
)
x |
Fitted |
y |
Unused (required for S3 signature). |
label_size |
Optional numeric label size for subplot tags. Default is 8. |
label_x |
Optional single value or vector of x positions for subplot labels, relative to each subplot. Defaults to 0.3 for all labels. |
vjust |
Adjusts the vertical position of each label. More positive values move the label further down on the plot canvas. Can be a single value (applied to all labels) or a vector of values (one for each label). Default is 0.1. |
rowwise_plot |
If both train and test data are available, should the
train and test plots be arranged row-wise? Default is |
... |
Additional arguments (currently unused). |
A list whose $plot entry is a ggplot2/cowplot
composite object, along with the underlying tabular data used to generate
the plot.
Makes predictions for new samples using a trained 'IntegratedLearner' model.
## S3 method for class 'learner'
predict(
  object,
  feature_table_valid = NULL,
  sample_metadata_valid = NULL,
  feature_metadata = NULL,
  outcome_col = NULL,
  subject_id_col = NULL,
  ...
)
object |
Fitted 'IntegratedLearner' object |
feature_table_valid |
Feature table from validation set. Must have the exact same structure as feature_table. |
sample_metadata_valid |
OPTIONAL (can provide feature_table_valid and not this): Sample-specific metadata table from independent validation set. If provided, it must have the exact same structure as sample_metadata. |
feature_metadata |
Matrix containing feature names and their corresponding layers. Must be same as that provided in IntegratedLearner object. |
outcome_col |
Optional outcome column name in |
subject_id_col |
Optional subject identifier column name in
|
... |
Additional arguments (currently unused) |
Predicted values
Predict function for SL.BART
## S3 method for class 'SL.BART'
predict(object, newdata, family, X = NULL, Y = NULL, ...)
object |
Fitted SL.BART model object |
newdata |
Data frame for prediction |
family |
Family object passed through (unused) |
X |
Training design matrix (unused) |
Y |
Training outcome (unused) |
... |
Additional arguments (unused) |
Predictions from the fitted SL.BART model
Predict function for SL.nnls.auc
## S3 method for class 'SL.nnls.auc'
predict(object, newdata, ...)
object |
Fitted SL.nnls.auc model |
newdata |
Validation layer-level predictions |
... |
Additional arguments (unused) |
Prediction from the meta-learner
Multi-omics pregnancy example data in PCL format for continuous outcome modeling.
data(pregnancy)
A list with components:
data frame of features (rows) by samples (columns).
data frame of sample-level metadata with Y
and subjectID.
data frame of feature-level metadata with
featureID and featureType.
Packaged example data for tutorials and tests.
PRISM cohort example data in PCL format for binary outcome modeling.
data(PRISM)
A list with components:
data frame of features (rows) by samples (columns).
data frame of sample-level metadata with Y
and subjectID.
data frame of feature-level metadata with
featureID and featureType.
Packaged example data for tutorials and tests.
Supports Bayesian additive regression trees (BART) via the bartMachine package.
SL.BART(
  Y,
  X,
  newX,
  family,
  obsWeights,
  id,
  num_trees = 50,
  num_burn_in = 250,
  verbose = FALSE,
  alpha = 0.95,
  beta = 2,
  k = 2,
  q = 0.9,
  nu = 3,
  num_iterations_after_burn_in = 1000,
  serialize = TRUE,
  seed = 5678,
  ...
)
Y |
Outcome variable |
X |
Covariate dataframe |
newX |
Optional dataframe to predict the outcome |
family |
'gaussian' for regression, 'binomial' for binary classification |
obsWeights |
Optional observation-level weights (supported but not tested) |
id |
Optional id to group observations from the same unit (not used currently). |
num_trees |
The number of trees to be grown in the sum-of-trees model. |
num_burn_in |
Number of MCMC samples to be discarded as 'burn-in'. |
verbose |
Prints information about progress of the algorithm to the screen. |
alpha |
Base hyperparameter in tree prior for whether a node is nonterminal or not. |
beta |
Power hyperparameter in tree prior for whether a node is nonterminal or not. |
k |
For regression, k determines the prior probability that E(Y|X) is
contained in the interval |
q |
Quantile of the prior on the error variance at which the data-based estimate is placed. Note that the larger the value of q, the more aggressive the fit as you are placing more prior weight on values lower than the data-based estimate. Not used for classification. |
nu |
Degrees of freedom for the inverse chi^2 prior. Not used for classification. |
num_iterations_after_burn_in |
Number of MCMC samples to draw from the posterior distribution of f(x). |
serialize |
If TRUE, bartMachine results can be saved to a file, but will require additional RAM. |
seed |
Seed for reproducibility passed to bartMachine. |
... |
Additional arguments (not used) |
A list with elements pred (predictions for newX) and
fit (the fitted model object).
is.function(SL.BART)
if (FALSE) {
  set.seed(1)
  X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  Y <- rnorm(20)
  fit <- SL.BART(
    Y = Y, X = X, newX = X[1:3, ], family = stats::gaussian(),
    obsWeights = rep(1, nrow(X)), id = seq_len(nrow(X)),
    num_trees = 5, num_burn_in = 5, num_iterations_after_burn_in = 20
  )
  fit$pred
}
Penalized regression using elastic net. Alpha = 0 corresponds to ridge regression and alpha = 1 corresponds to Lasso.
See vignette('glmnet_beta', package = 'glmnet') for a nice tutorial on
glmnet.
SL.enet( Y, X, newX, family, obsWeights, id, alpha = seq(0, 1, 0.1), nfolds = 10, nlambda = 100, useMin = TRUE, loss = "deviance", ... )
Y |
Outcome variable |
X |
Covariate dataframe |
newX |
Dataframe to predict the outcome |
family |
'gaussian' for regression, 'binomial' for binary classification. Untested options: 'multinomial' for multiclass classification, 'mgaussian' for multiple responses, 'poisson' for non-negative outcomes with proportional mean and variance, and 'cox'. |
obsWeights |
Optional observation-level weights |
id |
Optional id to group observations from the same unit (not used currently). |
alpha |
Elastic net mixing parameter, range 0 to 1. 0 = ridge regression and 1 = lasso. |
nfolds |
Number of folds for internal cross-validation to optimize lambda. |
nlambda |
Number of lambda values to check, recommended to be 100 or more. |
useMin |
If TRUE use lambda that minimizes risk, otherwise use 1 standard-error rule which chooses a higher penalty with performance within one standard error of the minimum (see Breiman et al. 1984 on CART for background). |
loss |
Loss function, can be 'deviance', 'mse', or 'mae'. If family = binomial can also be 'auc' or 'class' (misclassification error). |
... |
Any additional arguments are passed through to cv.glmnet. |
A list with elements pred (predictions for newX) and
fit (cross-validated glmnet fit metadata).
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
predict.SL.glmnet cv.glmnet
glmnet
set.seed(1)
X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
Y <- rnorm(20)
fit <- SL.enet(
  Y = Y, X = X, newX = X[1:3, ], family = stats::gaussian(),
  obsWeights = rep(1, nrow(X)), id = seq_len(nrow(X)),
  alpha = c(0, 0.5, 1), nfolds = 3, nlambda = 10
)
head(fit$pred)
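The alpha grid and the useMin rule described above map naturally onto cv.glmnet. The following is a minimal sketch, assuming the glmnet package is installed. Reusing one foldid vector so that every alpha is compared on the same folds is a choice made for this sketch and is not necessarily how SL.enet is implemented; all variable names are illustrative.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100)

# Assumption of this sketch: reuse the same folds for every alpha so the
# cross-validated errors are directly comparable.
foldid <- sample(rep(1:5, length.out = nrow(x)))
alphas <- c(0, 0.5, 1)  # ridge, elastic net, lasso

cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(x, y, alpha = a, foldid = foldid, family = "gaussian")
})

# Pick the alpha whose best lambda gives the lowest cross-validated error.
cv_err <- vapply(cv_fits, function(f) min(f$cvm), numeric(1))
best <- cv_fits[[which.min(cv_err)]]

# useMin = TRUE corresponds to lambda.min; the 1-SE rule picks lambda.1se,
# a larger (more conservative) penalty.
lambda_min <- best$lambda.min
lambda_1se <- best$lambda.1se

pred <- predict(best, newx = x[1:3, ], s = "lambda.min")
```

Switching `s` to `"lambda.1se"` gives predictions under the one-standard-error rule instead.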
Penalized regression using elastic net. Alpha = 0 corresponds to ridge regression and alpha = 1 corresponds to Lasso.
See vignette('glmnet_beta', package = 'glmnet') for a nice tutorial on
glmnet.
SL.glmnet2( Y, X, newX, family, obsWeights, id, alpha = 0.5, nfolds = 10, nlambda = 100, useMin = TRUE, loss = "deviance", ... )
Y |
Outcome variable |
X |
Covariate dataframe |
newX |
Dataframe to predict the outcome |
family |
'gaussian' for regression, 'binomial' for binary classification. Untested options: 'multinomial' for multiclass classification, 'mgaussian' for multiple responses, 'poisson' for non-negative outcomes with proportional mean and variance, and 'cox'. |
obsWeights |
Optional observation-level weights |
id |
Optional id to group observations from the same unit (not used currently). |
alpha |
Elastic net mixing parameter, range 0 to 1. 0 = ridge regression and 1 = lasso. |
nfolds |
Number of folds for internal cross-validation to optimize lambda. |
nlambda |
Number of lambda values to check, recommended to be 100 or more. |
useMin |
If TRUE use lambda that minimizes risk, otherwise use 1 standard-error rule which chooses a higher penalty with performance within one standard error of the minimum (see Breiman et al. 1984 on CART for background). |
loss |
Loss function, can be 'deviance', 'mse', or 'mae'. If family = binomial can also be 'auc' or 'class' (misclassification error). |
... |
Any additional arguments are passed through to cv.glmnet. |
A list with elements pred (predictions for newX) and
fit (cross-validated glmnet fit metadata).
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
predict.SL.glmnet cv.glmnet
glmnet
set.seed(1)
X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
Y <- rnorm(20)
fit <- SL.glmnet2(
  Y = Y, X = X, newX = X[1:3, ], family = stats::gaussian(),
  obsWeights = rep(1, nrow(X)), id = seq_len(nrow(X)),
  nfolds = 3, nlambda = 10
)
head(fit$pred)
Horseshoe regression
SL.horseshoe( Y, X, newX, family, prior = "horseshoe", N = 20000L, burnin = 1000L, thinning = 1L, ... )
Y |
Outcome variable |
X |
Covariate data frame |
newX |
Dataframe to predict the outcome |
family |
'gaussian' for regression, 'binomial' for binary classification. Untested options: 'poisson' for integer or count data. |
prior |
Prior for the regression coefficients. Default is 'horseshoe'. Untested options: ridge regression (prior='rr' or prior='ridge'), lasso regression (prior='lasso'), and horseshoe+ regression (prior='hs+' or prior='horseshoe+'). |
N |
Number of posterior samples to generate. |
burnin |
Number of burn-in samples. |
thinning |
Desired level of thinning. |
... |
Other parameters passed to the bayesreg function. |
SL object
is.function(SL.horseshoe)
if (FALSE) {
  set.seed(1)
  X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  Y <- rnorm(20)
  fit <- SL.horseshoe(
    Y = Y, X = X, newX = X[1:3, ], family = stats::gaussian()
  )
  fit$pred
}
Penalized regression using elastic net. Alpha = 0 corresponds to ridge regression and alpha = 1 corresponds to Lasso.
See vignette('glmnet_beta', package = 'glmnet') for a nice tutorial on
glmnet.
SL.LASSO( Y, X, newX, family, obsWeights, id, alpha = 1, nfolds = 10, nlambda = 100, useMin = TRUE, loss = "deviance", ... )
Y |
Outcome variable |
X |
Covariate dataframe |
newX |
Dataframe to predict the outcome |
family |
'gaussian' for regression, 'binomial' for binary classification. Untested options: 'multinomial' for multiclass classification, 'mgaussian' for multiple responses, 'poisson' for non-negative outcomes with proportional mean and variance, and 'cox'. |
obsWeights |
Optional observation-level weights |
id |
Optional id to group observations from the same unit (not used currently). |
alpha |
Elastic net mixing parameter, range 0 to 1. 0 = ridge regression and 1 = lasso. |
nfolds |
Number of folds for internal cross-validation to optimize lambda. |
nlambda |
Number of lambda values to check, recommended to be 100 or more. |
useMin |
If TRUE use lambda that minimizes risk, otherwise use 1 standard-error rule which chooses a higher penalty with performance within one standard error of the minimum (see Breiman et al. 1984 on CART for background). |
loss |
Loss function, can be 'deviance', 'mse', or 'mae'. If family = binomial can also be 'auc' or 'class' (misclassification error). |
... |
Any additional arguments are passed through to cv.glmnet. |
A list with elements pred (predictions for newX) and
fit (cross-validated glmnet fit metadata).
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
predict.SL.glmnet cv.glmnet
glmnet
set.seed(1)
X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
linpred <- 0.5 * X$x1 - 0.25 * X$x2
Y <- stats::rbinom(nrow(X), 1, stats::plogis(linpred))
fit <- SL.LASSO(
  Y = Y, X = X, newX = X[1:3, ], family = stats::binomial(),
  obsWeights = rep(1, nrow(X)), id = seq_len(nrow(X)),
  nfolds = 3, nlambda = 10
)
head(fit$pred)
SuperLearner wrapper for mixed-effects BART using the mxBART package. This learner is optional and requires mxBART to be installed.
SL.mxBART( Y, X, newX, family, obsWeights, id, sparse = FALSE, ntree = 50, ndpost = 1000, nskip = 100, keepevery = 10, mxps = list(list(prior = 1, df = 3, scale = 1)), ... )
Y |
Outcome variable. |
X |
Covariate data frame (training). |
newX |
Covariate data frame (prediction). |
family |
A |
obsWeights |
Optional observation weights (currently unused). |
id |
Optional grouping id for mixed-effects BART. |
sparse |
Logical; passed to |
ntree |
Number of trees. |
ndpost |
Number of posterior draws. |
nskip |
Number of burn-in draws. |
keepevery |
Thinning interval. |
mxps |
Prior specification list passed to |
... |
Additional arguments passed to |
A list with elements pred and fit (SuperLearner convention).
is.function(SL.mxBART)
if (FALSE) {
  set.seed(1)
  X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  Y <- rnorm(20)
  fit <- SL.mxBART(
    Y = Y, X = X, newX = X[1:3, ], family = stats::gaussian(),
    obsWeights = rep(1, nrow(X)), id = rep(1:5, each = 4),
    ntree = 10, ndpost = 20, nskip = 10, keepevery = 2
  )
  fit$pred
}
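All learners on this page share the SuperLearner wrapper convention: the same leading arguments (Y, X, newX, family, obsWeights, id) and a return value that is a list with pred and fit. The following is a minimal sketch of that convention using base R's glm. The wrapper name SL.myglm and its class label are hypothetical and are not part of this package.

```r
# Hypothetical wrapper illustrating the SuperLearner convention;
# SL.myglm is not a function exported by this package.
SL.myglm <- function(Y, X, newX, family, obsWeights, id, ...) {
  # Fit on the training covariates; Y is looked up in the function environment.
  fit_glm <- stats::glm(Y ~ ., data = X, family = family, weights = obsWeights)
  # Predictions for newX on the response scale.
  pred <- stats::predict(fit_glm, newdata = newX, type = "response")
  fit <- list(object = fit_glm)
  class(fit) <- "SL.myglm"
  list(pred = pred, fit = fit)
}

set.seed(1)
X <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
Y <- rnorm(20)
out <- SL.myglm(
  Y = Y, X = X, newX = X[1:3, ], family = stats::gaussian(),
  obsWeights = rep(1, nrow(X)), id = seq_len(nrow(X))
)
out$pred
```

The id argument is accepted but ignored here, which matches how several learners above document it.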
Combined SuperLearner meta-learner: non-negative least squares (NNLS) for gaussian outcomes and AUC maximization for binomial outcomes.
SL.nnls.auc(Y, X, newX, family, obsWeights, bounds = c(0, Inf), ...)
Y |
Outcome matrix from metalearner |
X |
Layer-level predictions used to train the metalearner |
newX |
Layer-level predictions for validation data |
family |
Family object |
obsWeights |
Observation weights |
bounds |
Lower/upper bounds for weights (binomial case) |
... |
Additional arguments passed through |
Estimated meta-learner coefficients and predictions
set.seed(1)
X <- data.frame(m1 = runif(20), m2 = runif(20))
Y <- rnorm(20)
fit <- SL.nnls.auc(
  Y = Y, X = X, newX = X[1:4, ], family = stats::gaussian(),
  obsWeights = rep(1, nrow(X))
)
head(fit$pred)
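The two objectives behind this meta-learner can be illustrated in base R. The sketch below is not the package's implementation: it substitutes bounded L-BFGS-B optimization (stats::optim) for a dedicated NNLS solver, normalizes the weights to sum to one as an assumption of the sketch, and computes AUC from ranks. The helper names nnls_weights, auc_rank, and one_minus_auc are hypothetical.

```r
# Gaussian case: least squares with non-negative weights, solved here with
# bounded L-BFGS-B instead of a dedicated NNLS routine.
nnls_weights <- function(X, Y) {
  X <- as.matrix(X)
  sse <- function(b) sum((Y - X %*% b)^2)
  grad <- function(b) as.vector(-2 * crossprod(X, Y - X %*% b))
  fit <- stats::optim(rep(1 / ncol(X), ncol(X)), sse, grad,
                      method = "L-BFGS-B", lower = 0)
  w <- fit$par
  # Assumption of this sketch: rescale so the weights sum to one.
  if (sum(w) > 0) w / sum(w) else w
}

# Binomial case: rank-based (Mann-Whitney) AUC and the 1 - AUC loss.
auc_rank <- function(score, y) {
  r <- rank(score)  # average ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
one_minus_auc <- function(b, X, Y) {
  1 - auc_rank(as.vector(as.matrix(X) %*% b), Y)
}

set.seed(1)
Xg <- data.frame(m1 = runif(50), m2 = runif(50))
Yg <- 0.7 * Xg$m1 + 0.3 * Xg$m2 + rnorm(50, sd = 0.05)
w <- nnls_weights(Xg, Yg)

Xb <- data.frame(m1 = c(0.1, 0.4, 0.8, 0.9), m2 = c(0.2, 0.3, 0.7, 0.95))
Yb <- c(0, 0, 1, 1)
loss <- one_minus_auc(c(0.5, 0.5), Xb, Yb)
```

In the binary toy data the combined score separates the two classes perfectly, so the loss evaluates to 0.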
Updates a fitted IntegratedLearner object when only a subset of omics layers is available in the test set. If all layers and features match, it calls predict.learner().
## S3 method for class 'learner'
update( object, feature_table_valid, sample_metadata_valid = NULL, feature_metadata_valid, outcome_col = NULL, subject_id_col = NULL, seed = 1234, verbose = FALSE, ... )
object |
fitted 'IntegratedLearner' object |
feature_table_valid |
Feature table from validation set. It should be a data frame with features in rows and samples in columns. Feature names should be a subset of training data feature names. |
sample_metadata_valid |
OPTIONAL (can provide feature_table_valid and not this): Sample-specific metadata table from independent validation set. If provided, it must have the exact same structure as sample_metadata. Default is NULL. |
feature_metadata_valid |
Matrix containing feature names and their corresponding layers. Must be subset of feature_metadata provided in IntegratedLearner object. |
outcome_col |
Optional outcome column name in |
subject_id_col |
Optional subject ID column name in
|
seed |
Seed for reproducibility. Default is 1234. |
verbose |
Logical; whether to print a summary of fits/results. Default is FALSE. |
... |
Additional arguments (unused) |
SL object
is.function(getS3method("update", "learner"))
if (FALSE) {
  # Build a fit with IntegratedLearner() first, then update with reduced layers.
  update(
    object = fit,
    feature_table_valid = feature_table_valid,
    sample_metadata_valid = sample_metadata_valid,
    feature_metadata_valid = feature_metadata_valid
  )
}
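Before calling update(), it can help to check which layers the validation data actually covers. The following is a minimal base R sketch with toy data. It assumes feature metadata with featureID and featureType columns, as in the packaged example data, and does not reproduce the checks update() performs internally. All object contents are made up for illustration.

```r
# Toy training-time feature metadata: two layers, two features each.
feature_metadata <- data.frame(
  featureID = c("s1", "s2", "m1", "m2"),
  featureType = c("species", "species", "metabolites", "metabolites"),
  stringsAsFactors = FALSE
)

# Validation data containing only the species layer.
feature_metadata_valid <-
  feature_metadata[feature_metadata$featureType == "species", ]
feature_table_valid <- data.frame(
  sampleA = c(0.2, 0.8),
  sampleB = c(0.5, 0.5),
  row.names = c("s1", "s2")
)

# Validation features (rows) must be a subset of the training features.
stopifnot(all(rownames(feature_table_valid) %in% feature_metadata$featureID))

# Layers present in, and missing from, the validation set.
train_layers <- unique(feature_metadata$featureType)
valid_layers <- unique(feature_metadata_valid$featureType)
available_layers <- intersect(train_layers, valid_layers)
missing_layers <- setdiff(train_layers, valid_layers)
```

Here available_layers is "species" and missing_layers is "metabolites", the situation update() is documented to handle.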