Main steps
Main steps to perform transfer learning analysis using MOTL:
- Initialize the learning dataset Lrn
- Prepare the target dataset Trg
- Define parameters for transfer learning TL_param
- Run transferLearning_function()
This is an example of how to use MOTL with the basic commands. The data used in this example are subsets of the original data.
The complete learning dataset can be found via this link: zenodo link.
⚠️ WARNING
Do not use the data bundled with the MOTL package for a real analysis: they are subsets intended for this example only.
Datasets used in this example are stored in two objects:
- Lrn: learning dataset (used for transfer learning)
- Trg: target dataset (data to analyse)
For more details, see ?MOTL::Lrn and ?MOTL::Trg.
Lrn
Load the learning dataset and the corresponding factorization model from the MOTL package.
For the learning dataset, you need two pieces of data: the learning dataset metadata and the factorization model created with MOFA2.
The learning dataset metadata are stored in Lrn$Lrn_meta.
The expdat_meta_Lrn object contains information about
the learning dataset construction.
names(expdat_meta_Lrn)
#> [1] "if_vst" "smpls" "ftrs_mRNA"
#> [4] "ftrs_miRNA" "ftrs_DNAme" "ftrs_SNV"
#> [7] "PCVarPrcnt_mRNA" "PCVarPrcnt_miRNA" "PCVarPrcnt_DNAme"
#> [10] "PCVarPrcnt_SNV" "ElbowK_Total" "ElbowK_mRNA"
#> [13] "ElbowK_miRNA" "ElbowK_DNAme" "ElbowK_SNV"
#> [16] "GeoMeans_mRNA" "GeoMeans_miRNA" "Seed"
#> [19] "script_start_time" "script_end_time"
You can retrieve, for example, the mRNA feature names.
expdat_meta_Lrn$ftrs_mRNA[c(1:5)]
#> [1] "ENSG00000232216.1" "ENSG00000170561.13" "ENSG00000155011.9"
#> [4] "ENSG00000128714.6" "ENSG00000009950.16"
You can also retrieve the SNV feature names.
Or, you can retrieve the sample names.
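The commands for the SNV feature names and the sample names are not shown above. A minimal sketch, assuming they follow the same pattern as the mRNA example, is given here on a mock list: the element names (smpls, ftrs_SNV) come from the names() output above, while expdat_meta_mock and its values are made up for illustration.

```r
# Mock metadata list: element names follow names(expdat_meta_Lrn),
# the values are invented for this illustration only.
expdat_meta_mock <- list(
  smpls     = paste0("sample_", 1:6),
  ftrs_mRNA = c("ENSG00000232216.1", "ENSG00000170561.13"),
  ftrs_SNV  = c("GENE_A", "GENE_B", "GENE_C")
)

# First SNV feature names (head() is safe when fewer than 5 are available)
snv_head <- head(expdat_meta_mock$ftrs_SNV, 5)
snv_head

# First sample names
smpls_head <- head(expdat_meta_mock$smpls, 5)
smpls_head
```

With the real object, the same calls would presumably be applied to expdat_meta_Lrn$ftrs_SNV and expdat_meta_Lrn$smpls.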
📝 NOTE
To load the model file, you can use the load_model() function from MOFA2. To load an .rds file, you can use the readRDS() function.
expdat_meta_Lrn <- readRDS(file.path(LrnDir, "expdat_meta.rds"))
InputModel <- file.path(LrnFctrnDir, "Model.hdf5")
Fctrzn <- load_model(file = InputModel)
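As a small illustration of the .rds part of this note, the sketch below writes a mock metadata list to a temporary file and reads it back. The object meta_mock and the file name are hypothetical; only saveRDS() and readRDS() from base R are used.

```r
# Mock metadata object (made-up values)
meta_mock <- list(smpls = c("s1", "s2"), Seed = 42L)

# Write it to a temporary .rds file, then read it back
rds_path <- file.path(tempdir(), "expdat_meta_mock.rds")
saveRDS(meta_mock, rds_path)
meta_back <- readRDS(rds_path)

# The object read back should match the one that was saved
round_trip_ok <- identical(meta_mock, meta_back)
round_trip_ok
```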
The learning dataset factorization model is stored in Lrn$Fctrzn.
The Fctrzn object was created using the MOFA2 package and is composed of:
- four views: mRNA, miRNA, DNAme and SNV,
- mRNA, DNAme and SNV datasets with 1000 features each, and miRNA with 250 features,
- 250 samples in a single group,
- 20 factors.
Fctrzn
#> Trained MOFA with the following characteristics:
#> Number of views: 4
#> Views names: mRNA miRNA DNAme SNV
#> Number of features (per view): 1000 250 1000 1000
#> Number of groups: 1
#> Groups names: group0
#> Number of samples (per group): 250
#> Number of factors: 20
See ?MOFA2::MOFA for more details about the MOFA object.
You need to retrieve some information from the learning dataset factorization model:
- viewsLrn: views of the learning dataset
- likelihoodsLrn: defined likelihoods of each view
- MLrn: dimension (number of views) of the learning dataset
viewsLrn <- get_default_data_options(Fctrzn)$views
likelihoodsLrn <- get_default_model_options(Fctrzn)$likelihoods
MLrn <- get_dimensions(Fctrzn)$M
viewsLrn
#> [1] "mRNA" "miRNA" "DNAme" "SNV"
likelihoodsLrn
#> mRNA miRNA DNAme SNV
#> "gaussian" "gaussian" "gaussian" "bernoulli"
MLrn
#> [1] 4
Then, you need to specify the CenterTrg parameter. If it is set to TRUE, the target dataset is centered during processing. If it is set to FALSE, the target dataset is left uncentered and the estimated learning dataset intercepts are used instead (for normalization).
Here, we will use the estimated learning dataset intercepts, i.e. CenterTrg is set to FALSE.
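One simple way to picture the two settings is sketched below on a toy matrix. This is only an illustration under stated assumptions: the matrix Y, the vector intercepts_lrn and the subtraction step are hypothetical, and how MOTL uses the learning intercepts internally is not shown in this document.

```r
# Toy target view: 3 features (rows) x 4 samples (columns)
Y <- matrix(c(1, 2, 3, 4,
              10, 12, 14, 16,
              5, 5, 5, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("f", 1:3), paste0("s", 1:4)))

# Hypothetical intercepts estimated on the learning dataset (one per feature)
intercepts_lrn <- c(f1 = 2, f2 = 11, f3 = 6)

CenterTrg <- FALSE

if (CenterTrg) {
  # Center each feature on the target data's own mean
  Y_adj <- sweep(Y, 1, rowMeans(Y), "-")
} else {
  # Leave the target uncentered and rely on the learning intercepts instead
  Y_adj <- sweep(Y, 1, intercepts_lrn[rownames(Y)], "-")
}
Y_adj
```

Using the learning intercepts keeps the target data on the scale the learning model was fitted on, which can matter when the target dataset has few samples and its own feature means are noisy.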
Then, the expectation values of the factorization need to be initialized.
Fctrzn@expectations[["Tau"]] <- Tau_init(viewsLrn, Fctrzn, InputModel)
Fctrzn@expectations[["TauLn"]] <- sapply(viewsLrn, TauLn_calculation, likelihoodsLrn, Fctrzn, LrnFctrnDir)
Fctrzn@expectations[["WSq"]] <- sapply(viewsLrn, WSq_calculation, Fctrzn, LrnFctrnDir)
Fctrzn@expectations[["W0"]] <- sapply(viewsLrn, W0_calculation, CenterTrg, Fctrzn, LrnFctrnDir)
The initialized data are also stored in the Lrn$Fctrzn_init object, which can be used in place of the four lines above.
Trg
📝 NOTE
The target dataset is a list of matrices. You can create it like this:
YTrg_list <- list(mRNA = expdat_mRNA, miRNA = expdat_miRNA, DNAme = expdat_DNAme, SNV = expdat_SNV)
List of matrices: the target dataset is a list of named matrices. Each matrix corresponds to a view (i.e. one omics data type).
Features in rows: features should be in rows. They differ between views, but feature names should be consistent with the learning dataset. The order of the features is not important.
Samples in columns: samples should be in columns. The columns (samples) need to be the same across views. They will be ordered automatically.
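The layout described above can be sketched with toy data. The matrices below are random and the object names (YTrg_toy, smpl_names) are hypothetical; the sketch only shows the expected shape (named list, features in rows, shared samples in columns) and a basic sanity check.

```r
set.seed(1)
smpl_names <- paste0("sample_", 1:4)

# Features in rows, samples in columns; feature names differ between views
expdat_mRNA <- matrix(
  rnorm(12), nrow = 3,
  dimnames = list(c("ENSG00000232216.1", "ENSG00000170561.13",
                    "ENSG00000155011.9"), smpl_names)
)
expdat_miRNA <- matrix(
  rnorm(8), nrow = 2,
  dimnames = list(c("hsa-mir-1-1", "hsa-mir-1-2"), smpl_names)
)

# Named list with one matrix per view
YTrg_toy <- list(mRNA = expdat_mRNA, miRNA = expdat_miRNA)

# Check that every element is a matrix and that all views share the same samples
all_matrices <- all(vapply(YTrg_toy, is.matrix, logical(1)))
same_samples <- all(vapply(YTrg_toy,
                           function(m) setequal(colnames(m), smpl_names),
                           logical(1)))
stopifnot(all_matrices, same_samples)
```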
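For instance, in this analysis the learning dataset was created using the TCGA cancer data. So:
- mRNA features are named with Ensembl gene IDs (e.g. ENSG00000000005). You should add the gene versions that are in the learning dataset. To do that, you can use the mRNA_addVersion() function.
- miRNA features are named with miRNA identifiers (e.g. hsa-mir-1-1).
- DNAme features are named with CpG probe identifiers (e.g. cg09364122).
- SNV features are named with gene symbols (e.g. AKAP13).
In this example, you just have to load the target dataset and the corresponding metadata from the MOTL package.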
The target dataset is stored in the Trg$YTrg_prep object.
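To make the gene-version step mentioned above more concrete, here is a minimal sketch of the idea. The helper add_version_sketch() is hypothetical and is not MOTL's mRNA_addVersion(), whose signature and behaviour are not shown in this document; it simply maps unversioned Ensembl IDs to the versioned IDs found in a learning feature vector.

```r
# Hypothetical helper, NOT MOTL's mRNA_addVersion(): map unversioned IDs to
# the versioned IDs present in the learning dataset.
add_version_sketch <- function(ids, lrn_ftrs) {
  lrn_base <- sub("\\.[0-9]+$", "", lrn_ftrs)  # strip the ".<version>" suffix
  idx <- match(ids, lrn_base)                   # NA when absent from learning set
  ifelse(is.na(idx), ids, lrn_ftrs[idx])        # keep unmatched IDs unchanged
}

lrn_ftrs <- c("ENSG00000232216.1", "ENSG00000170561.13")
trg_ids  <- c("ENSG00000170561", "ENSG00000000005")

versioned <- add_version_sketch(trg_ids, lrn_ftrs)
versioned
```

In this sketch, IDs that are absent from the learning dataset are returned unchanged; whether such features should be kept or dropped is a design choice left to the actual preparation step.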
Extract sample names from the target dataset. Then, extract view names shared between the target and the learning datasets and the corresponding likelihoods.
smpls <- colnames(YTrg_list[[1]])
viewsTrg <- names(YTrg_list)
views <- viewsLrn[is.element(viewsLrn, viewsTrg)]
likelihoods <- likelihoodsLrn[views]
To prepare the target dataset, use the TargetDataPreparation() function. Among other things, it makes the target dataset consistent with Lrn (same features, in the same order).
Here, we will neither transform (transformation = FALSE) nor normalize (normalization = FALSE) the dataset, because the data are already prepared.
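As a rough illustration of what shared views and "same features, same order" mean, the sketch below works on toy objects. All names ending in _toy are hypothetical, and the way TargetDataPreparation() handles missing or extra features is not documented here, so dropping unknown features is an assumption of this sketch.

```r
# Hypothetical learning-side information
viewsLrn_toy <- c("mRNA", "miRNA", "DNAme", "SNV")
lrn_ftrs_toy <- c("g1", "g2", "g3")  # feature order in the learning model

# Toy target view: features in a different order, plus one unknown feature
Y_toy <- matrix(1:8, nrow = 4,
                dimnames = list(c("g3", "g1", "gX", "g2"), c("s1", "s2")))
viewsTrg_toy <- c("mRNA", "SNV", "CNV")

# Views present in both datasets, kept in the learning order
views_toy <- viewsLrn_toy[is.element(viewsLrn_toy, viewsTrg_toy)]

# Reorder rows to the learning feature order; the unknown feature gX is dropped
Y_aligned <- Y_toy[lrn_ftrs_toy, , drop = FALSE]
Y_aligned
```

The actual preparation call for this example follows.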
YTrg_prep <- TargetDataPreparation(views = views,
                                   YTrg_list = YTrg_list,
                                   Fctrzn = Fctrzn,
                                   smpls = smpls,
                                   normalization = FALSE,
                                   expdat_meta_Lrn = expdat_meta_Lrn,
                                   transformation = FALSE)
Prepare inputs of the transfer learning:
- YTrg: list of the target dataset matrices (prepared with TargetDataPreparation())
- views: vector of target dataset view names
- Fctrzn: the learning dataset factorization model
- likelihoods: list of view likelihoods
MOTL
Set the parameters of MOTL:
- minFactors: lower bound on the number of factors when dropping factors (number of samples in evaluations)
- StartDropFactor: iteration after which to start dropping factors
- FreqDropFactor: how often to drop factors
- StartELBO: iteration at which to start checking the ELBO, excluding the initialization iteration
- FreqELBO: how often to assess the ELBO
- DropFactorTH: the factor with the lowest maximum variance is dropped if that variance is below this threshold
- MaxIterations: maximum number of iterations
- MinIterations: minimum number of iterations, at least 2 and excluding the initial setup (2 is the default in MOFA)
- ConvergenceIts: number of consecutive iterations over which the change in ELBO must stay below ConvergenceTH (2 is the default in MOFA)
- ConvergenceTH: threshold on the change in ELBO used to check convergence (0.0005 is the default in MOFA and corresponds to the fast option)
- PoisRateCstnt: constant added to the Poisson rate function to avoid errors (default 1e-04)
ss_start_time <- Sys.time()
minFactors <- 13
StartDropFactor <- 1
FreqDropFactor <- 1
StartELBO <- 1
FreqELBO <- 5
DropFactorTH <- 0.01
MaxIterations <- 1000
MinIterations <- 2
ConvergenceIts <- 2
ConvergenceTH <- 0.0005
PoisRateCstnt <- 0.0001
TL_data <- transferLearning_function(TL_param = TL_param,
                                     views = views,
                                     likelihoods = likelihoods,
                                     Fctrzn = Fctrzn,
                                     CenterTrg = CenterTrg,
                                     MaxIterations = MaxIterations,
                                     MinIterations = MinIterations,
                                     minFactors = minFactors,
                                     StartDropFactor = StartDropFactor,
                                     FreqDropFactor = FreqDropFactor,
                                     StartELBO = StartELBO,
                                     FreqELBO = FreqELBO,
                                     DropFactorTH = DropFactorTH,
                                     ConvergenceIts = ConvergenceIts,
                                     ConvergenceTH = ConvergenceTH,
                                     ss_start_time = ss_start_time)
#> [1] TRUE
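The interplay between ConvergenceIts and ConvergenceTH described in the parameter list can be pictured with a toy ELBO trace. This is a sketch under assumptions: the ELBO values are made up, and measuring the change as a percentage of the first ELBO value is an assumed criterion, not necessarily the exact formula used by MOTL.

```r
# Toy ELBO trace, one value per ELBO check (made-up numbers)
elbo <- c(-1000, -900, -890, -889.999, -889.9985, -889.998)

ConvergenceIts <- 2       # consecutive small changes required
ConvergenceTH  <- 0.0005  # threshold on the change, here read as a percentage

# Assumed criterion: absolute change relative to the first ELBO, in percent
delta_pct <- 100 * abs(diff(elbo) / elbo[1])

# Count consecutive checks below the threshold; stop once enough accumulate
run <- 0
converged_at <- NA
for (i in seq_along(delta_pct)) {
  run <- if (delta_pct[i] < ConvergenceTH) run + 1 else 0
  if (run >= ConvergenceIts) {
    converged_at <- i + 1  # index into the elbo vector
    break
  }
}
converged_at
```

Requiring several consecutive small changes, rather than a single one, guards against stopping on a temporary plateau of the ELBO.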
Results are saved as .rds files in the outputDir.
You can then access the results:
- ZMu corresponds to the inferred Z matrix, with samples in rows and factors in columns.
- Fctrzn_Lrn_W$mRNA corresponds to the weight matrix of mRNA, with features in rows and factors in columns.
dim(W_mRNA)
#> [1] 68 19
W_mRNA[c(1:5), c(1:3)]
#> Factor1 Factor2 Factor3
#> ENSG00000179477.11 0.21934002 0.14642247 0.04294529
#> ENSG00000129451.12 1.11481450 0.11383030 0.27891928
#> ENSG00000119938.9 -0.28775595 -0.02471034 0.03258801
#> ENSG00000126752.8 -0.03388396 -0.41945300 0.02424950
#> ENSG00000130700.7 0.02040603 -0.01100176 0.50016305
The results shown in this example may differ from yours because MOTL relies on random number generation, so two runs of MOTL can produce different results. To make an analysis reproducible, set the seed with set.seed(NumberYouChose) before running MOTL.
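The effect of set.seed() can be checked with base R alone. The sketch below uses rnorm() as a stand-in for any step that draws random numbers; the seed value 123 is arbitrary.

```r
# Same seed, same random draws
set.seed(123)
draw_a <- rnorm(3)

set.seed(123)
draw_b <- rnorm(3)

same_with_seed <- identical(draw_a, draw_b)
same_with_seed

# Without resetting the seed, the next draw differs
draw_c <- rnorm(3)
same_without_seed <- identical(draw_a, draw_c)
same_without_seed
```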
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MOFA2_1.23.0 MOTL_0.99.1 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 dplyr_1.2.1
#> [3] farver_2.1.2 filelock_1.0.3
#> [5] S7_0.2.2 fastmap_1.2.0
#> [7] digest_0.6.39 lifecycle_1.0.5
#> [9] magrittr_2.0.5 compiler_4.6.0
#> [11] rlang_1.2.0 sass_0.4.10
#> [13] tools_4.6.0 yaml_2.3.12
#> [15] corrplot_0.95 knitr_1.51
#> [17] S4Arrays_1.13.0 reticulate_1.46.0
#> [19] DelayedArray_0.39.3 plyr_1.8.9
#> [21] RColorBrewer_1.1-3 abind_1.4-8
#> [23] BiocParallel_1.47.0 HDF5Array_1.41.0
#> [25] Rtsne_0.17 purrr_1.2.2
#> [27] BiocGenerics_0.59.7 sys_3.4.3
#> [29] grid_4.6.0 stats4_4.6.0
#> [31] Rhdf5lib_2.1.0 ggplot2_4.0.3
#> [33] scales_1.4.0 SummarizedExperiment_1.43.0
#> [35] cli_3.6.6 rmarkdown_2.31
#> [37] generics_0.1.4 otel_0.2.0
#> [39] reshape2_1.4.5 cachem_1.1.0
#> [41] rhdf5_2.57.1 stringr_1.6.0
#> [43] parallel_4.6.0 BiocManager_1.30.27
#> [45] XVector_0.53.0 matrixStats_1.5.0
#> [47] basilisk_1.25.0 vctrs_0.7.3
#> [49] Matrix_1.7-5 jsonlite_2.0.0
#> [51] dir.expiry_1.21.0 IRanges_2.47.2
#> [53] S4Vectors_0.51.3 ggrepel_0.9.8
#> [55] maketools_1.3.2 h5mread_1.5.0
#> [57] locfit_1.5-9.12 jquerylib_0.1.4
#> [59] tidyr_1.3.2 glue_1.8.1
#> [61] codetools_0.2-20 uwot_0.2.4
#> [63] cowplot_1.2.0 stringi_1.8.7
#> [65] gtable_0.3.6 GenomicRanges_1.65.0
#> [67] tibble_3.3.1 pillar_1.11.1
#> [69] htmltools_0.5.9 Seqinfo_1.3.0
#> [71] rhdf5filters_1.25.0 R6_2.6.1
#> [73] evaluate_1.0.5 lattice_0.22-9
#> [75] Biobase_2.73.1 png_0.1-9
#> [77] pheatmap_1.0.13 bslib_0.11.0
#> [79] Rcpp_1.1.1-1.1 SparseArray_1.13.2
#> [81] DESeq2_1.53.0 xfun_0.59
#> [83] MatrixGenerics_1.25.0 forcats_1.0.1
#> [85] buildtools_1.0.0 pkgconfig_2.0.3