| Title: | Multi-omics matrix factorization with transfer learning |
|---|---|
| Description: | A transfer learning algorithm for multi-omics matrix factorization called 'MOTL' (Multi-Omics Transfer Learning). 'MOTL' is a Bayesian transfer learning method, based on 'MOFA'. 'MOTL' infers latent factor values for a multi-omics target dataset, consisting of a small number of samples, by incorporating latent factor values already inferred with a 'MOFA' factorization of a large, heterogeneous, learning dataset. |
| Authors: | David Hirst [aut] (ORCID: <https://orcid.org/0000-0001-7574-5531>), Morgane Térézol [cre] (ORCID: <https://orcid.org/0000-0002-4090-2573>) |
| Maintainer: | Morgane Térézol <[email protected]> |
| License: | GPL-3 + file LICENCE |
| Version: | 0.99.1 |
| Built: | 2026-06-21 11:12:10 UTC |
| Source: | https://github.com/BiocStaging/MOTL |
Normalize counts data using DESeq2 normalization. Two normalization modes are available:
Use the pre-calculated geometric means of the learning dataset
Use geometric means calculated from the expdat dataset given as input
countsNormalization(expdat, GeoMeans)
expdat |
SE object of experimental data (could be miRNA or mRNA) |
GeoMeans |
if it's a character, Geometric means will be calculated
for the |
If is.numeric(GeoMeans) == TRUE, input data are normalized with
pre-calculated geometric means (from the learning dataset).
If no values are provided, geometric means are calculated from the
input dataset using the GeoMeanFun function.
The input dataset is then normalized using these geometric means.
A list containing the data.frame of normalized counts and the calculated geometric means
    ## Create a matrix with "counts" data
    ## Then, create a summarized experiment object
    expdat <- matrix(rexp(200, rate = .1), ncol = 20)
    expdat <- apply(expdat, MARGIN = 2, round)
    expdat <- SummarizedExperiment::SummarizedExperiment(expdat)
    ## With "newGeoMeans", geometric means will be calculated based on the
    ## input matrix
    GeoMeans <- "newGeoMeans"
    expdat_counts_norm <- countsNormalization(expdat, GeoMeans)
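The normalization described above follows the DESeq2 median-of-ratios idea. The sketch below reproduces that idea in base R on a toy matrix. It is an illustration under stated assumptions, not the package's implementation: `geo_mean`, `toy_counts`, `feature_geo_means` and `size_factors` are hypothetical names, and the toy counts are strictly positive so that logarithms are defined.

```r
# Toy counts: 4 features (rows) x 3 samples (columns).
# Sample s2 is exactly twice sample s1; s3 equals s1.
toy_counts <- matrix(c(10, 20, 30, 40,
                       20, 40, 60, 80,
                       10, 20, 30, 40),
                     nrow = 4,
                     dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

# Geometric mean of a strictly positive vector (hypothetical helper)
geo_mean <- function(x) exp(mean(log(x)))

# Per-feature geometric means; these could instead be pre-calculated
# on a learning dataset and reused for the target dataset
feature_geo_means <- apply(toy_counts, 1, geo_mean)

# Size factor of a sample: median ratio of its counts to the geometric means
size_factors <- apply(toy_counts, 2, function(s) median(s / feature_geo_means))

# Divide each sample (column) by its size factor
toy_norm <- sweep(toy_counts, 2, size_factors, "/")
```

Because s2 is s1 doubled, its size factor is twice that of s1, and the normalized columns coincide.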
Log2 transform and select top data based on variance
countsTransformation(expdat_count, TopD)
expdat_count |
data.frame of the counts |
TopD |
number of features to keep |
data.frame of the log2 transformed and filtered data
    ## Create a matrix of counts data
    expdat_count <- matrix(rexp(200, rate = .1), ncol = 20)
    expdat_count <- apply(expdat_count, MARGIN = 2, round)
    ## Number of features to keep
    TopD <- 20
    expdat_counts_fltr <- countsTransformation(expdat_count, TopD)
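A minimal sketch of the two steps named above (log2 transformation, then selection of the most variable features) is given here in base R. The pseudo-count of 1 and the names `toy_counts`, `top_d` and `toy_top` are assumptions made for illustration; the text does not state how countsTransformation handles zero counts.

```r
set.seed(1)

# Toy counts: 6 features (rows) x 10 samples (columns)
toy_counts <- matrix(rpois(60, lambda = 20), nrow = 6,
                     dimnames = list(paste0("f", 1:6), NULL))
top_d <- 3

# log2 transformation; the +1 pseudo-count is an assumption to avoid log2(0)
toy_log <- log2(toy_counts + 1)

# Variance of each feature across samples
feature_var <- apply(toy_log, 1, var)

# Indices of the top_d most variable features (capped at the number of rows)
keep <- order(feature_var, decreasing = TRUE)[seq_len(min(top_d, nrow(toy_log)))]
toy_top <- toy_log[keep, , drop = FALSE]
```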
E_Z_SqE_W_Sq
E_Z_SqE_W_Sq is the product of the squared expected values of the Z
matrix and the squared expected values of the W matrix.
E_Z_SqE_W_Sq_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
view |
current view name |
ZMu_0 |
vector of coefficients for weight intercepts |
ZMu |
matrix of Z values |
Fctrzn_Lrn_W0 |
list of factorized learning set weight intercept matrices |
Fctrzn_Lrn_W |
list of factorized learning set weight matrices |
E_Z_SqE_W_Sq for current view
    data("TL_param", package = "MOTL")
    view <- "mRNA"
    ZMu <- TL_param$ZMu
    ZMu_0 <- TL_param$ZMu_0
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Fctrzn_Lrn_W <- TL_param$Fctrzn_Lrn_W
    E_Z_SqE_W_Sq <- E_Z_SqE_W_Sq_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
E_ZE_W
E_ZE_W is the product of the expected values of the Z matrix and the
expected values of the W matrix.
E_ZE_W_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
view |
current view name |
ZMu_0 |
vector of coefficients for weight intercepts |
ZMu |
matrix of Z values |
Fctrzn_Lrn_W0 |
list of factorized learning set weight intercept matrices |
Fctrzn_Lrn_W |
list of factorized learning set weight matrices |
E_ZE_W for current view
    data("TL_param", package = "MOTL")
    view <- "mRNA"
    ZMu <- TL_param$ZMu
    ZMu_0 <- TL_param$ZMu_0
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Fctrzn_Lrn_W <- TL_param$Fctrzn_Lrn_W
    E_ZE_W <- E_ZE_W_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
E_ZSqE_WSq
E_ZSqE_WSq is the product of the expected values of the
squared Z matrix and the expected values of the squared W matrix.
E_ZSqE_WSq_update(view, ZMu_0, ZMuSq, Fctrzn_Lrn_W0, Fctrzn_Lrn_WSq)
view |
current view name |
ZMu_0 |
vector of coefficients for weight intercepts |
ZMuSq |
matrix of squared Z values |
Fctrzn_Lrn_W0 |
list of factorized learning set weight intercept matrices |
Fctrzn_Lrn_WSq |
list of factorized learning set weight squared matrices |
E_ZSqE_WSq for current view
    data("TL_param", package = "MOTL")
    view <- "mRNA"
    ZMuSq <- TL_param$ZMuSq
    ZMu_0 <- TL_param$ZMu_0
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Fctrzn_Lrn_WSq <- TL_param$Fctrzn_Lrn_WSq
    E_ZSqE_WSq <- E_ZSqE_WSq_update(view, ZMu_0, ZMuSq, Fctrzn_Lrn_W0, Fctrzn_Lrn_WSq)
E_ZWSq
E_ZWSq holds the expected values of the squared product of the Z matrix
and the weight matrix W.
E_ZWSq_update(view, E_ZE_W, ZMuSq, E_Z_SqE_W_Sq, E_ZSqE_WSq)
view |
current view name |
E_ZE_W |
matrix of |
ZMuSq |
matrix of squared ZMu values for current view |
E_Z_SqE_W_Sq |
matrix of |
E_ZSqE_WSq |
matrix of |
E_ZWSq values for current view
    data("TL_param", package = "MOTL")
    view <- "mRNA"
    ZMuSq <- TL_param$ZMuSq
    ZMu <- TL_param$ZMu
    ZMu_0 <- TL_param$ZMu_0
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Fctrzn_Lrn_W <- TL_param$Fctrzn_Lrn_W
    Fctrzn_Lrn_WSq <- TL_param$Fctrzn_Lrn_WSq
    E_ZE_W <- list()
    E_Z_SqE_W_Sq <- list()
    E_ZSqE_WSq <- list()
    E_ZWSq <- list()
    E_ZE_W$mRNA <- E_ZE_W_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
    E_Z_SqE_W_Sq$mRNA <- E_Z_SqE_W_Sq_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
    E_ZSqE_WSq$mRNA <- E_ZSqE_WSq_update(view, ZMu_0, ZMuSq, Fctrzn_Lrn_W0, Fctrzn_Lrn_WSq)
    E_ZWSq$mRNA <- E_ZWSq_update(view, E_ZE_W, ZMuSq, E_Z_SqE_W_Sq, E_ZSqE_WSq)
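The four quantities documented above fit together through a standard identity for independent (mean-field) factors: the second moment of a sum of products equals the squared sum of mean products, minus the sum of squared mean products, plus the sum of second-moment products. The sketch below checks that identity on toy moments. That E_ZWSq_update combines its arguments in exactly this way is an assumption based on the argument names; intercept terms (ZMu_0, W0) are left out, and all lower-case names are hypothetical.

```r
set.seed(42)
N <- 3; D <- 4; K <- 2  # samples, features, factors

# Hypothetical variational moments of Z (N x K) and W (D x K)
z_mu  <- matrix(rnorm(N * K), N, K)
z_var <- matrix(runif(N * K), N, K)
w_mu  <- matrix(rnorm(D * K), D, K)
w_var <- matrix(runif(D * K), D, K)
z_mu_sq <- z_mu^2 + z_var  # E[z^2]
w_mu_sq <- w_mu^2 + w_var  # E[w^2]

e_z_e_w       <- z_mu %*% t(w_mu)        # sum_k E[z] E[w]
e_z_sq_e_w_sq <- z_mu^2 %*% t(w_mu^2)    # sum_k E[z]^2 E[w]^2
e_zsq_e_wsq   <- z_mu_sq %*% t(w_mu_sq)  # sum_k E[z^2] E[w^2]

# Second moment of the reconstruction sum_k z_k w_k
e_zw_sq <- e_z_e_w^2 - e_z_sq_e_w_sq + e_zsq_e_wsq

# Brute-force check for one (sample, feature) entry: expand the double sum
brute_second_moment <- function(n, d) {
  total <- 0
  for (k in seq_len(K)) {
    for (l in seq_len(K)) {
      if (k == l) {
        total <- total + z_mu_sq[n, k] * w_mu_sq[d, k]
      } else {
        total <- total + z_mu[n, k] * w_mu[d, k] * z_mu[n, l] * w_mu[d, l]
      }
    }
  }
  total
}
```

The matrix form avoids the explicit double loop over factors, which matters when the update runs at every iteration.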
For Poisson and Bernoulli data, the bound is used; for Gaussian data, the expanded Gaussian log-likelihood is used.
ELBO_calculation( view, likelihoods, Tau, TauLn, E_ZWSq, E_ZE_W, Zeta, YTrg, YGauss, PoisRateCstnt )
view |
a character string giving the current view name |
likelihoods |
a named list of data types. The list can contain
|
Tau |
list of Tau matrices |
TauLn |
list of log(Tau) matrices |
E_ZWSq |
expected values of the multiplication of the Z matrix
with weight squared W matrix. See |
E_ZE_W |
product of the expected values of the Z matrix
and the expected values of the W matrix. See
|
Zeta |
list of Zeta matrices |
YTrg |
list of data |
YGauss |
list of pseudo Y value matrices |
PoisRateCstnt |
small constant added for Poisson data to avoid errors |
the ELBO value for the current view/iteration
    data("TL_param", package = "MOTL")
    view <- "mRNA"
    likelihoods <- c("mRNA" = "gaussian", "miRNA" = "gaussian", "DNAme" = "gaussian", "SNV" = "bernoulli")
    Tau <- TL_param$Tau
    TauLn <- TL_param$TauLn
    Zeta <- TL_param$Zeta
    YTrg <- TL_param$YTrg
    ZMu <- TL_param$ZMu
    CenterTrg <- FALSE
    PoisRateCstnt <- 0.0001
    YGauss <- TL_param$YTrg
    YGauss$mRNA <- YGauss_calculation(view = view, likelihoods = likelihoods, YTrg, Zeta, Tau, CenterTrg, PoisRateCstnt)
    ZMuSq <- TL_param$ZMuSq
    ZMu_0 <- TL_param$ZMu_0
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Fctrzn_Lrn_W <- TL_param$Fctrzn_Lrn_W
    Fctrzn_Lrn_WSq <- TL_param$Fctrzn_Lrn_WSq
    E_ZE_W <- list()
    E_Z_SqE_W_Sq <- list()
    E_ZSqE_WSq <- list()
    E_ZWSq <- list()
    E_ZE_W$mRNA <- E_ZE_W_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
    E_Z_SqE_W_Sq$mRNA <- E_Z_SqE_W_Sq_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
    E_ZSqE_WSq$mRNA <- E_ZSqE_WSq_update(view, ZMu_0, ZMuSq, Fctrzn_Lrn_W0, Fctrzn_Lrn_WSq)
    E_ZWSq$mRNA <- E_ZWSq_update(view, E_ZE_W, ZMuSq, E_Z_SqE_W_Sq, E_ZSqE_WSq)
    ELBO_L <- ELBO_calculation(view, likelihoods, Tau, TauLn, E_ZWSq, E_ZE_W, Zeta, YTrg, YGauss, PoisRateCstnt)
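For the Gaussian case, the "expanded Gaussian log-likelihood" can be read as the expected log-density with the square expanded into terms that use the first and second moments of the reconstruction. The sketch below writes that term out. It is an assumption about the form of the Gaussian contribution only: it does not cover KL terms or the Poisson and Bernoulli bounds, and `gaussian_expected_loglik` and its arguments are hypothetical names, not the package's API.

```r
# Expected Gaussian log-likelihood summed over all entries:
#   0.5 * E[ln tau] - 0.5 * ln(2 * pi)
#   - 0.5 * tau * (y^2 - 2 * y * E[zw] + E[(zw)^2])
gaussian_expected_loglik <- function(y, tau, tau_ln, e_zw, e_zw_sq) {
  sum(0.5 * tau_ln - 0.5 * log(2 * pi) -
        0.5 * tau * (y^2 - 2 * y * e_zw + e_zw_sq))
}

# Toy check: perfect reconstruction with unit precision and no uncertainty,
# so only the normalizing constant remains for each of the 4 entries
y   <- matrix(c(1, 2, 3, 4), 2, 2)
tau <- matrix(1, 2, 2)
ll  <- gaussian_expected_loglik(y, tau, log(tau), e_zw = y, e_zw_sq = y^2)
```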
Calculate the Geometric mean of a vector
GeoMeanFun(x)
x |
vector of numeric values |
Geometric mean of vector x
    x <- c(125, 12, 4545, 7878, 6777, 454545, 88979)
    GeoMeans <- GeoMeanFun(x)
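For reference, a geometric mean can be computed on the log scale, which avoids overflow when multiplying many large counts. The sketch below is a generic definition under the assumption of strictly positive input; `geo_mean_sketch` is a hypothetical name, and how GeoMeanFun treats zeros or missing values is not stated here.

```r
# Geometric mean via the mean of logarithms (strictly positive input assumed)
geo_mean_sketch <- function(x) exp(mean(log(x)))

# The geometric mean of 1, 10 and 100 is 10 (up to floating-point error)
gm <- geo_mean_sketch(c(1, 10, 100))
```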
Retrieve the Geometric means calculated for the learning dataset during counts normalization
GeoMeans_Lrn_init(view, expdat_meta_Lrn, YTrgFtrs)
view |
current view data name |
expdat_meta_Lrn |
list of learning dataset factorization metadata |
YTrgFtrs |
feature names of the current view |
precalculated Geometric means of the learning dataset
    data("Lrn", package = "MOTL")
    data("Trg", package = "MOTL")
    expdat_meta_Lrn <- Lrn$Fctrzn@data
    YTrgFtrs <- Trg$Trg_meta$ftrs_mRNA
    GeoMeans_Lrn <- GeoMeans_Lrn_init(view = "mRNA", expdat_meta_Lrn, YTrgFtrs)
The function performs the following steps:
Extract the factorized learning set weight intercepts W0
Extract the factorized learning set weights W
Extract the factorized learning set squared weights Wsq
Extract the learning set Tau and log(Tau) TauLn
For each extracted parameter, common features between learning dataset
and target dataset are kept. Then target data YTrg, Tau
and TauLn are transposed.
initTransferLearningParamaters(YTrg, views, Fctrzn, likelihoods)
YTrg |
a named list of target dataset matrices. Names correspond to the defined views. |
views |
a vector of target dataset view names (e.g.
|
Fctrzn |
the learning dataset factorization model object
(from |
likelihoods |
a named list of data types. The list can contain
|
Each parameter is extracted from the Fctrzn model created using
MOFA2.
YTrg matrices should have the same column order.
a list of initialized parameters for transfer learning
YTrg - the transposed named list of target data
Fctrzn_Lrn_W0 - the factorized learning set weight intercepts
with same features as YTrg
Fctrzn_Lrn_W - the factorized learning set weights with same
features as YTrg
Fctrzn_Lrn_WSq - the factorized learning set squared weights with
same features as YTrg
Tau - the transposed Tau matrix with same features as YTrg
TauLn - the transposed log(Tau) matrix with same features as
YTrg
    data("Lrn", package = "MOTL")
    data("Trg", package = "MOTL")
    views <- c("mRNA", "miRNA", "DNAme", "SNV")
    likelihoods <- c("mRNA" = "gaussian", "miRNA" = "gaussian", "DNAme" = "gaussian", "SNV" = "bernoulli")
    Fctrzn <- Lrn$Fctrzn_init
    YTrg <- Trg$YTrg_prep
    TLparameter <- initTransferLearningParamaters(YTrg = YTrg, views = views, Fctrzn = Fctrzn, likelihoods = likelihoods)
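The feature harmonization and transposition described above can be illustrated on toy matrices. This is a sketch of the general idea, not the function's code: `y_trg`, `w_lrn` and `shared` are hypothetical names, and a single view with two factors stands in for the full model.

```r
# Toy target data: features in rows, samples in columns
y_trg <- matrix(1:6, nrow = 3,
                dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Toy learning weights: features in rows, factors in columns,
# with a different feature order and one feature (g4) absent from the target
w_lrn <- matrix((1:8) / 10, nrow = 4,
                dimnames = list(c("g2", "g3", "g4", "g1"),
                                c("Factor1", "Factor2")))

# Keep only the features present in both datasets, in the target's order
shared   <- intersect(rownames(y_trg), rownames(w_lrn))
w_shared <- w_lrn[shared, , drop = FALSE]

# Transpose the target data so that samples are in rows, features in columns
y_t <- t(y_trg[shared, , drop = FALSE])
```

After these steps the columns of `y_t` and the rows of `w_shared` refer to the same features in the same order, which is what a product of factors and weights needs.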
Calculate feature weight intercepts for a MOFA factorization based on MLE. These can be calculated for the learning dataset factorization and then for the factorization of the target dataset with transfer learning.
intercepts_calculation(expdat_meta, Fctrzn, FctrznDir, ExpDataDir, Seed)
expdat_meta |
the named list of learning dataset factorization metadata |
Fctrzn |
learning dataset factorization model object (from
|
FctrznDir |
the learning dataset factorization directory name |
ExpDataDir |
the learning dataset directory name |
Seed |
random seed number |
For Gaussian observed data, weight intercepts are the weight mean for
each feature.
For Poisson and Bernoulli observed data, weight intercepts are calculated
using the maximum likelihood and the mle function.
A file named EstimatedIntercepts.rds, saved in the
FctrznDir directory.
    data("Lrn", package = "MOTL")
    expdat_meta <- Lrn$Lrn_meta
    Fctrzn <- Lrn$Fctrzn
    FctrznDir <- "FctrznDir"
    ExpDataDir <- "ExpDataDir"
    Seed <- 1234567
    intercepts_calculation(expdat_meta, Fctrzn, FctrznDir, ExpDataDir, Seed)
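To illustrate what a maximum-likelihood intercept looks like for Bernoulli data, the sketch below fits a single intercept with `stats4::mle` on a toy binary feature. It is a simplified illustration under stated assumptions, not the package's procedure: `y`, `factor_part` and `neg_loglik` are hypothetical, and the contribution of the latent factors is fixed at zero so the result can be checked by hand.

```r
library(stats4)

y <- c(1, 1, 1, 0)                 # toy binary observations for one feature
factor_part <- rep(0, length(y))   # assumed contribution of factors (zero here)

# Negative Bernoulli log-likelihood as a function of the intercept w0
neg_loglik <- function(w0 = 0) {
  p <- plogis(w0 + factor_part)    # logistic link
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# One-dimensional problem, so BFGS is used instead of the default Nelder-Mead
fit <- mle(neg_loglik, start = list(w0 = 0), method = "BFGS")
w0_hat <- unname(coef(fit))
```

With no factor contribution, the estimate should approach the logit of the observed mean, here log(3).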
Contains factorization model results calculated using MOFA2 and
initialized for transfer learning: Fctrzn and Fctrzn_init.
data("Lrn")
List of two MOFA objects: Fctrzn and
Fctrzn_init.
A MOFA2 object is an S4 class instance and contains the following main slots (those relevant to and used for the transfer learning):
input data used for the factorization analysis (mRNA, miRNA, DNAme and SNV)
sample metadata (i.e. sample names and group)
features metadata (feature identifiers and views)
expected values of the factors and the loadings
model training statistics
data processing options
model options
model training options
dimensions of the model (e.g. M number of views,
N number of samples, D number of features)
For more information about the structure of MOFA object, see the
MOFA2 vignette.
Fctrzn: output of the factorization analysis of the learning dataset
Fctrzn_init: output of the factorization analysis of the learning dataset and the initialized values for the transfer learning
Get mRNA ensembl ID version from learning dataset (e.g. ENSG00000122133.17) and attach to the corresponding mRNA ensembl ID in the target dataset. Feature names need to be similar between target dataset and learning dataset.
mRNA_addVersion(expdat, Lrndat)
expdat |
the mRNA matrix from the target dataset with genes in rows. Gene names should be Ensembl IDs without the version suffix (e.g. ENSG00000122133). Rownames contain Ensembl IDs and colnames contain sample names. |
Lrndat |
the mRNA W matrix from the learning dataset factorization with genes in rows. Gene names should be in ensembl format. Rownames contain ensembl IDs. |
the target mRNA matrix with versions attached
    Lrn_names <- c("ENSG00000122133.17", "ENSG00000122194.9", "ENSG00000119411.1")
    Lrn_views <- c("mRNA", "mRNA", "mRNA")
    expdat_names <- c("ENSG00000122133", "ENSG00000122194", "ENSG00000119411")
    Lrndat <- data.frame("view" = Lrn_views, row.names = Lrn_names)
    expdat <- data.frame("sample1" = c(1, 52, 4), row.names = expdat_names)
    expdat_prep <- mRNA_addVersion(expdat, Lrndat)
    expdat
    expdat_prep
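The matching step can be sketched in base R by stripping the version suffix from the learning IDs and looking up each target ID. This is an illustration, not the function's code: leaving unmatched IDs unchanged is an assumption (the real function may treat them differently), and the last target ID below is a made-up placeholder.

```r
lrn_ids <- c("ENSG00000122133.17", "ENSG00000122194.9", "ENSG00000119411.1")
# The third ID is a hypothetical placeholder with no match in lrn_ids
trg_ids <- c("ENSG00000122194", "ENSG00000119411", "ENSG00000000000")

# Drop the trailing ".<digits>" version suffix from the learning IDs
lrn_base <- sub("\\.[0-9]+$", "", lrn_ids)

# Position of each target ID among the unversioned learning IDs (NA if absent)
idx <- match(trg_ids, lrn_base)

# Use the versioned learning ID when a match exists, else keep the target ID
trg_versioned <- ifelse(is.na(idx), trg_ids, lrn_ids[idx])
```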
Counts data (i.e. mRNA and miRNA) can be normalized and/or transformed.
preprocessCountsData( view, YTrg_list, normalization = FALSE, expdat_meta_Lrn, transformation = FALSE )
view |
a data view name vector (i.e. mRNA or miRNA) |
YTrg_list |
a named list of target data. Names correspond to the defined views. The list contains matrices. |
normalization |
if FALSE, no normalization. If "LrnGeoMeans", normalization using the pre-calculated Geometric means. If "newGeoMeans", normalization using Geometric means from dataset. By default, it's set to FALSE. |
expdat_meta_Lrn |
the list of learning set factorization metadata |
transformation |
if FALSE, no transformation. If TRUE, log2 transformation. |
Normalization is performed using the countsNormalization
function with pre-calculated or newly calculated geometric means.
Transformation is performed using the countsTransformation
function with log2.
Preprocessed counts data for the current view
    data("Lrn", package = "MOTL")
    data("Trg", package = "MOTL")
    expdat_meta_Lrn <- Lrn$Fctrzn@data
    YTrg_list <- Trg$YTrg_list
    mRNA <- preprocessCountsData(view = "mRNA", YTrg_list = YTrg_list, normalization = FALSE, expdat_meta_Lrn = expdat_meta_Lrn, transformation = FALSE)
The function performs the following steps:
Remove the features with variance equal to zero
Harmonize features between the target data and the learning data. Only the shared features are kept.
Order columns according to the order of samples (i.e. smpls)
TargetDataPrefiltering(view, YTrg_list, Fctrzn, smpls)
view |
current view data name (e.g. "mRNA", or "DNAme") |
YTrg_list |
a named list of target data. Names correspond to the
views defined and the corresponding data are saved into |
Fctrzn |
the learning dataset factorization model object (from
|
smpls |
an ordered vector of sample names |
a matrix that contains the prepared data for the current view with the samples ordered.
    data("Lrn", package = "MOTL")
    data("Trg", package = "MOTL")
    view <- "mRNA"
    YTrg_list <- Trg$YTrg_prep
    Fctrzn <- Lrn$Fctrzn
    smpls <- colnames(YTrg_list$mRNA)
    mRNA_prep <- TargetDataPrefiltering(view, YTrg_list, Fctrzn, smpls)
The function follows these steps:
Prepare target data for each view
Normalize and/or transform counts data
TargetDataPreparation( views, YTrg_list, Fctrzn, smpls, expdat_meta_Lrn, normalization = FALSE, transformation = FALSE )
views |
a list of target data views (e.g. |
YTrg_list |
a named list of target set data. Names correspond to the defined views. The list contains matrices. |
Fctrzn |
the learning factorization model object (from |
smpls |
a vector of sample names (i.e. column names of the
|
expdat_meta_Lrn |
the list of learning set factorization metadata |
normalization |
if FALSE, no normalization. If "LrnGeoMeans", normalization using the learning Geometric means. If "newGeoMeans", Geometric means are calculated using the input data. By default it's set to FALSE. |
transformation |
if FALSE, no transformation. If TRUE, log2 transformation of counts data. By default it's set to FALSE. |
Data preparation consists of
removing features with variance equal to zero
harmonizing features between target and learning data
ordering columns consistently between views.
Preparation is performed using the
TargetDataPrefiltering function. See the documentation
for more details.
Counts data can optionally be normalized and/or transformed using the
preprocessCountsData function. Normalization can be done
using geometric means (from the learning or target dataset) and transformation
is a log2 transformation of the counts.
a list of prepared target data
    data("Lrn", package = "MOTL")
    data("Trg", package = "MOTL")
    YTrg_list <- Trg$YTrg_prep
    Fctrzn <- Lrn$Fctrzn
    smpls <- colnames(YTrg_list$mRNA)
    expdat_meta_Lrn <- Lrn$Lrn_meta
    YTrg_prep <- TargetDataPreparation(views = c("mRNA", "miRNA", "DNAme"), YTrg_list = YTrg_list, Fctrzn = Fctrzn, smpls = smpls, expdat_meta_Lrn = expdat_meta_Lrn, normalization = FALSE, transformation = FALSE)
Tau values are updated only for bernoulli data.
Tau_calculation(view, likelihoods, Zeta, Tau)
view |
a character string giving the current view name |
likelihoods |
a named list of data types. The list can contain
|
Zeta |
list of Zeta matrix for the current view |
Tau |
list of tau matrices |
Tau values are updated using the following equation:
(updated) Tau matrix for the current view data
    data("TL_param", package = "MOTL")
    view <- "mRNA"
    likelihoods <- c("mRNA" = "gaussian", "miRNA" = "gaussian", "DNAme" = "gaussian", "SNV" = "bernoulli")
    Zeta <- TL_param$Zeta
    Tau <- TL_param$Tau
    Tau <- Tau_calculation(view = view, likelihoods = likelihoods, Zeta = Zeta, Tau = Tau)
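The update equation is not reproduced in this text. As background, Bernoulli likelihoods in MOFA-style models are commonly handled with the Jaakkola-Jordan bound, in which the pseudo-precision is twice lambda(zeta), with lambda(zeta) = tanh(zeta / 2) / (4 * zeta). The sketch below implements that form under the assumption that MOTL follows the same update, which this text does not confirm; `lambda_jj` and `tau_bernoulli` are hypothetical names.

```r
# Jaakkola-Jordan lambda function (defined for zeta > 0)
lambda_jj <- function(zeta) tanh(zeta / 2) / (4 * zeta)

# Assumed Bernoulli pseudo-precision: tau = 2 * lambda(zeta)
tau_bernoulli <- function(zeta) 2 * lambda_jj(zeta)

# Toy Zeta matrix with strictly positive entries
zeta <- matrix(c(0.5, 1, 2, 4), 2, 2)
tau_new <- tau_bernoulli(zeta)
```

Under this form the precision is bounded above by 0.25, approached as zeta tends to zero, and decreases as zeta grows.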
Extract the Tau matrix from the MOFA object Fctrzn for each view.
More explanation about Tau.
Tau_init(viewsLrn, Fctrzn, InputModel)
viewsLrn |
the list of learning data views. For TCGA learning data
it will be |
Fctrzn |
the learning dataset factorization from |
InputModel |
the factorization model object of learning set
|
a named list of Tau matrices. Names correspond to the view names.
    library("MOFA2")
    viewsLrn <- c("mRNA", "miRNA", "DNAme", "SNV")
    InputModel <- "Model.hdf5"
    Fctrzn <- load_model(file = InputModel)
    Tau_list <- Tau_init(viewsLrn = viewsLrn, Fctrzn = Fctrzn, InputModel = InputModel)
Two ways to initialize the log(Tau) values:
log transformation of the expected Tau values (already initialized in the
Fctrzn object)
extraction of the values from a .csv file saved in the LrnFctrnDir
directory
Tau is initialized only for gaussian data.
TauLn_calculation(view, likelihoodsLrn, Fctrzn, LrnFctrnDir, LrnSimple = TRUE)
view |
a character of current view name data (e.g. "mRNA") |
likelihoodsLrn |
a named list of data types. The list can contain
|
Fctrzn |
learning dataset factorization model object (from
|
LrnFctrnDir |
directory where log(Tau) values are saved. Files
should be named like |
LrnSimple |
if TRUE, initialization uses the Tau values. If FALSE, imports values from a .csv file. By default it's set to TRUE. |
For gaussian data,
the log(Tau) matrix for the current view
    library("MOFA2")
    data("Lrn", package = "MOTL")
    Fctrzn <- Lrn$Fctrzn_init
    likelihoodsLrn <- get_default_model_options(Fctrzn)$likelihoods
    ## Placeholder directory name (an assumption); only read when LrnSimple = FALSE
    LrnFctrnDir <- "LrnFctrnDir"
    TauLn_mRNA <- TauLn_calculation(view = "mRNA", likelihoodsLrn = likelihoodsLrn, Fctrzn = Fctrzn, LrnSimple = TRUE, LrnFctrnDir = LrnFctrnDir)
This function performs a prefiltering analysis through the steps:
Extract data for the given sample names (brcds_SS)
Remove features with variance equal to zero
Match features between target data set and learning data set
TCGATargetDataPrefiltering(view, brcds_SS, SS, YTrgFull, Fctrzn)
view |
a character string giving the current view name |
brcds_SS |
a list of sample names for each view. The list is named according to views (e.g. "brcds_mRNA_SS") and contains a list of dataframes for each view. Each dataframe contains at least one column named "brcds" (the one used). |
SS |
current subset number |
YTrgFull |
a named list of target set data. Names correspond to the
defined views. The list contains |
Fctrzn |
learning factorization model object (from |
The function returns a pre-filtered target dataframe for the current view.
the subset data for current view and SS number
    # In the paper, several target datasets were created as subsets of the
    # reference dataset R. This function was used to generate them
    # automatically.
    # If you are not doing the paper analysis, you can create the brcds_SS
    # using the following command line. Replace the "nameX" with the sample
    # names of your data.
    # You can add as many dataframes as you want for each view.
    brcds_SS_ex <- list("brcds_mRNA_SS" = list(data.frame("brcds" = c("name01", "name02"))),
        "brcds_miRNA_SS" = list(data.frame("brcds" = c("name10", "name11"))))
    # See the doc to create the input parameter
    data("Trg", package = "MOTL")
    data("Lrn", package = "MOTL")
    brcds_SS <- Trg$brcds_SS
    YTrg_list <- Trg$YTrg_list
    Lrn_Fctrzn <- Lrn$Fctrzn
    expdat_mRNA <- TCGATargetDataPrefiltering(view = "mRNA", brcds_SS = brcds_SS, SS = 1, YTrgFull = YTrg_list, Fctrzn = Lrn_Fctrzn)
    expdat_mRNA[c(1:5), c(1:5)]
This function follows these steps:
Filter out features according to variance
Reshape data into matrices
Order samples to have the same columns order between different views
Normalize and/or transform counts data
TCGATargetDataPreparation( views, YTrgFull, brcds_SS, SS, Fctrzn, smpls, normalization = FALSE, expdat_meta_Lrn, transformation = TRUE )
views |
a list of target data views (e.g. |
YTrgFull |
a named list of target set data. Names correspond to the
defined views. The list contains |
brcds_SS |
a list of sample names for each view. The list is named according to views (e.g. "brcds_mRNA_SS") and contains a list of dataframes for each view. Each dataframe contains at least one column named "brcds" (the one used). |
SS |
current subset number |
Fctrzn |
the learning factorization model object (from |
smpls |
a vector of sample names (i.e. column names of the
|
normalization |
if FALSE, no normalization. If "LrnGeoMeans", normalization using the learning geometric means. If "newGeoMeans", normalization with target geometric means. By default it's set to FALSE. |
expdat_meta_Lrn |
the list of learning set factorization metadata |
transformation |
if FALSE, no transformation. If TRUE, log2 transformation of counts data. By default it's set to TRUE. |
First, samples included in the brcds_SS list are selected. Then features
with variance equal to zero are removed. Features are also removed if they
are not found in the learning dataset. These steps are performed using the
TCGATargetDataPrefiltering function.
The mRNA, miRNA and DNAme data are stored in SummarizedExperiment
objects. For the next step, data have to be stored in a matrix.
SNV data are already a matrix.
Then, samples are ordered in the same way between views.
Finally, counts data (e.g. mRNA and miRNA) can be normalized and/or
transformed using the preprocessCountsData function.
if normalization = FALSE: counts data are not normalized
if normalization = "LrnGeoMeans": counts data are normalized
using the geometric means calculated on the learning dataset
if normalization = "newGeoMeans": counts data are normalized
using the geometric means calculated on the target dataset.
Normalization is performed in the countsNormalization
function using estimateSizeFactors from the
DESeq2 package. Transformation is performed using
countsTransformation with a log2 transformation.
See GeoMeans_Lrn_init and GeoMeanFun to see
how learning geometric means are calculated.
list of prepared subset data for the current subset number
    # see to create input data
    data("Trg", package = "MOTL")
    data("Lrn", package = "MOTL")
    views <- c("mRNA", "miRNA", "DNAme", "SNV")
    YTrgFull <- Trg$YTrg_list
    brcds_SS <- Trg$brcds_SS
    SS <- 1
    Fctrzn <- Lrn$Fctrzn
    smpls <- colnames(YTrgFull$mRNA)
    expdat_meta_Lrn <- Lrn$Lrn_meta
    YTrg_prep <- TCGATargetDataPreparation(views, YTrgFull, brcds_SS, SS, Fctrzn, smpls, normalization = FALSE, expdat_meta_Lrn, transformation = FALSE)
    YTrg_prep$mRNA[c(1:5), c(1:5)]
    YTrg_prep$DNAme[c(1:5), c(1:5)]
    # In the paper, several target datasets were created as subsets of the
    # reference dataset R. This function was used to generate them
    # automatically.
    # If you are not doing the paper analysis, you can create the brcds_SS
    # using the following command line. Replace the "nameX" with the sample
    # names of your data.
    # You can add as many dataframes as you want for each view.
    brcds_SS_ex <- list("brcds_mRNA_SS" = list(data.frame("brcds" = c("name01", "name02"))),
        "brcds_miRNA_SS" = list(data.frame("brcds" = c("name10", "name11"))))
    # The SS parameter corresponds to the index of the subset you want to
    # prepare for a specific view. It's generated automatically if you used
    # the workflow described on GitHub:
    # https://github.com/david-hirst/MOTL/blob/main/TCGAStudy/00_TCGAstudy_ReadMe.md
Contains a list of input variables used for the transfer learning.

    data("TL_param")

list of the prepared input target dataset (mRNA, miRNA, DNAme, SNV). Samples are in columns and features are in rows.
list of 4 variables (mRNA, miRNA, DNAme and SNV). Each contains a W0 vector named with the corresponding feature names.
list of 4 data.frame (mRNA, miRNA, DNAme and SNV). Each contains factors in columns and features in rows.
list of 4 data.frame (mRNA, miRNA, DNAme and SNV). Each contains factors in columns and features in rows.
list of 4 data.frame (mRNA, miRNA, DNAme and SNV). Each contains features in columns.
list of 4 data.frame (mRNA, miRNA, DNAme and SNV). Each contains features in columns.
data.frame with factors in columns.
data.frame with factors in columns and samples in rows.
vector of numerics
data.frame with factors in columns.
This function performs multi-omics matrix factorization with transfer learning. The target dataset is factorized using the latent factor values inferred from the previous factorization of a learning dataset.

    transferLearning_function(
      TL_param,
      MaxIterations,
      MinIterations,
      minFactors,
      views,
      likelihoods,
      Fctrzn,
      StartDropFactor,
      FreqDropFactor,
      StartELBO,
      FreqELBO,
      DropFactorTH,
      ConvergenceIts,
      ConvergenceTH,
      CenterTrg,
      PoisRateCstnt = 1e-04,
      ss_start_time = NULL,
      outputDir = "./"
    )

| TL_param | a named list of initialized parameters and data objects for transfer learning. It contains the target dataset and the weight and score matrices from the MOFA factorization of the learning dataset. See the details section for more information. |
|---|---|
| MaxIterations | the maximum number of iterations. The factorization stops once this number is reached, even if it has not converged. |
| MinIterations | the minimum number of iterations. The factorization is not stopped before this number is reached, even if it has already converged. |
| minFactors | the minimum number of factors to retain |
| views | a named vector of the target dataset views. It should contain the same names as those used in the learning dataset. |
| likelihoods | a named vector of the target dataset types. It can contain |
| Fctrzn | the learning factorization model object (from |
| StartDropFactor | iteration after which to start dropping factors |
| FreqDropFactor | how often (in iterations) to check whether to drop factors |
| StartELBO | iteration after which to start checking the ELBO |
| FreqELBO | how often (in iterations) to assess the ELBO |
| DropFactorTH | threshold for dropping factors. If the factor with the lowest maximum variance explained is below this threshold, it is dropped. |
| ConvergenceIts | number of consecutive iterations for which the change in ELBO must stay below the threshold before convergence is declared |
| ConvergenceTH | threshold on the change in ELBO used to check convergence |
| CenterTrg | if TRUE, center the target dataset during processing; if FALSE, leave the target dataset uncentered and use the estimated learning dataset intercepts. |
| PoisRateCstnt | small constant added to the Poisson rate function to avoid errors. Defaults to 0.0001. It is used in the calculation of the pseudo-Gaussian values. |
| ss_start_time | time recorded before the preprocessing step starts. Generated using |
| outputDir | name of the output directory where results are saved. By default, results are saved in the current directory. |
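The iteration controls above combine into a stopping rule. The following is a minimal sketch of one plausible reading of that rule, not the package's implementation: the helper `should_stop()` and the toy trace of ELBO changes are hypothetical, and the change in ELBO is assumed here to be compared in absolute value against `ConvergenceTH`.

```r
# Hypothetical helper: decide whether to stop after iteration `it`.
# `elbo_changes` holds the change in ELBO recorded at each ELBO check so far
# (how the package measures this change is an assumption here).
should_stop <- function(it, elbo_changes, MinIterations, MaxIterations,
                        ConvergenceIts, ConvergenceTH) {
  if (it >= MaxIterations) return(TRUE)    # hard cap on iterations
  if (it < MinIterations) return(FALSE)    # never stop before the minimum
  if (length(elbo_changes) < ConvergenceIts) return(FALSE)
  recent <- tail(elbo_changes, ConvergenceIts)
  all(abs(recent) < ConvergenceTH)         # all recent changes are small
}

# Toy trace of ELBO changes (made-up numbers)
changes <- c(0.2, 0.01, 0.0004, 0.0001)
should_stop(it = 4, elbo_changes = changes, MinIterations = 2,
            MaxIterations = 10, ConvergenceIts = 2, ConvergenceTH = 0.0005)
```

With these toy values the last two changes are both below the threshold, so the rule reports that the loop may stop.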
This function is called after the target dataset has been prepared (using TargetDataPreparation) and the parameters have been initialized (using initTransferLearningParamaters).
TL_param is a named list of the initialized parameters and data objects for transfer learning. It contains:
YTrg: a named list of matrices. Each matrix holds the target data for one view.
Fctrzn_Lrn_W0: a named list of vectors. Each vector contains the feature mean weights calculated for the learning dataset using MOFA.
Fctrzn_Lrn_W: a named list of matrices. Each matrix contains the weights calculated for the learning dataset using MOFA.
Fctrzn_Lrn_WSq: a named list of matrices. Each matrix contains the squared weights calculated for the learning dataset using MOFA.
Tau: a named list of matrices. Each matrix contains the Tau values calculated for the learning dataset using MOFA.
TauLn: a named list of matrices. Each matrix contains the TauLn values calculated for the learning dataset using MOFA.
The names of every list should be identical (e.g. c("mRNA", "miRNA", "DNAme", "SNV")), so that each element corresponds to one omics layer.
To create the TL_param variable, see the initTransferLearningParamaters function.
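The requirement that every list in TL_param carries the same view names can be checked before starting the factorization. The sketch below uses a toy stand-in for TL_param with random placeholder values; the object `TL_param_toy` and the check itself are hypothetical and only illustrate the layout described above (features in rows for the data and weight matrices).

```r
# Toy stand-in for TL_param: 2 views, 3 features, 4 samples, 2 factors.
# All values are random placeholders, not real MOTL data.
set.seed(1)
views <- c("mRNA", "miRNA")
toy_view <- function(nr, nc) matrix(rnorm(nr * nc), nrow = nr, ncol = nc)
TL_param_toy <- list(
  YTrg         = setNames(lapply(views, function(v) toy_view(3, 4)), views),
  Fctrzn_Lrn_W = setNames(lapply(views, function(v) toy_view(3, 2)), views)
)
# Squared weights, view by view (lapply keeps the list names)
TL_param_toy$Fctrzn_Lrn_WSq <- lapply(TL_param_toy$Fctrzn_Lrn_W,
                                      function(W) W^2)

# Hypothetical check: every per-view list carries the same view names
same_names <- vapply(TL_param_toy,
                     function(x) identical(names(x), views), logical(1))
stopifnot(all(same_names))
```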
a named list of results. It contains:
YTrgSS: list of target dataset matrices
YGauss: list of pseudo-Gaussian target dataset matrices
ZMu_0: list of ZMu intercept matrices
ZMu: list of ZMu
Fctrzn_Lrn_W0: list of learning feature mean weights
Fctrzn_Lrn_W: list of learning weight matrices
ELBO: numeric value of the ELBO
VarExpl: variance explained by each target dataset
ss_start_time: time at which the analysis started (i.e. before the preprocessing step)
ss_fit_start_time: time at which the transfer learning analysis started
ss_end_time: time at which the transfer learning finished
Results are also saved to the TL_data.rds file.

    data("TL_param", package = "MOTL")
    data("Lrn", package = "MOTL")
    ss_start_time <- Sys.time()
    # Objects not stored in TL_param; values follow the other examples
    views <- c("mRNA", "miRNA", "DNAme", "SNV")
    likelihoods <- c("mRNA" = "gaussian", "miRNA" = "gaussian",
        "DNAme" = "gaussian", "SNV" = "bernoulli")
    Fctrzn <- Lrn$Fctrzn
    CenterTrg <- FALSE
    minFactors <- 13
    StartDropFactor <- 1
    FreqDropFactor <- 1
    StartELBO <- 1
    FreqELBO <- 5
    DropFactorTH <- 0.01
    MaxIterations <- 10
    MinIterations <- 2
    ConvergenceIts <- 2
    ConvergenceTH <- 0.0005
    PoisRateCstnt <- 0.0001
    TL_data <- transferLearning_function(TL_param, MaxIterations,
        MinIterations, minFactors, views, likelihoods, Fctrzn,
        StartDropFactor, FreqDropFactor, StartELBO, FreqELBO, DropFactorTH,
        ConvergenceIts, ConvergenceTH, CenterTrg,
        PoisRateCstnt = PoisRateCstnt, ss_start_time = ss_start_time,
        outputDir = "./")

Contains a list of target datasets used in different steps of the transfer learning workflow and their associated metadata.

    data("Trg")

list of the target dataset. Samples are in columns and features are in rows.
mRNA: data stored in a SummarizedExperiment object
miRNA: data stored in a SummarizedExperiment object
DNAme: data stored in a SummarizedExperiment object
SNV: data stored in a matrix
list of metadata:
character: smpls, ftrs_mRNA, ftrs_miRNA, ftrs_DNAme, ftrs_SNV
data.frame with three variables (brcds, submittor, prjct): brcds_mRNA, brcds_miRNA, brcds_DNAme, brcds_SNV
integer: Seed, ElbowK_Total, ElbowK_mRNA, ElbowK_miRNA, ElbowK_DNAme, ElbowK_SNV
logical: if_vst
numeric: PCVarPrcnt_mRNA, PCVarPrcnt_miRNA, PCVarPrcnt_DNAme, PCVarPrcnt_SNV
a list of 4 lists of data.frame objects containing the sample names.
list of the prepared input target dataset (mRNA, miRNA, DNAme, SNV). Samples are in columns and features are in rows.
Calculate the variance explained by each factor for each view

    VarExplFun(views, YGauss, ZMu_0, Fctrzn_Lrn_W0, ZMu, Fctrzn_Lrn_W)

| views | list of view names |
|---|---|
| YGauss | list of pseudo Y value matrices |
| ZMu_0 | vector of coefficients for weight intercepts |
| Fctrzn_Lrn_W0 | list of factorized learning set weight intercept matrices |
| ZMu | matrix of Z values |
| Fctrzn_Lrn_W | list of factorized learning set weight matrices |
variance explained matrix

    VarExpl <- VarExplFun(views, YGauss, ZMu_0, Fctrzn_Lrn_W0, ZMu, Fctrzn_Lrn_W)

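To make "variance explained by each factor" concrete, here is a minimal sketch for a single view. It assumes the usual coefficient-of-determination definition (one minus the residual sum of squares over the total sum of squares) and ignores intercepts; the helper `r2_per_factor()` and the toy matrices are hypothetical and may differ from what VarExplFun computes internally.

```r
# Hypothetical helper: R^2 of each factor for one view.
# Y: features x samples (assumed centred), Z: samples x factors,
# W: features x factors.
r2_per_factor <- function(Y, Z, W) {
  ss_tot <- sum(Y^2)
  vapply(seq_len(ncol(Z)), function(k) {
    # Reconstruction using factor k only (features x samples)
    Y_hat <- W[, k, drop = FALSE] %*% t(Z[, k, drop = FALSE])
    1 - sum((Y - Y_hat)^2) / ss_tot
  }, numeric(1))
}

# Toy data: factor 1 generates the data, factor 2 is all zeros
set.seed(1)
Z <- cbind(rnorm(5), 0)
W <- cbind(rnorm(4), 0)
Y <- W %*% t(Z)   # 4 features x 5 samples
r2_per_factor(Y, Z, W)
```

In this toy case the first factor reconstructs the data and the second contributes nothing, so the two values sit at the extremes of the scale.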
This function loads or calculates the weight intercept values.

    W0_calculation(view, CenterTrg, Fctrzn, LrnFctrnDir)

| view | a character string giving the name of the current view |
|---|---|
| CenterTrg | if FALSE, use the estimated feature weight intercept from the |
| Fctrzn | learning dataset factorization model object (from |
| LrnFctrnDir | directory where the estimated intercepts file is located. |
The weight intercept values can be loaded from the
EstimatedIntercepts.rds file. This file can be created using the
intercepts_calculation function.
The weight intercept values can also be initialized using the weight matrix. The weight matrix is set to zero.
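The two routes described above can be sketched as follows. This is an illustration only, not the code of W0_calculation: the helper `toy_W0()` is hypothetical, the mapping of CenterTrg = TRUE to zero intercepts is an assumed reading of the text, and the layout of the .rds file (a named list with one intercept vector per view) is an assumption made so that the sketch is self-contained.

```r
# Hypothetical helper, not the package's W0_calculation().
# W: learning weight matrix for one view (features in rows).
# intercept_file: .rds file assumed to hold a named list of per-view
# intercept vectors (this layout is an assumption).
toy_W0 <- function(view, CenterTrg, W, intercept_file) {
  if (CenterTrg) {
    # Centred target data: one zero intercept per feature
    setNames(numeric(nrow(W)), rownames(W))
  } else {
    # Uncentred target data: reuse the estimated learning intercepts
    readRDS(intercept_file)[[view]]
  }
}

W <- matrix(rnorm(6), nrow = 3, dimnames = list(c("g1", "g2", "g3"), NULL))
tmp <- tempfile(fileext = ".rds")
saveRDS(list(mRNA = c(g1 = 0.5, g2 = -1, g3 = 2)), tmp)

toy_W0("mRNA", CenterTrg = TRUE,  W = W, intercept_file = tmp)
toy_W0("mRNA", CenterTrg = FALSE, W = W, intercept_file = tmp)
```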
a feature weight intercept values matrix for the current data

    data("Lrn", package = "MOTL")
    Fctrzn <- Lrn$Fctrzn
    W0_mRNA = W0_calculation(view = "mRNA", CenterTrg = TRUE, Fctrzn = Fctrzn,
        LrnFctrnDir = LrnFctrnDir)

This function loads or calculates the squared weight values.

    WSq_calculation(view, Fctrzn, LrnFctrnDir, LrnSimple = TRUE)

| view | a character string giving the name of the current view |
|---|---|
| Fctrzn | learning dataset factorization model object (from |
| LrnFctrnDir | directory where |
| LrnSimple | if TRUE, calculates the squared weight values |
The squared weight values can be loaded from a .csv file. This file can be
created during the factorization of the learning data. See the
documentation to learn how to create this file. The file name should
follow this format: WSq_mRNA.csv.
The squared weight values can also be calculated from the weight values
obtained during the factorization of the learning data. These values
are stored in the Fctrzn variable. See the documentation.
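Both routes can be illustrated with a toy weight matrix. The sketch below is not the package's code: the element-wise square is an assumed reading of the simple route, and the CSV layout (feature names in the first column, one column per factor) is an assumption. A temporary directory stands in for LrnFctrnDir.

```r
# Toy weight matrix: 3 features x 2 factors (placeholder values)
W <- matrix(c(0.5, -1, 2, 0, 0.1, -0.3), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("Factor1", "Factor2")))

# Simple route: square the weights element-wise
WSq_simple <- W^2

# File route: read a matrix previously written as WSq_mRNA.csv.
# A temporary directory is used here so the sketch is self-contained.
LrnFctrnDir_toy <- tempdir()
csv_path <- file.path(LrnFctrnDir_toy, "WSq_mRNA.csv")
write.csv(WSq_simple, csv_path)
WSq_file <- as.matrix(read.csv(csv_path, row.names = 1))
```

Note that squaring the expected weights is not the same as the expected squared weights when the weights have non-zero posterior variance, which is presumably why a separately saved file is offered as an alternative.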
the squared weight matrix for the current view

    library("MOFA2")
    data("Lrn", package = "MOTL")
    Fctrzn <- Lrn$Fctrzn
    likelihoodsLrn <- get_default_model_options(Fctrzn)$likelihoods
    WSq_mRNA = WSq_calculation(view = "mRNA", Fctrzn = Fctrzn,
        LrnFctrnDir = LrnFctrnDir, LrnSimple = TRUE)

For Gaussian data, the Y values (observed data) are centered
(if CenterTrg = TRUE) and do not change afterwards.
For non-Gaussian data, the Y values are transformed and change after each
update of the Z matrix. The pseudo Y values are centered at each step if
CenterTrg = TRUE.
For Gaussian data this is done at every iteration (It >= 0); for other
data types it is done at every iteration except the first one (It > 0).

    YGauss_calculation(
      view,
      likelihoods,
      YTrg,
      Zeta,
      Tau,
      CenterTrg,
      PoisRateCstnt
    )

| view | a character string giving the name of the current view |
|---|---|
| likelihoods | a named list of data types. The list can contain |
| YTrg | current data matrix |
| Zeta | list of Zeta matrices |
| Tau | list of Tau matrices |
| CenterTrg | if FALSE, use the estimated feature weight intercept from the |
| PoisRateCstnt | small constant added when transforming Poisson data to avoid errors |
pseudo Y values for the current view

    data("TL_param", package = "MOTL")
    view <- "mRNA"
    likelihoods <- c("mRNA" = "gaussian", "miRNA" = "gaussian",
        "DNAme" = "gaussian", "SNV" = "bernoulli")
    CenterTrg <- FALSE
    YTrg <- TL_param$YTrg
    Zeta <- TL_param$Zeta
    Tau <- TL_param$Tau
    PoisRateCstnt <- 0.0001
    YGauss <- YGauss_calculation(view = view, likelihoods = likelihoods, YTrg,
        Zeta, Tau, CenterTrg, PoisRateCstnt)

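The centering step for a Gaussian view can be sketched on a toy matrix. This is an illustration under the assumption that centering means subtracting each feature's mean across samples (features in rows, samples in columns); the helper `center_features()` is hypothetical and is not the code of YGauss_calculation.

```r
# Toy Gaussian view: 3 features (rows) x 4 samples (columns)
Y <- matrix(c(1, 2, 3, 4,
              10, 10, 10, 10,
              -1, 0, 1, NA), nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))

# Hypothetical helper: subtract each feature's mean, ignoring missing values
center_features <- function(Y, CenterTrg) {
  if (!CenterTrg) return(Y)   # leave the data untouched
  sweep(Y, 1, rowMeans(Y, na.rm = TRUE), "-")
}

Y_centered <- center_features(Y, CenterTrg = TRUE)
rowMeans(Y_centered, na.rm = TRUE)   # feature means after centering
```

For non-Gaussian views the same centering would be applied to the pseudo values after each transformation rather than once to the observed data.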
Calculate the Zeta matrix for the current data view.

    Zeta_calculation(view, likelihoods, E_ZWSq, E_ZE_W)

| view | a character string giving the name of the current view (e.g. |
|---|---|
| likelihoods | a named list of data types. The list can contain |
| E_ZWSq | expected values of the squared product of the Z matrix and the weight matrix W. |
| E_ZE_W | product of the expected values of the Z matrix and the expected values of the W matrix |
E_ZWSq is calculated using the E_ZWSq_update function.
E_ZE_W is calculated using the E_ZE_W_update function.
Zeta matrix for the current data view

    data("TL_param", package = "MOTL")
    view <- "mRNA"
    ZMuSq <- TL_param$ZMuSq
    ZMu <- TL_param$ZMu
    ZMu_0 <- TL_param$ZMu_0
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Fctrzn_Lrn_W <- TL_param$Fctrzn_Lrn_W
    Fctrzn_Lrn_WSq <- TL_param$Fctrzn_Lrn_WSq
    likelihoods <- c("mRNA" = "gaussian", "miRNA" = "gaussian",
        "DNAme" = "gaussian", "SNV" = "bernoulli")
    E_ZE_W <- list()
    E_Z_SqE_W_Sq <- list()
    E_ZSqE_WSq <- list()
    E_ZWSq <- list()
    E_ZE_W$mRNA <- E_ZE_W_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0, Fctrzn_Lrn_W)
    E_Z_SqE_W_Sq$mRNA <- E_Z_SqE_W_Sq_update(view, ZMu_0, ZMu, Fctrzn_Lrn_W0,
        Fctrzn_Lrn_W)
    E_ZSqE_WSq$mRNA <- E_ZSqE_WSq_update(view, ZMu_0, ZMuSq, Fctrzn_Lrn_W0,
        Fctrzn_Lrn_WSq)
    E_ZWSq$mRNA <- E_ZWSq_update(view, E_ZE_W, ZMuSq, E_Z_SqE_W_Sq, E_ZSqE_WSq)
    Zeta <- Zeta_calculation(view = "mRNA", likelihoods = likelihoods,
        E_ZWSq = E_ZWSq, E_ZE_W = E_ZE_W)

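The role of the two inputs can be sketched as follows, assuming MOFA-style local variational bounds: for a Bernoulli view Zeta is taken as the square root of the expected squared product of Z and W, and for a Poisson view it is taken as the product of the expectations. This mapping is an assumption based on the MOFA derivation rather than on the MOTL source, and the helper `toy_zeta()` is hypothetical.

```r
# Hypothetical helper: pick the Zeta formula from the likelihood of a view.
toy_zeta <- function(likelihood, E_ZWSq, E_ZE_W) {
  switch(likelihood,
         bernoulli = sqrt(E_ZWSq),  # uses the second moment of ZW
         poisson   = E_ZE_W,        # uses only the product of the means
         NULL)                      # Gaussian views need no Zeta
}

# Toy expectations for one view (made-up numbers)
E_ZE_W <- matrix(c(0.5, -1, 2, 0), nrow = 2)
E_ZWSq <- E_ZE_W^2 + 0.25   # second moment = mean^2 + a made-up variance

toy_zeta("bernoulli", E_ZWSq, E_ZE_W)
toy_zeta("poisson", E_ZWSq, E_ZE_W)
```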
Z matrix ZMu calculation for the current data

    ZMu_calculation(view, k, Fctrzn_Lrn_W, Fctrzn_Lrn_W0, Tau, ZMu_0, ZMu, YGauss)

| view | a character string giving the name of the current view |
|---|---|
| k | feature index in the current data |
| Fctrzn_Lrn_W | list of factorized learning set weight matrices |
| Fctrzn_Lrn_W0 | list of factorized learning set weight intercept matrices |
| Tau | list of Tau matrices |
| ZMu_0 | vector of coefficients for weight intercepts |
| ZMu | matrix of Z values |
| YGauss | list of pseudo Y value matrices |
ZMu values for the current view (Z matrix)

    data("TL_param", package = "MOTL")
    k <- 10
    view <- "mRNA"
    Fctrzn_Lrn_W <- TL_param$Fctrzn_Lrn_W
    Fctrzn_Lrn_W0 <- TL_param$Fctrzn_Lrn_W0
    Tau <- TL_param$Tau
    ZMu_0 <- TL_param$ZMu_0
    ZMu <- TL_param$ZMu
    YGauss <- TL_param$YTrg
    ZMu <- ZMu_calculation(view, k, Fctrzn_Lrn_W, Fctrzn_Lrn_W0, Tau, ZMu_0,
        ZMu, YGauss)

The Z variances are calculated using the initialized or updated Tau values
and the squared weight values (WSq),
based on the appendix of the MOFA2 paper
and the GitHub code.

    ZVar_calculation(view, Tau, Fctrzn_Lrn_WSq)

| view | a character string giving the name of the current view |
|---|---|
| Tau | list of Tau matrices |
| Fctrzn_Lrn_WSq | factorized learning set squared weights |
calculated Z variances matrix for the current data

    data("TL_param", package = "MOTL")
    Tau <- TL_param$Tau
    Fctrzn_Lrn_WSq <- TL_param$Fctrzn_Lrn_WSq
    ZVar <- ZVar_calculation(view = "mRNA", Tau, Fctrzn_Lrn_WSq)

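How Tau and the squared weights combine can be sketched for a single view. The sketch assumes a MOFA-style update with a standard normal prior on Z, where the variance of each factor value is the inverse of one plus the precision-weighted sum of squared weights over features. The toy matrices are random placeholders, and how ZVar_calculation combines several views is not shown here.

```r
# Toy inputs for one view: 4 samples, 3 features, 2 factors
set.seed(1)
Tau <- matrix(rexp(12), nrow = 4, ncol = 3)   # samples x features precisions
WSq <- matrix(rexp(6),  nrow = 3, ncol = 2)   # features x factors, E[W^2]

# Precision contributed by this view (samples x factors)
precision_view <- Tau %*% WSq

# Variance of Z under an assumed N(0, 1) prior (the "+ 1" term)
ZVar_toy <- 1 / (1 + precision_view)
```

Because the precisions and squared weights are non-negative, every variance in this sketch lies between 0 and 1, shrinking as a view becomes more informative about a factor.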