SimiCviz provides various utilities for gene
regulatory network (GRN) analysis for single-cell RNA-seq data.
Originally developed to visualize the phenotype-specific GRN output from
SimiCPipeline
for R users, it also includes tools for:
The package is interoperable and agnostic to the GRN inference methods, accepting different GRN outputs.
SimiCviz easily handles SimiCPipeline outputs but also
accepts GRN weights in a flexible long format. The input
data.frame must include the following columns:
tf: Transcription factor nametarget: Target gene nameweight: Edge weight/coefficientlabel (optional): Phenotype/condition
identifier for label-specific networksOptional quality metrics for filtering weights:
pvalue / adj_p_val:
Statistical significance (for methods like SCENIC, Pando)r_squared /
adj_r_squared: Goodness of fit (for methods like
SimiC)SimiCPipeline
is a regularized regression method for inferring condition-specific
GRNs. It generates:
.pickle files with TF-target weights per
phenotype/label and adjusted R² values for goodness of fit of the target
expression model..csv file with a TF activity score matrix (cells ×
TFs).pickle files with per-label matrices (cells × TFs)
[Legacy SimiC v1 - Deprecated]These outputs are organized in a directory structure like this:
Project/
├── inputFiles/
│ ├── TF_list.csv
│ ├── expression_matrix.pickle
│ └── phenotype_annotation.txt
└── outputSimic/
├── figures/
└── matrices/
└── example1/
├── example1_L1_0.1_L2_0.01_simic_matrices.pickle
├── example1_L1_0.1_L2_0.01_simic_matrices_filtered_BIC.pickle
├── example1_L1_0.1_L2_0.01_wAUC_matrices_filtered_BIC.pickle
└── example1_L1_0.1_L2_0.01_wAUC_matrices_filtered_BIC_collected.csv
Here we show how to easily load a SimiCPipeline run. You
only need the directory, experiment name, and hyperparameters since the
directory output is standardized. SimiCviz will
automatically find the files and generate the
SimiCvizExperiment.
library(SimiCviz)
# Load entire SimiCPipeline run automatically
simic_full <- load_SimiCPipeline(
project_dir = "path/to/simic_run",
run_name = "example1",
lambda1 = "0.01",
lambda2 = "0.001"
)
# Set display names and colors for visualization (Part 3)
simic_full <- setLabelNames(
simic_full,
label_names = c("control", "PD-L1", "DAC", "Combination"),
colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0")
)simic_full
## An object of class SimiCvizExperiment
## 4 label(s), 10 TF(s), 150 target(s)
## Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
## AUC: collected (3000 cells x 10 TFs)
## Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
## Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combo
## Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
## TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
## Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
## Meta keys: adjusted_r_squared, run_name, lambda1, lambda2If you run the full SimiCPipeline
tutorial you can skip Part 2 and go straight to Part 3 for
visualizations.
If you need more flexibility in the workflow, or have precomputed
outputs using SimiC v1, you can construct
SimiCvizExperiment or AUCProcessor objects by
loading the required files separately.
# Load weights from pickle
weights_file <- system.file("extdata",
file.path("outputSimic/example1_simic_weights.pickle"),
package = "SimiCviz")
simic_weights <- read_weights_pickle(weights_file)
## Downloading uv...Done!
simic_weights[[1]][, 1:6]
## Slit2 Kif20b Ect2 Lhfpl2
## Pms1 0.0000000 0.000000 0.000000 0
## Prrx2 1.2348210 0.000000 0.000000 0
## Ets1 0.0000000 0.000000 0.000000 0
## Tet2 -1.3650950 0.000000 0.000000 0
## Crebrf 0.0000000 0.000000 0.000000 0
## Gli3 0.0000000 0.000000 0.000000 0
## Zhx2 0.0000000 0.000000 0.000000 0
## A630089N07Rik 0.0000000 1.360139 1.231132 0
## Zfhx4 -0.9495542 0.000000 0.000000 0
## Pbx3 0.0000000 2.344418 2.438383 0
## Fn1 Kif15
## Pms1 -0.8740381 0.000000
## Prrx2 0.0000000 0.000000
## Ets1 0.0000000 0.000000
## Tet2 0.0000000 0.000000
## Crebrf 0.0000000 0.000000
## Gli3 -0.9873656 0.000000
## Zhx2 0.0000000 0.000000
## A630089N07Rik -0.8189908 1.875401
## Zfhx4 0.0000000 0.000000
## Pbx3 -2.3883315 2.776151Most single-cell GRN inference methods produce table outputs with TF-target weights/estimates and p-values.
# Load GRN weights (method agnostic)
weight_path <- system.file("extdata", "example_weights.csv",
package = "SimiCviz")
# Read as data.frame
weights_df <- read_weights_csv(weight_path)
head(weights_df)
## tf target weight label
## 1 Pms1 Slit2 0.000000 0
## 2 Prrx2 Slit2 1.234821 0
## 3 Ets1 Slit2 0.000000 0
## 4 Tet2 Slit2 -1.365095 0
## 5 Crebrf Slit2 0.000000 0
## 6 Gli3 Slit2 0.000000 0
# If your method uses different column names you need to rename themExpected columns in CSV:
# Minimal
tf, target, weight
# If only one GRN a column name "label" will be all values in label should be 0
# With quality metrics
tf, target, weight, pvalue, adj_p_val
# With phenotype labels
tf, target, weight, label, pvalue, adj_p_val
The cell_labels map each cell ID in your expression
matrix or activity scores matrix to a condition/phenotype. This is
required for activity score calculation and AUC visualizations.
# Load from CSV (recommended format: columns 'cell', 'label')
cell_labels_path <- system.file("extdata",
file.path("inputFiles", "treatment_annotation.csv"),
package = "SimiCviz")
cell_labels <- load_cell_labels(cell_labels_path, header = TRUE, sep = ",")
## Cell labels file contains extra columns
## Cell / category / label
head(cell_labels)
## cell category label
## 1 05_87_62__s5 control 0
## 2 02_84_27__s3 control 0
## 3 02_95_13__s5 control 0
## 4 06_61_12__s1 control 0
## 5 03_14_11__s5 control 0
## 6 05_46_81__s4 control 0The function load_cell_labels can take alternative
formats/files:
Only required for computing activity scores. Can be:
Important: Final expression matrix must be in genes × cells format (genes as rows, cells as columns).
# Load from multiple formats
expression_mat_path <- system.file("extdata",
file.path("inputFiles", "example1_expression.pickle"),
package = "SimiCviz")
# Will auto-detect format and load
expression_mat <- load_expression_matrix(expression_mat_path)
## Converting expression matrix to sparse format...
print(class(expression_mat))
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"
print(dim(expression_mat))
## [1] 3000 160The SimiCvizExperiment container organizes weights,
activity scores, and metadata.
# From .pickle SimiC files
# Extract Adjusted R squared from SimiC outputs
out <- read_pickle(weights_file)
adjusted_r_squared <- out$adjusted_r_squared
viz_obj_simic <- SimiCvizExperiment(
weights = simic_weights,
auc = NULL, # Will compute this in the next section later but can be loaded as well
cell_labels = cell_labels,
label_names = c("control","PD-L1","DAC","Combination"),
colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0"),
meta=list(adjusted_r_squared=adjusted_r_squared))
## `weights` provided as a list.
## Cell labels file contains extra columns
## cell / category / label
viz_obj_simic
## An object of class SimiCvizExperiment
## 4 label(s), 10 TF(s), 150 target(s)
## Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
## AUC collected: none
## Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
## Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combination
## Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
## TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
## Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
## Meta keys: adjusted_r_squaredIn many biological contexts, we aim to quantify changes in GRN
activity across cell populations. This is achieved by computing an
activity score for each TF in each cell, as described in the original SimiC
article by Peng et al published in Communications
Biology (2022). Here, we provide wrapper functions to apply this
approach to outputs from SimiCPipeline or other GRN
inference methods.
Compute activity scores from TF-Target weights and expression matrix
using SimiCvizExperiment
# From SimiC input files
# adj_r2_threshold: For SimiCPipeline style outputs it will look in metadata of viz_obj_simic, for "simic@meta$adjusted_r_squared" and filter out targets with lower R²)
# n_cores: Number of workers if backend = 'multissession' or
# Number of cores if backend = 'multicore'
viz_obj_simic <- calculate_activity_scores(
viz_obj_simic,
expression = expression_mat_path,
adj_r2_threshold = 0.7, # For SimiC style outputs
sort_by="expression", # Rank targets by expression or weight
select_top_k = NULL, # Use all targets (or limit to top K)
percent_of_target = 1.0,
n_cores = 2,
backend = "multicore",
verbose = TRUE
)
## Initializing AUCProcessor...
## `weights` provided as a list, converting to list data.frame
## Filtering weights by adjusted R2 threshold...
## Converting expression matrix to sparse format...
## Cell labels file contains extra columns
## cell / category / label
## Transposing expression matrix to
## genes x cells format based on rownames matching cell IDs
## Subsetting expression matrix to genes in GRNs.
## Number of tfs: 10 and Number of targets: 148
## Computing activity scores...
## Starting AUC computation...
## Sorting by: expression
## Targets: 100% of available
## Labels: 4 | Cells: 3000
## Backend requested: multicore | backend used: multicore |
## workers: 2
## | | | 0% | |====== | 12% | |============ | 25% | |=================== | 38% | |========================= | 50% | |=============================== | 62% | |====================================== | 75% | |============================================ | 88% | |==================================================| 100%
## AUC computation completed in 5.74 seconds!
## Result: 3000 cells x 10 TFs
## Activity scores computed and added to simic object
# Access computed scores
auc_scores <- viz_obj_simic@auc$collected
head(auc_scores[, 1:5])
## Pms1 Prrx2 Ets1 Tet2
## 05_87_62__s5 0.3017963 0.6271518 0.7016742 0.4475671
## 02_84_27__s3 0.4113587 0.4752862 0.7356361 0.3714327
## 02_95_13__s5 0.4048365 0.4044021 0.8277072 0.4149942
## 06_61_12__s1 0.3219022 0.6893614 0.6386152 0.4432955
## 03_14_11__s5 0.3547483 0.3517207 0.6917478 0.5022100
## 05_46_81__s4 0.2805570 0.7038231 0.6858806 0.4250196
## Crebrf
## 05_87_62__s5 0.6789517
## 02_84_27__s3 0.5719899
## 02_95_13__s5 0.7185131
## 06_61_12__s1 0.5646369
## 03_14_11__s5 0.7801271
## 05_46_81__s4 0.6450653For more control over the computation pipeline, you can use
AUCProcessor class directly.
If your method outputs a quality metric for the inferred GRN, you can filter out unreliable TF-target relationships with the arguments:
qc_type: “adj_r2” or name of the column in
weights to use (pval, padj, etc.). If ‘qc_type’ == “adj_r2”
then cells above qc_threshold will be kept
(e.g. R² > 0.7). If ‘qc_type’ != “adj_r2” (we assume is a pvalue)
then cells below qc_threshold will be kept
(e.g. adj_p_val < 0.05).qc_threshold: desired threshold as described
earlier.Tip: If your method output is different you should modify your input column names accordingly.
# Initialize processor with weights and expression
AS_processor <- AUCProcessor(
weights = weights_df,
expression = expression_mat_path, # or matrix/data.frame
cell_labels = cell_labels,
qc_type = "adj_p_val",
qc_threshold = 0.05, # Filter targets above this threshold
n_cores = 2,
backend = "multisession"
)
## `weights` provided as a data.frame
## No adj_p_val column found in weights data.frame;
## skipping filtering.
## Converting expression matrix to sparse format...
## Cell labels file contains extra columns
## cell / category / label
## Transposing expression matrix to
## genes x cells format based on rownames matching cell IDs
# Compute with custom parameters
AS_processor <- compute_auc(
AS_processor,
sort_by = "expression",
select_top_k = NULL, # Use all targets (or limit to top K)
percent_of_target = 1.0, # Use all targets (or subset %)
verbose = TRUE
)
## Starting AUC computation...
## Sorting by: expression
## Targets: 100% of available
## Labels: 4 | Cells: 3000
## Backend requested: multisession | backend used: multisession |
## workers: 2
## Note: multisession (SOCK) has worker
## startup/serialization overhead;
## best for larger datasets or Windows compatibility.
## | | | 0%
## Loading required namespace: Matrix
## | |============ | 25%
## Loading required namespace: Matrix
## | |========================= | 50%
## Warning in value[[3L]](cond): Parallel processing failed: BiocParallel errors
## 2 remote errors, element index: 1, 2
## 2 unevaluated and other errors
## first remote error:
## Error in .cell_to_label[[cell_id]]: attempt to select less than one element in get1index
## .
## Falling back to sequential processing...
## AUC computation completed in 13.52 seconds!
## Result: 3000 cells x 10 TFs
## Tip: if runtime is dominated by setup overhead,
## try fewer workers (e.g., 2-4)
## or use backend='multicore' on Linux/macOS.# Extract results in wide format (default)
auc_wide <- get_auc(AS_processor, format = "wide")
head(auc_wide)
## Pms1 Prrx2 Ets1 Tet2
## 05_87_62__s5 0.3060804 0.6262551 0.7056634 0.4593377
## 02_84_27__s3 0.4154967 0.4810999 0.7371486 0.3861232
## 02_95_13__s5 0.4073043 0.4084424 0.8238244 0.4478523
## 06_61_12__s1 0.3231438 0.6866506 0.6451786 0.4501554
## 03_14_11__s5 0.3538671 0.3487276 0.6953040 0.5092492
## 05_46_81__s4 0.2835594 0.7021071 0.6916303 0.4362708
## Crebrf Gli3 Zhx2 A630089N07Rik
## 05_87_62__s5 0.7032736 0.4181954 0.5064481 0.2561313
## 02_84_27__s3 0.6067590 0.4475846 0.4105804 0.3368732
## 02_95_13__s5 0.7186970 0.4112379 0.2822939 0.3475682
## 06_61_12__s1 0.5973786 0.5095346 0.5390834 0.2747720
## 03_14_11__s5 0.7901458 0.3866339 0.4254945 0.2763368
## 05_46_81__s4 0.6747605 0.4176727 0.5895686 0.2582468
## Zfhx4 Pbx3
## 05_87_62__s5 0.5014684 0.2797076
## 02_84_27__s3 0.4881281 0.3368844
## 02_95_13__s5 0.4332679 0.3890446
## 06_61_12__s1 0.4961954 0.3004046
## 03_14_11__s5 0.5160187 0.2991336
## 05_46_81__s4 0.5088153 0.2843953# Extract results long format
auc_long <- get_auc(AS_processor, format = "long") # Long format
head(auc_long)
## cell tf score label
## 1 05_87_62__s5 Pms1 0.3060804 0
## 2 02_84_27__s3 Pms1 0.4154967 0
## 3 02_95_13__s5 Pms1 0.4073043 0
## 4 06_61_12__s1 Pms1 0.3231438 0
## 5 03_14_11__s5 Pms1 0.3538671 0
## 6 05_46_81__s4 Pms1 0.2835594 0Alternatively you can filter in advance your weights and use
SimiCvizExperiment and the wrapper function
calculate_activity_scores
weights_df <- read.csv("path/to/your/weights.csv")
weights_df_filtered <- weights_df[weights_df$p_value < 0.01,]
# From CSV files
viz_obj <- SimiCvizExperiment(
weights = weights_df_filtered,
auc = NULL, # Will compute this inthe next section later but can be loaded as well
cell_labels = cell_labels,
label_names = c("control","PD-L1","DAC","Combination"),
colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0"),
meta=list() # Anything you want to store in a list format
)
viz_obj
viz_obj <- calculate_activity_scores(
viz_obj,
expression = expression_mat_path,
adj_r2_threshold = 0.7, # For SimiC style outputs
sort_by="expression", # Rank targets by expression or weight
select_top_k = NULL, # Use all targets (or limit to top K)
percent_of_target = 1.0,
n_cores = 2,
backend = "multisession",
verbose = TRUE
)Different GRN methods provide different quality metrics. Adjust filtering based on your method:
# SimiC: Filter by adjusted R² (goodness of fit)
processor_simic <- AUCProcessor(
weights = simic_weights,
expression = expr_mat,
cell_labels = cell_labels,
adj_r2_list = adjusted_r_squared, # a list length as simic_weights
qc_type = "adj_r2",
qc_threshold = 0.7 # Keep targets with R² ≥ 0.7
)
# SCENIC / Pando: Filter by adjusted p-value
processor_scenic <- AUCProcessor(
weights = weights_df,
expression = expr_mat,
cell_labels = cell_labels,
qc_type = "p_value",
qc_threshold = 0.05, # Keep targets with adj_p_val ≤ 0.05
n_cores = 4,
backend = "multisession"
)
# Compute with the same data, different parameters
processor_scenic <- compute_auc(processor_scenic, sort_by = "weight")Once you have computed the activity scores, you are ready to visualize the results!
For visualization we need a SimiCvizExperiment object
loaded with the results.
If you used load_SimiCPipeline or computed the activity
scores with calculate_activity_scores you already have
it.
# Recall above examples
simic_full # Complete SimiCpipeline output
## An object of class SimiCvizExperiment
## 4 label(s), 10 TF(s), 150 target(s)
## Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
## AUC: collected (3000 cells x 10 TFs)
## Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
## Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combo
## Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
## TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
## Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
## Meta keys: adjusted_r_squared, run_name, lambda1, lambda2
viz_obj_simic # SimiCPipeline weights -> `calculate_activity_scores`
## An object of class SimiCvizExperiment
## 4 label(s), 10 TF(s), 150 target(s)
## Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
## AUC: collected (3000 cells x 10 TFs)
## Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
## Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combination
## Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
## TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
## Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
## Meta keys: adjusted_r_squaredOtherwise you can load your results in
SimiCvizExperiment as exemplified above including the
activity scores results in the auc slot.
Note: Although activity score matrix is not
strictly necessary, you need it to fully explore all the visualization
tools of SimiCviz.
If you used other methods to calculate cell-specific activity scores with a different algorithm, you can import those as well in:
# Create SimiCvizExperiment
simic <- SimiCvizExperiment(weights = simic_weights,
auc = auc_wide,
cell_labels = cell_labels,
label_names = c("control","PD-L1","DAC","Combination"),
colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0"))
## `weights` provided as a list.
## Cell labels file contains extra columns
## cell / category / label
## AUC in wide format (cells x TFs).
simic
## An object of class SimiCvizExperiment
## 4 label(s), 10 TF(s), 150 target(s)
## Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
## AUC: collected (3000 cells x 10 TFs)
## Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
## Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combination
## Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
## TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
## Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...After you created a SimiCvizExperiment object, we can
use the plotting functions to visualize the results.
If you want to save your plots in pdf you can define figure output
directory out_dir = plot_dir, change the
filename and set the argument save = TRUE
To assess the goodness of SimiC’s fitted model of target expression we can plot the distribution of adjusted R² values across all targets.
# Plot distribution of adjusted R² values across targets
# Extract Adjusted R squared from SimiC outputs
out <- read_pickle(weights_file)
adjusted_r_squared <- out$adjusted_r_squared
plot_r2_distribution(adjusted_r_squared, simic, grid = c(2, 2),
save = FALSE, out_dir = plot_dir)
Here we can see that most of the R² values are over 0.7, indicating that
the model explains a significant portion of the variance in the target
gene expression.
Lower R² might indicate:
For these targets with low adjusted R2, we suggest to filter them out before plotting. We select a threshold of 0.7 as it was the same used to calculate the activity scores in SimiCPipeline and above. You can change this threshold as needed, but you should re-run the activity score calculation (or SimiCPipeline) with the same threshold to be consistent.
If you already filtered your weight matrix before creating the SimiCviz object you can skip this part
# Select targets by adjusted R2
unselected_targets <- list()
selected_targets <- list()
lab_keys <- names(simic@label_names)
for (lab in lab_keys){
# Save selected for plotting
selected_targets[[lab]] <- simic@target_ids[which(adjusted_r_squared[[lab]] >= 0.7)]
# Save unselected for reporting
label <- simic@label_names[[lab]]
unselected_targets[[label]] <- simic@target_ids[which(adjusted_r_squared[[lab]] < 0.7)]
}
print("Number of unselected targets per label:")
## [1] "Number of unselected targets per label:"
print(sapply(unselected_targets, length))
## control PD-L1 DAC Combination
## 13 14 14 22We will plot the top 30 targets for the top 4 TFs. You can change the
number of TFs and targets to plot by changing the tf_names
and top_n arguments.
plot_tf_weights(
simic,
tf_names = simic@tf_ids[1:4],
top_n = 25,
allowed_targets = selected_targets, # Filter by R² if desired
grid = c(2, 2),
save = FALSE,
out_dir = plot_dir,
filename = "TF_weights_barplot.pdf"
)
## Plotting 4 TF weight barplot(s)...Now we will plot the regulators of each target. This plot shows only the TFs with non-zero weights.
plot_target_weights(
simic,
target_names = simic@target_ids[1:4],
labels = c("control", "Combination"),
grid = c(2, 2),
save = FALSE,
out_dir = plot_dir,
filename = "Target_weights_barplot.pdf"
)
## Plotting 4 target weight barplot(s)...Note: If you want to plot all TFs and Targets
leave the tf_names and target_names arguments respectively
blank. *Note:: You can also save the output and
acces individual plots
all_tfs_barplots <- plot_tf_weights(
simic,
top_n = 25,
grid = NULL,
allowed_targets = selected_targets)You can extract a TF regulon with the function
get_tf_network and plot it as a heatmap.
network <- get_tf_network(simic, "Tet2", r2_threshold = 0.7)
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'control';
## skipping R² filtering.
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'PD-L1';
## skipping R² filtering.
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'DAC';
## skipping R² filtering.
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'Combination';
## skipping R² filtering.
print(head(network))
## control PD-L1 DAC Combination
## Slit2 -1.365095 0 0 0
## Kif20b 0.000000 0 0 0
## Ect2 0.000000 0 0 0
## Lhfpl2 0.000000 0 0 0
## Fn1 0.000000 0 0 0
## Kif15 0.000000 0 0 0
plot_tf_network_heatmap(simic, "Tet2",
save = FALSE,
top_n = 15,
r2_threshold = 0.7,
show_values = TRUE,
cmap = c("purple","white","yellow"))
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'control';
## skipping R² filtering.
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'PD-L1';
## skipping R² filtering.
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'DAC';
## skipping R² filtering.
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'Combination';
## skipping R² filtering.
# Dissimilarity Analysis
To prioritize those TFs of interest, that might be driving the regulatory differences between our groups we perform a dissimilarity analysis by comparing per-cell TF activity-score distributions across phenotype labels using the minmax version of total variation distance for multiple distributions. This metric has values between 0 and 1, with values closer to 0 when the group of distributions is more similar to each other. Consequently, higher scores indicate greater regulatory dissimilarity between conditions — i.e., TFs whose activity differs most across phenotypes.
dis_score <- calculate_dissimilarity(simic)
## Calculating dissimilarity scores (all cells)...
## Top 10 TFs by MinMax dissimilarity:
top_tfs <- rownames(dis_score)
## MinMax_score
## Ets1 0.6646667
## Zhx2 0.5563333
## Prrx2 0.4606667
## Zfhx4 0.4380000
## Crebrf 0.3843333
## Gli3 0.3803333
## A630089N07Rik 0.3636667
## Pbx3 0.3490000
## Pms1 0.3286667
## Tet2 0.3246667We can visualize the top TFs ranked by dissimilarity score in a heatmap fashion.
Sometimes these transcriptional dynamics are cell-type or cell-cluster
specific. In this case, we may want to calculate the dissimilarity
scores across the labels within specific cell clusters. For that, we can
subset the
SimiCvizExperiment object by the desired labels
and then calculate the dissimilarity scores and plot the heatmap as
shown before.
metadata <- read.csv(system.file("extdata/metadata.csv",
package = "SimiCviz"))
# Build cell groups from metadata (e.g. Seurat clusters, cell types, etc.)
cell_groups <- lapply(unique(metadata$final_annotation_functional),
function(celltype) {
cell_labels$cell[metadata$final_annotation_functional == celltype]
})
names(cell_groups) <- unique(metadata$final_annotation_functional)
dissim_grouped <- calculate_dissimilarity(simic, labels = c(0,2),
cell_groups = cell_groups)
## Calculating dissimilarity for selected cell groups...
## Group: Basal-like (1025 cells)
## Group: Proliferating cells (475 cells)
## Top 10 TFs by mean dissimilarity across groups:
## Basal.like Proliferating.cells mean_score
## Ets1 0.9457499 0.9931507 0.9694503
## Zhx2 0.7915598 0.9513998 0.8714798
## Zfhx4 0.6992714 0.9398159 0.8195436
## Pbx3 0.5925271 0.4757654 0.5341463
## Prrx2 0.3703242 0.5868703 0.4785972
## Tet2 0.2503909 0.6692118 0.4598013
## Pms1 0.3205294 0.5616625 0.4410960
## A630089N07Rik 0.3232134 0.4706565 0.3969350
## Crebrf 0.2656747 0.4200165 0.3428456
## Gli3 0.2869234 0.2879145 0.2874190
# For all labels
plot_dissimilarity_heatmap(simic,
cell_groups = cell_groups,
top_n = 5,
cmap=c("magma"),
save = FALSE, out_dir = plot_dir,
filename = "dissimilarity_heatmap_grouped.pdf")We can also select the labels across which we want to compute the
dissimilarity score with the argument labels and customize
the color and sorting options.
# For labels 0,2
plot_dissimilarity_heatmap(simic,
labels = c(0,2),
cell_groups = cell_groups,
top_n = 5,
sort_by = "Proliferating.cells",
cmap=c("red", "white", "blue"),
save = FALSE)Visualize TF activity distributions across conditions. These plots allow you to visually assess the differences in TF activity distributions across conditions, and can help identify TFs with distinct regulatory patterns that may be driving phenotypic differences.
This function allows for customization of the plot aesthetics, including fill, transparency, bandwidth adjustment for density estimation, and rug plots to show individual data points. You can also choose to save the plots directly from the function.
# Plot distributions for top TFs
plot_auc_distributions(
simic,
tf_names = top_tfs[1:4],
fill = TRUE,
alpha = 0.6,
bw_adjust = 1/8,
rug = TRUE,
save = FALSE,
out_dir = plot_dir,
filename = "AUC_distributions.pdf",
grid = c(2, 2)
)# Plot top 4 TFs density distributions
plot_auc_distributions(simic,
labels = c(0,3),
tf_names = top_tfs[1:2],
fill = FALSE,
bw_adjust = 0.5,
rug = FALSE,
out_dir = plot_dir,
filename="AUC_distributions_notfilled_multipage.pdf",
save = FALSE,
grid = c(1,2))plot_auc_cumulative(
simic,
tf_names = top_tfs[1:4],
rug = TRUE,
grid = c(2, 2),
include_table = TRUE,
save = FALSE,
out_dir = plot_dir
)Compute area-under-curve metrics from ECDF comparisons:
ecdf_metrics <- calculate_ecdf_auc(simic, tf_names = simic@tf_ids[1:4])
head(ecdf_metrics)
## control_ecdf_auc control_auc50 control_x_at_p50
## Pms1 0.6416107 0.01371886 0.3253128
## Prrx2 0.4502471 0.05767541 0.5724188
## Ets1 0.3197519 0.01512182 0.6831469
## Tet2 0.5601535 0.02375632 0.4458104
## PD-L1_ecdf_auc PD-L1_auc50 PD-L1_x_at_p50
## Pms1 0.6440509 0.02219439 0.3203724
## Prrx2 0.5110546 0.06061229 0.4854002
## Ets1 0.5137333 0.02724930 0.4891511
## Tet2 0.5631466 0.01484295 0.4310090
## DAC_ecdf_auc DAC_auc50 DAC_x_at_p50
## Pms1 0.6760429 0.018881449 0.3143622
## Prrx2 0.5826345 0.046964519 0.3969131
## Ets1 0.4887809 0.019558184 0.5145263
## Tet2 0.5631231 0.009291574 0.4350603
## Combination_ecdf_auc Combination_auc50
## Pms1 0.6074249 0.02924377
## Prrx2 0.7298222 0.03988444
## Ets1 0.5687794 0.01663763
## Tet2 0.5801843 0.01289461
## Combination_x_at_p50 delta_ecdf_auc delta_auc50
## Pms1 0.3967417 0.06861800 0.01552491
## Prrx2 0.1866667 0.27957513 0.02072785
## Ets1 0.4234933 0.24902751 0.01212747
## Tet2 0.4020039 0.02003073 0.01446474
## delta_x_at_p50
## Pms1 0.08237952
## Prrx2 0.38575212
## Ets1 0.25965358
## Tet2 0.04380659sessionInfo()
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets
## [6] methods base
##
## other attached packages:
## [1] SimiCviz_0.99.0 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.4 sass_0.4.10
## [3] generics_0.1.4 tidyr_1.3.2
## [5] stringi_1.8.7 lattice_0.22-9
## [7] digest_0.6.39 magrittr_2.0.5
## [9] evaluate_1.0.5 grid_4.6.0
## [11] RColorBrewer_1.1-3 fastmap_1.2.0
## [13] plyr_1.8.9 rprojroot_2.1.1
## [15] jsonlite_2.0.0 Matrix_1.7-5
## [17] gridExtra_2.3 BiocManager_1.30.27
## [19] purrr_1.2.2 viridisLite_0.4.3
## [21] scales_1.4.0 codetools_0.2-20
## [23] jquerylib_0.1.4 cli_3.6.6
## [25] rlang_1.2.0 withr_3.0.2
## [27] cachem_1.1.0 yaml_2.3.12
## [29] otel_0.2.0 tools_4.6.0
## [31] parallel_4.6.0 reshape2_1.4.5
## [33] BiocParallel_1.47.0 dplyr_1.2.1
## [35] colorspace_2.1-2 ggplot2_4.0.3
## [37] here_1.0.2 reticulate_1.46.0
## [39] buildtools_1.0.0 vctrs_0.7.3
## [41] R6_2.6.1 png_0.1-9
## [43] lifecycle_1.0.5 stringr_1.6.0
## [45] pkgconfig_2.0.3 pillar_1.11.1
## [47] bslib_0.11.0 gtable_0.3.6
## [49] glue_1.8.1 Rcpp_1.1.1-1.1
## [51] xfun_0.58 tibble_3.3.1
## [53] tidyselect_1.2.1 sys_3.4.3
## [55] knitr_1.51 farver_2.1.2
## [57] snow_0.4-4 htmltools_0.5.9
## [59] labeling_0.4.3 rmarkdown_2.31
## [61] maketools_1.3.2 compiler_4.6.0
## [63] S7_0.2.2