SimiCviz: Visualization and Analysis of Gene Regulatory Networks

Overview

SimiCviz provides various utilities for gene regulatory network (GRN) analysis for single-cell RNA-seq data. Originally developed to visualize the phenotype-specific GRN output from SimiCPipeline for R users, it also includes tools for:

  • Load and organize GRN outputs from multiple sources (SimiCPipeline, SCENIC, Pando, etc.)
  • Calculate cell activity scores (AS) using expression data and GRN weights.
  • Visualize regulatory networks, distribution of TF/regulon activities, and dissimilarity across conditions.

The package is interoperable and agnostic to the GRN inference methods, accepting different GRN outputs.

Loading GRN Data

Data Format Requirements

SimiCviz easily handles SimiCPipeline outputs but also accepts GRN weights in a flexible long format. The input data.frame must include the following columns:

  • tf: Transcription factor name
  • target: Target gene name
  • weight: Edge weight/coefficient
  • label (optional): Phenotype/condition identifier for label-specific networks

Optional quality metrics for filtering weights:

  • pvalue / adj_p_val: Statistical significance (for methods like SCENIC, Pando)
  • r_squared / adj_r_squared: Goodness of fit (for methods like SimiC)

SimiCPipeline Outputs

SimiCPipeline is a regularized regression method for inferring condition-specific GRNs. It generates:

  • .pickle files with TF-target weights per phenotype/label and adjusted R² values for goodness of fit of the target expression model.
  • .csv file with a TF activity score matrix (cells × TFs)
  • .pickle files with per-label matrices (cells × TFs) [Legacy SimiC v1 - Deprecated]

These outputs are organized in a directory structure like this:

Project/
├── inputFiles/
│   ├── TF_list.csv
│   ├── expression_matrix.pickle
│   └── phenotype_annotation.txt
└── outputSimic/
    ├── figures/
    └── matrices/
        └── example1/
            ├── example1_L1_0.1_L2_0.01_simic_matrices.pickle
            ├── example1_L1_0.1_L2_0.01_simic_matrices_filtered_BIC.pickle
            ├── example1_L1_0.1_L2_0.01_wAUC_matrices_filtered_BIC.pickle
            └── example1_L1_0.1_L2_0.01_wAUC_matrices_filtered_BIC_collected.csv

Here we show how to easily load a SimiCPipeline run. You only need the directory, experiment name, and hyperparameters since the directory output is standardized. SimiCviz will automatically find the files and generate the SimiCvizExperiment.

library(SimiCviz)

# Load entire SimiCPipeline run automatically
simic_full <- load_SimiCPipeline(
  project_dir = "path/to/simic_run",
  run_name = "example1",
  lambda1 = "0.01",
  lambda2 = "0.001"
)

# Set display names and colors for visualization (Part 3)
simic_full <- setLabelNames(
  simic_full, 
  label_names = c("control", "PD-L1", "DAC", "Combination"),
  colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0")
)
simic_full
## An object of class SimiCvizExperiment
##  4 label(s), 10 TF(s), 150 target(s)
##  Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
##  AUC: collected (3000 cells x 10 TFs)
##  Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
##  Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combo
##  Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
##  TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
##  Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
##  Meta keys: adjusted_r_squared, run_name, lambda1, lambda2

If you run the full SimiCPipeline tutorial you can skip Part 2 and go straight to Part 3 for visualizations.

Manual Loading

If you need more flexibility in the workflow, or have precomputed outputs using SimiC v1, you can construct SimiCvizExperiment or AUCProcessor objects by loading the required files separately.

Weights from pickle


# Load weights from pickle
weights_file <- system.file("extdata", 
  file.path("outputSimic/example1_simic_weights.pickle"), 
  package = "SimiCviz")

simic_weights <- read_weights_pickle(weights_file)
## Downloading uv...Done!
simic_weights[[1]][, 1:6]
##                    Slit2   Kif20b     Ect2 Lhfpl2
## Pms1           0.0000000 0.000000 0.000000      0
## Prrx2          1.2348210 0.000000 0.000000      0
## Ets1           0.0000000 0.000000 0.000000      0
## Tet2          -1.3650950 0.000000 0.000000      0
## Crebrf         0.0000000 0.000000 0.000000      0
## Gli3           0.0000000 0.000000 0.000000      0
## Zhx2           0.0000000 0.000000 0.000000      0
## A630089N07Rik  0.0000000 1.360139 1.231132      0
## Zfhx4         -0.9495542 0.000000 0.000000      0
## Pbx3           0.0000000 2.344418 2.438383      0
##                      Fn1    Kif15
## Pms1          -0.8740381 0.000000
## Prrx2          0.0000000 0.000000
## Ets1           0.0000000 0.000000
## Tet2           0.0000000 0.000000
## Crebrf         0.0000000 0.000000
## Gli3          -0.9873656 0.000000
## Zhx2           0.0000000 0.000000
## A630089N07Rik -0.8189908 1.875401
## Zfhx4          0.0000000 0.000000
## Pbx3          -2.3883315 2.776151

Weights from other methods

Most single-cell GRN inference methods produce table outputs with TF-target weights/estimates and p-values.

Example: Generic CSV Format

# Load GRN weights (method agnostic)

weight_path <- system.file("extdata", "example_weights.csv", 
                           package = "SimiCviz")

# Read as data.frame
weights_df <- read_weights_csv(weight_path)
head(weights_df)
##       tf target    weight label
## 1   Pms1  Slit2  0.000000     0
## 2  Prrx2  Slit2  1.234821     0
## 3   Ets1  Slit2  0.000000     0
## 4   Tet2  Slit2 -1.365095     0
## 5 Crebrf  Slit2  0.000000     0
## 6   Gli3  Slit2  0.000000     0

# If your method uses different column names you need to rename them

Expected columns in CSV:

# Minimal
tf, target, weight
# If only one GRN a column name "label" will be all values in label should be 0

# With quality metrics
tf, target, weight, pvalue, adj_p_val

# With phenotype labels
tf, target, weight, label, pvalue, adj_p_val

Cell Labels / Phenotype Annotations

The cell_labels map each cell ID in your expression matrix or activity scores matrix to a condition/phenotype. This is required for activity score calculation and AUC visualizations.

# Load from CSV (recommended format: columns 'cell', 'label')
cell_labels_path <- system.file("extdata",
  file.path("inputFiles", "treatment_annotation.csv"), 
  package = "SimiCviz")

cell_labels <- load_cell_labels(cell_labels_path, header = TRUE, sep = ",")
## Cell labels file contains extra columns
## Cell / category / label
head(cell_labels)
##           cell category label
## 1 05_87_62__s5  control     0
## 2 02_84_27__s3  control     0
## 3 02_95_13__s5  control     0
## 4 06_61_12__s1  control     0
## 5 03_14_11__s5  control     0
## 6 05_46_81__s4  control     0

The function load_cell_labels can take alternative formats/files:

# From vector (will generate cell_1, cell_2, ... names)
cell_labels <- c(0, 0, 1, 1, 2, 2)  # Must match order of AUC rows

# From named vector
cell_labels <- c(cell_A = 0, cell_B = 0, cell_C = 1)

# From data.frame
cell_labels <- data.frame(cell = c("cell_A", "cell_B"), label = c(0, 1))

Expression Matrix

Only required for computing activity scores. Can be:

  • CSV file
  • Pickle file: numpy array or pandas DataFrame
  • H5AD file: AnnData object
  • RDS file: Seurat or SingleCellExperiment object
  • R object: matrix or data.frame

Important: Final expression matrix must be in genes × cells format (genes as rows, cells as columns).

# Load from multiple formats
expression_mat_path <- system.file("extdata",
  file.path("inputFiles", "example1_expression.pickle"),
  package = "SimiCviz")

# Will auto-detect format and load
expression_mat <- load_expression_matrix(expression_mat_path)
## Converting expression matrix to sparse format...
print(class(expression_mat))
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"
print(dim(expression_mat))
## [1] 3000  160

Create a SimiCvizExperiment Object

The SimiCvizExperiment container organizes weights, activity scores, and metadata.

# From .pickle SimiC files

# Extract Adjusted R squared from SimiC outputs
out <- read_pickle(weights_file)
adjusted_r_squared <- out$adjusted_r_squared

viz_obj_simic <- SimiCvizExperiment(
  weights = simic_weights,
  auc = NULL,  # Will compute this in the next section later but can be loaded as well
  cell_labels = cell_labels,
  label_names = c("control","PD-L1","DAC","Combination"),
  colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0"),
  meta=list(adjusted_r_squared=adjusted_r_squared))
## `weights` provided as a list.
## Cell labels file contains extra columns
## cell / category / label

viz_obj_simic
## An object of class SimiCvizExperiment
##  4 label(s), 10 TF(s), 150 target(s)
##  Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
##  AUC collected: none
##  Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
##  Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combination
##  Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
##  TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
##  Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
##  Meta keys: adjusted_r_squared

Computing Activity Scores (AUC)

In many biological contexts, we aim to quantify changes in GRN activity across cell populations. This is achieved by computing an activity score for each TF in each cell, as described in the original SimiC article by Peng et al published in Communications Biology (2022). Here, we provide wrapper functions to apply this approach to outputs from SimiCPipeline or other GRN inference methods.

Quick Start

Compute activity scores from TF-Target weights and expression matrix using SimiCvizExperiment

# From SimiC input files
# adj_r2_threshold: For SimiCPipeline style outputs it will look in metadata of viz_obj_simic, for "simic@meta$adjusted_r_squared" and filter out targets with lower R²)
# n_cores: Number of workers if backend = 'multissession' or 
#          Number of cores if backend  = 'multicore'

viz_obj_simic <- calculate_activity_scores(
              viz_obj_simic,
              expression = expression_mat_path,
              adj_r2_threshold = 0.7, # For SimiC style outputs 
              sort_by="expression", # Rank targets by expression or weight
              select_top_k = NULL,  # Use all targets (or limit to top K)
              percent_of_target = 1.0,  
              n_cores = 2,
              backend = "multicore",
              verbose = TRUE
            )
## Initializing AUCProcessor...
## `weights` provided as a list, converting to list data.frame
## Filtering weights by adjusted R2 threshold...
## Converting expression matrix to sparse format...
## Cell labels file contains extra columns
## cell / category / label
## Transposing expression matrix to 
##                 genes x cells format based on rownames matching cell IDs
## Subsetting expression matrix to genes in GRNs. 
##             Number of tfs: 10 and Number of targets: 148
## Computing activity scores...
## Starting AUC computation...
##   Sorting by: expression
##   Targets: 100% of available
##   Labels: 4 | Cells: 3000
##   Backend requested: multicore | backend used: multicore | 
##                          workers: 2
##   |                                                          |                                                  |   0%  |                                                          |======                                            |  12%  |                                                          |============                                      |  25%  |                                                          |===================                               |  38%  |                                                          |=========================                         |  50%  |                                                          |===============================                   |  62%  |                                                          |======================================            |  75%  |                                                          |============================================      |  88%  |                                                          |==================================================| 100%
## AUC computation completed in 5.74 seconds!
##   Result: 3000 cells x 10 TFs
## Activity scores computed and added to simic object


# Access computed scores
auc_scores <- viz_obj_simic@auc$collected
head(auc_scores[, 1:5])
##                   Pms1     Prrx2      Ets1      Tet2
## 05_87_62__s5 0.3017963 0.6271518 0.7016742 0.4475671
## 02_84_27__s3 0.4113587 0.4752862 0.7356361 0.3714327
## 02_95_13__s5 0.4048365 0.4044021 0.8277072 0.4149942
## 06_61_12__s1 0.3219022 0.6893614 0.6386152 0.4432955
## 03_14_11__s5 0.3547483 0.3517207 0.6917478 0.5022100
## 05_46_81__s4 0.2805570 0.7038231 0.6858806 0.4250196
##                 Crebrf
## 05_87_62__s5 0.6789517
## 02_84_27__s3 0.5719899
## 02_95_13__s5 0.7185131
## 06_61_12__s1 0.5646369
## 03_14_11__s5 0.7801271
## 05_46_81__s4 0.6450653

Advanced: AUCProcessor Workflow

For more control over the computation pipeline, you can use AUCProcessor class directly.

If your method outputs a quality metric for the inferred GRN, you can filter out unreliable TF-target relationships with the arguments:

  • qc_type: “adj_r2” or name of the column in weights to use (pval, padj, etc.). If ‘qc_type’ == “adj_r2” then cells above qc_threshold will be kept (e.g. R² > 0.7). If ‘qc_type’ != “adj_r2” (we assume is a pvalue) then cells below qc_threshold will be kept (e.g. adj_p_val < 0.05).
  • qc_threshold: desired threshold as described earlier.

Tip: If your method output is different you should modify your input column names accordingly.

# Initialize processor with weights and expression
AS_processor <- AUCProcessor(
  weights = weights_df,
  expression = expression_mat_path,  # or matrix/data.frame
  cell_labels = cell_labels,
  qc_type = "adj_p_val",
  qc_threshold = 0.05,  # Filter targets above this threshold
  n_cores = 2,
  backend = "multisession"
)
## `weights` provided as a data.frame
## No adj_p_val column found in weights data.frame; 
##                           skipping filtering.
## Converting expression matrix to sparse format...
## Cell labels file contains extra columns
## cell / category / label
## Transposing expression matrix to 
##                 genes x cells format based on rownames matching cell IDs

# Compute with custom parameters
AS_processor <- compute_auc(
  AS_processor,
  sort_by = "expression",
  select_top_k = NULL,            # Use all targets (or limit to top K)
  percent_of_target = 1.0,        # Use all targets (or subset %)
  verbose = TRUE
)
## Starting AUC computation...
##   Sorting by: expression
##   Targets: 100% of available
##   Labels: 4 | Cells: 3000
##   Backend requested: multisession | backend used: multisession | 
##                          workers: 2
##   Note: multisession (SOCK) has worker 
##                    startup/serialization overhead;
##         best for larger datasets or Windows compatibility.
##   |                                                          |                                                  |   0%
## Loading required namespace: Matrix
##   |                                                          |============                                      |  25%
## Loading required namespace: Matrix
##   |                                                          |=========================                         |  50%
## Warning in value[[3L]](cond): Parallel processing failed: BiocParallel errors
##   2 remote errors, element index: 1, 2
##   2 unevaluated and other errors
##   first remote error:
## Error in .cell_to_label[[cell_id]]: attempt to select less than one element in get1index
## . 
##           Falling back to sequential processing...
## AUC computation completed in 13.52 seconds!
##   Result: 3000 cells x 10 TFs
##   Tip: if runtime is dominated by setup overhead, 
##                    try fewer workers (e.g., 2-4)
##        or use backend='multicore' on Linux/macOS.
# Extract results in wide format (default)
auc_wide <- get_auc(AS_processor, format = "wide")
head(auc_wide)
##                   Pms1     Prrx2      Ets1      Tet2
## 05_87_62__s5 0.3060804 0.6262551 0.7056634 0.4593377
## 02_84_27__s3 0.4154967 0.4810999 0.7371486 0.3861232
## 02_95_13__s5 0.4073043 0.4084424 0.8238244 0.4478523
## 06_61_12__s1 0.3231438 0.6866506 0.6451786 0.4501554
## 03_14_11__s5 0.3538671 0.3487276 0.6953040 0.5092492
## 05_46_81__s4 0.2835594 0.7021071 0.6916303 0.4362708
##                 Crebrf      Gli3      Zhx2 A630089N07Rik
## 05_87_62__s5 0.7032736 0.4181954 0.5064481     0.2561313
## 02_84_27__s3 0.6067590 0.4475846 0.4105804     0.3368732
## 02_95_13__s5 0.7186970 0.4112379 0.2822939     0.3475682
## 06_61_12__s1 0.5973786 0.5095346 0.5390834     0.2747720
## 03_14_11__s5 0.7901458 0.3866339 0.4254945     0.2763368
## 05_46_81__s4 0.6747605 0.4176727 0.5895686     0.2582468
##                  Zfhx4      Pbx3
## 05_87_62__s5 0.5014684 0.2797076
## 02_84_27__s3 0.4881281 0.3368844
## 02_95_13__s5 0.4332679 0.3890446
## 06_61_12__s1 0.4961954 0.3004046
## 03_14_11__s5 0.5160187 0.2991336
## 05_46_81__s4 0.5088153 0.2843953
# Extract results long format
auc_long <- get_auc(AS_processor, format = "long") # Long format
head(auc_long)
##           cell   tf     score label
## 1 05_87_62__s5 Pms1 0.3060804     0
## 2 02_84_27__s3 Pms1 0.4154967     0
## 3 02_95_13__s5 Pms1 0.4073043     0
## 4 06_61_12__s1 Pms1 0.3231438     0
## 5 03_14_11__s5 Pms1 0.3538671     0
## 6 05_46_81__s4 Pms1 0.2835594     0

Alternatively you can filter in advance your weights and use SimiCvizExperiment and the wrapper function calculate_activity_scores

weights_df <- read.csv("path/to/your/weights.csv")
weights_df_filtered <- weights_df[weights_df$p_value < 0.01,]
# From CSV files
viz_obj <- SimiCvizExperiment(
  weights = weights_df_filtered,
  auc = NULL,  # Will compute this inthe next section later but can be loaded as well
  cell_labels = cell_labels,
  label_names = c("control","PD-L1","DAC","Combination"),
  colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0"),
  meta=list() # Anything you want to store in a list format
  )

viz_obj
viz_obj <- calculate_activity_scores(
              viz_obj,
              expression = expression_mat_path,
              adj_r2_threshold = 0.7, # For SimiC style outputs 
              sort_by="expression", # Rank targets by expression or weight
              select_top_k = NULL,  # Use all targets (or limit to top K)
              percent_of_target = 1.0,  
              n_cores = 2,
              backend = "multisession",
              verbose = TRUE
            )

Filtering Options

Different GRN methods provide different quality metrics. Adjust filtering based on your method:

# SimiC: Filter by adjusted R² (goodness of fit)

processor_simic <- AUCProcessor(
  weights = simic_weights,
  expression = expr_mat,
  cell_labels = cell_labels,
  adj_r2_list = adjusted_r_squared, # a list length as simic_weights
  qc_type = "adj_r2",
  qc_threshold = 0.7  # Keep targets with R² ≥ 0.7
)

# SCENIC / Pando: Filter by adjusted p-value
processor_scenic <- AUCProcessor(
  weights = weights_df,
  expression = expr_mat,
  cell_labels = cell_labels,
  qc_type = "p_value",
  qc_threshold = 0.05,  # Keep targets with adj_p_val ≤ 0.05
  n_cores = 4,
  backend = "multisession"
  )

# Compute with the same data, different parameters
processor_scenic <- compute_auc(processor_scenic, sort_by = "weight")

Once you have computed the activity scores, you are ready to visualize the results!

Network Visualization

For visualization we need a SimiCvizExperiment object loaded with the results.

If you used load_SimiCPipeline or computed the activity scores with calculate_activity_scores you already have it.

# Recall above examples
 simic_full # Complete SimiCpipeline output
## An object of class SimiCvizExperiment
##  4 label(s), 10 TF(s), 150 target(s)
##  Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
##  AUC: collected (3000 cells x 10 TFs)
##  Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
##  Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combo
##  Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
##  TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
##  Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
##  Meta keys: adjusted_r_squared, run_name, lambda1, lambda2
 viz_obj_simic # SimiCPipeline weights -> `calculate_activity_scores`
## An object of class SimiCvizExperiment
##  4 label(s), 10 TF(s), 150 target(s)
##  Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
##  AUC: collected (3000 cells x 10 TFs)
##  Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
##  Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combination
##  Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
##  TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
##  Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...
##  Meta keys: adjusted_r_squared

Otherwise you can load your results in SimiCvizExperiment as exemplified above including the activity scores results in the auc slot.

Note: Although activity score matrix is not strictly necessary, you need it to fully explore all the visualization tools of SimiCviz.

If you used other methods to calculate cell-specific activity scores with a different algorithm, you can import those as well in:

  • wide format: cells x TF scores matrix
  • long format: data.frame with cell, tf, score
  • .pickle file (only for SimiC v1)
# Create SimiCvizExperiment
simic <- SimiCvizExperiment(weights = simic_weights,
                             auc = auc_wide,
                             cell_labels = cell_labels,
                             label_names = c("control","PD-L1","DAC","Combination"),
                             colors = c("#e0e0e0", "#a8c8ff", "#ffb6b6", "#c1a9e0"))
## `weights` provided as a list.
## Cell labels file contains extra columns
## cell / category / label
## AUC in wide format (cells x TFs).
simic
## An object of class SimiCvizExperiment
##  4 label(s), 10 TF(s), 150 target(s)
##  Weights: 4 matrices [0: 10 x 150, 1: 10 x 150, 2: 10 x 150, 3: 10 x 150]
##  AUC: collected (3000 cells x 10 TFs)
##  Cell labels: 3000 cells across 4 label(s) [0, 1, 2, 3]
##  Label names: 0 = control, 1 = PD-L1, 2 = DAC, 3 = Combination
##  Colors: 0 = #e0e0e0, 1 = #a8c8ff, 2 = #ffb6b6, 3 = #c1a9e0
##  TFs: Pms1, Prrx2, Ets1, Tet2, Crebrf, Gli3, ...
##  Targets: Slit2, Kif20b, Ect2, Lhfpl2, Fn1, Kif15, ...

After you created a SimiCvizExperiment object, we can use the plotting functions to visualize the results.

If you want to save your plots in pdf you can define figure output directory out_dir = plot_dir, change the filename and set the argument save = TRUE

plot_dir <- file.path(getwd(),"SimiCviz_output")
dir.create(plot_dir,recursive = TRUE)

Quality Assessment (SimiC only)

To assess the goodness of SimiC’s fitted model of target expression we can plot the distribution of adjusted R² values across all targets.

# Plot distribution of adjusted R² values across targets
# Extract Adjusted R squared from SimiC outputs
out <- read_pickle(weights_file)
adjusted_r_squared <- out$adjusted_r_squared
plot_r2_distribution(adjusted_r_squared, simic, grid = c(2, 2), 
                     save = FALSE, out_dir = plot_dir)

Here we can see that most of the R² values are over 0.7, indicating that the model explains a significant portion of the variance in the target gene expression.

Lower R² might indicate:

  • Noisy data
  • Missing regulatory interactions
  • Complex regulation not captured by linear model

For these targets with low adjusted R2, we suggest to filter them out before plotting. We select a threshold of 0.7 as it was the same used to calculate the activity scores in SimiCPipeline and above. You can change this threshold as needed, but you should re-run the activity score calculation (or SimiCPipeline) with the same threshold to be consistent.

If you already filtered your weight matrix before creating the SimiCviz object you can skip this part

# Select targets by adjusted R2
unselected_targets <- list()
selected_targets <- list()
lab_keys <- names(simic@label_names)
for (lab in lab_keys){
    # Save selected for plotting
    selected_targets[[lab]] <- simic@target_ids[which(adjusted_r_squared[[lab]] >= 0.7)]
    # Save unselected for reporting
    label <- simic@label_names[[lab]]
    unselected_targets[[label]] <- simic@target_ids[which(adjusted_r_squared[[lab]] < 0.7)]
}
print("Number of unselected targets per label:")
## [1] "Number of unselected targets per label:"
print(sapply(unselected_targets, length)) 
##     control       PD-L1         DAC Combination 
##          13          14          14          22

Weights Visualization

TF Barplots:

We will plot the top 30 targets for the top 4 TFs. You can change the number of TFs and targets to plot by changing the tf_names and top_n arguments.

plot_tf_weights(
  simic,
  tf_names = simic@tf_ids[1:4],
  top_n = 25,
  allowed_targets = selected_targets,  # Filter by R² if desired
  grid = c(2, 2),
  save = FALSE,
  out_dir = plot_dir,
  filename = "TF_weights_barplot.pdf"
)
## Plotting 4 TF weight barplot(s)...

Target Barplots:

Now we will plot the regulators of each target. This plot shows only the TFs with non-zero weights.

plot_target_weights(
  simic,
  target_names = simic@target_ids[1:4],
  labels = c("control", "Combination"),
  grid = c(2, 2),
  save = FALSE,
  out_dir = plot_dir,
  filename = "Target_weights_barplot.pdf"
)
## Plotting 4 target weight barplot(s)...

Note: If you want to plot all TFs and Targets leave the tf_names and target_names arguments respectively blank. *Note:: You can also save the output and acces individual plots

all_tfs_barplots <- plot_tf_weights(
                          simic,
                          top_n = 25, 
                          grid = NULL,
                          allowed_targets = selected_targets)
all_tfs_barplots[[1]]

Regulatory Network Heatmap

You can extract a TF regulon with the function get_tf_network and plot it as a heatmap.

network <- get_tf_network(simic, "Tet2", r2_threshold = 0.7)
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'control'; 
##                          skipping R² filtering.
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'PD-L1'; 
##                          skipping R² filtering.
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'DAC'; 
##                          skipping R² filtering.
## Warning in get_tf_network(simic, "Tet2", r2_threshold = 0.7): No adjusted R² values found for label 'Combination'; 
##                          skipping R² filtering.
print(head(network))
##          control PD-L1 DAC Combination
## Slit2  -1.365095     0   0           0
## Kif20b  0.000000     0   0           0
## Ect2    0.000000     0   0           0
## Lhfpl2  0.000000     0   0           0
## Fn1     0.000000     0   0           0
## Kif15   0.000000     0   0           0

plot_tf_network_heatmap(simic, "Tet2", 
                        save = FALSE, 
                        top_n = 15,
                        r2_threshold = 0.7,
                        show_values = TRUE, 
                        cmap = c("purple","white","yellow"))
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'control'; 
##                          skipping R² filtering.
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'PD-L1'; 
##                          skipping R² filtering.
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'DAC'; 
##                          skipping R² filtering.
## Warning in get_tf_network(x, tf_name = tf_name, labels = labels, r2_threshold = r2_threshold): No adjusted R² values found for label 'Combination'; 
##                          skipping R² filtering.

# Dissimilarity Analysis

To prioritize those TFs of interest, that might be driving the regulatory differences between our groups we perform a dissimilarity analysis by comparing per-cell TF activity-score distributions across phenotype labels using the minmax version of total variation distance for multiple distributions. This metric has values between 0 and 1, with values closer to 0 when the group of distributions is more similar to each other. Consequently, higher scores indicate greater regulatory dissimilarity between conditions — i.e., TFs whose activity differs most across phenotypes.

Global dissimilarity scores

dis_score <- calculate_dissimilarity(simic)
## Calculating dissimilarity scores (all cells)...
## Top 10 TFs by MinMax dissimilarity:
top_tfs <- rownames(dis_score)
##               MinMax_score
## Ets1             0.6646667
## Zhx2             0.5563333
## Prrx2            0.4606667
## Zfhx4            0.4380000
## Crebrf           0.3843333
## Gli3             0.3803333
## A630089N07Rik    0.3636667
## Pbx3             0.3490000
## Pms1             0.3286667
## Tet2             0.3246667

Dissimilarity heatmap

We can visualize the top TFs ranked by dissimilarity score in a heatmap fashion.

plot_dissimilarity_heatmap(simic, 
                           top_n = 5, 
                           cmap = "viridis",
                           save = FALSE)

Sometimes these transcriptional dynamics are cell-type or cell-cluster specific. In this case, we may want to calculate the dissimilarity scores across the labels within specific cell clusters. For that, we can subset the SimiCvizExperiment object by the desired labels and then calculate the dissimilarity scores and plot the heatmap as shown before.

metadata <- read.csv(system.file("extdata/metadata.csv", 
  package = "SimiCviz"))
 
# Build cell groups from metadata (e.g. Seurat clusters, cell types, etc.)
cell_groups  <- lapply(unique(metadata$final_annotation_functional), 
                       function(celltype) {
  cell_labels$cell[metadata$final_annotation_functional == celltype]
})
names(cell_groups) <- unique(metadata$final_annotation_functional)

dissim_grouped <- calculate_dissimilarity(simic, labels = c(0,2),
                                          cell_groups = cell_groups)
## Calculating dissimilarity for selected cell groups...
##   Group: Basal-like (1025 cells)
##   Group: Proliferating cells (475 cells)
## Top 10 TFs by mean dissimilarity across groups:
##               Basal.like Proliferating.cells mean_score
## Ets1           0.9457499           0.9931507  0.9694503
## Zhx2           0.7915598           0.9513998  0.8714798
## Zfhx4          0.6992714           0.9398159  0.8195436
## Pbx3           0.5925271           0.4757654  0.5341463
## Prrx2          0.3703242           0.5868703  0.4785972
## Tet2           0.2503909           0.6692118  0.4598013
## Pms1           0.3205294           0.5616625  0.4410960
## A630089N07Rik  0.3232134           0.4706565  0.3969350
## Crebrf         0.2656747           0.4200165  0.3428456
## Gli3           0.2869234           0.2879145  0.2874190

# For all labels
plot_dissimilarity_heatmap(simic,
                            cell_groups = cell_groups, 
                            top_n = 5,
                            cmap=c("magma"),
                            save = FALSE, out_dir = plot_dir,
                            filename = "dissimilarity_heatmap_grouped.pdf")

We can also select the labels across which we want to compute the dissimilarity score with the argument labels and customize the color and sorting options.

# For labels 0,2
plot_dissimilarity_heatmap(simic,
                            labels = c(0,2),
                            cell_groups = cell_groups, 
                            top_n = 5, 
                            sort_by = "Proliferating.cells",
                            cmap=c("red", "white", "blue"),
                            save = FALSE)

Activity Score Distributions

Visualize TF activity distributions across conditions. These plots allow you to visually assess the differences in TF activity distributions across conditions, and can help identify TFs with distinct regulatory patterns that may be driving phenotypic differences.

Density plots

This function allows for customization of the plot aesthetics, including fill, transparency, bandwidth adjustment for density estimation, and rug plots to show individual data points. You can also choose to save the plots directly from the function.

# Plot distributions for top TFs
plot_auc_distributions(
  simic,
  tf_names = top_tfs[1:4],
  fill = TRUE,
  alpha = 0.6,
  bw_adjust = 1/8,
  rug = TRUE,
  save = FALSE,
  out_dir = plot_dir,
  filename = "AUC_distributions.pdf",
  grid = c(2, 2)
)

# Plot top 4 TFs density distributions
plot_auc_distributions(simic,
                       labels = c(0,3),
                       tf_names = top_tfs[1:2],
                       fill = FALSE,
                       bw_adjust = 0.5,
                       rug = FALSE,
                       out_dir = plot_dir,
                       filename="AUC_distributions_notfilled_multipage.pdf",
                       save = FALSE,
                       grid = c(1,2))

Cumulative distributions (ECDF)

plot_auc_cumulative(
  simic,
  tf_names = top_tfs[1:4],
  rug = TRUE,
  grid = c(2, 2),
  include_table = TRUE,
  save = FALSE,
  out_dir = plot_dir
)

ECDF-based Metrics

Compute area-under-curve metrics from ECDF comparisons:

ecdf_metrics <- calculate_ecdf_auc(simic, tf_names = simic@tf_ids[1:4])
head(ecdf_metrics)
##       control_ecdf_auc control_auc50 control_x_at_p50
## Pms1         0.6416107    0.01371886        0.3253128
## Prrx2        0.4502471    0.05767541        0.5724188
## Ets1         0.3197519    0.01512182        0.6831469
## Tet2         0.5601535    0.02375632        0.4458104
##       PD-L1_ecdf_auc PD-L1_auc50 PD-L1_x_at_p50
## Pms1       0.6440509  0.02219439      0.3203724
## Prrx2      0.5110546  0.06061229      0.4854002
## Ets1       0.5137333  0.02724930      0.4891511
## Tet2       0.5631466  0.01484295      0.4310090
##       DAC_ecdf_auc   DAC_auc50 DAC_x_at_p50
## Pms1     0.6760429 0.018881449    0.3143622
## Prrx2    0.5826345 0.046964519    0.3969131
## Ets1     0.4887809 0.019558184    0.5145263
## Tet2     0.5631231 0.009291574    0.4350603
##       Combination_ecdf_auc Combination_auc50
## Pms1             0.6074249        0.02924377
## Prrx2            0.7298222        0.03988444
## Ets1             0.5687794        0.01663763
## Tet2             0.5801843        0.01289461
##       Combination_x_at_p50 delta_ecdf_auc delta_auc50
## Pms1             0.3967417     0.06861800  0.01552491
## Prrx2            0.1866667     0.27957513  0.02072785
## Ets1             0.4234933     0.24902751  0.01212747
## Tet2             0.4020039     0.02003073  0.01446474
##       delta_x_at_p50
## Pms1      0.08237952
## Prrx2     0.38575212
## Ets1      0.25965358
## Tet2      0.04380659

Summary Statistics

Mean activity per TF × phenotype

plot_auc_heatmap(simic, top_n = 20)

Box plots and violin plots

summary_plot <- plot_auc_summary_statistics(simic)

Session Info

sessionInfo()
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets 
## [6] methods   base     
## 
## other attached packages:
## [1] SimiCviz_0.99.0  BiocStyle_2.41.0
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.4      sass_0.4.10        
##  [3] generics_0.1.4      tidyr_1.3.2        
##  [5] stringi_1.8.7       lattice_0.22-9     
##  [7] digest_0.6.39       magrittr_2.0.5     
##  [9] evaluate_1.0.5      grid_4.6.0         
## [11] RColorBrewer_1.1-3  fastmap_1.2.0      
## [13] plyr_1.8.9          rprojroot_2.1.1    
## [15] jsonlite_2.0.0      Matrix_1.7-5       
## [17] gridExtra_2.3       BiocManager_1.30.27
## [19] purrr_1.2.2         viridisLite_0.4.3  
## [21] scales_1.4.0        codetools_0.2-20   
## [23] jquerylib_0.1.4     cli_3.6.6          
## [25] rlang_1.2.0         withr_3.0.2        
## [27] cachem_1.1.0        yaml_2.3.12        
## [29] otel_0.2.0          tools_4.6.0        
## [31] parallel_4.6.0      reshape2_1.4.5     
## [33] BiocParallel_1.47.0 dplyr_1.2.1        
## [35] colorspace_2.1-2    ggplot2_4.0.3      
## [37] here_1.0.2          reticulate_1.46.0  
## [39] buildtools_1.0.0    vctrs_0.7.3        
## [41] R6_2.6.1            png_0.1-9          
## [43] lifecycle_1.0.5     stringr_1.6.0      
## [45] pkgconfig_2.0.3     pillar_1.11.1      
## [47] bslib_0.11.0        gtable_0.3.6       
## [49] glue_1.8.1          Rcpp_1.1.1-1.1     
## [51] xfun_0.58           tibble_3.3.1       
## [53] tidyselect_1.2.1    sys_3.4.3          
## [55] knitr_1.51          farver_2.1.2       
## [57] snow_0.4-4          htmltools_0.5.9    
## [59] labeling_0.4.3      rmarkdown_2.31     
## [61] maketools_1.3.2     compiler_4.6.0     
## [63] S7_0.2.2