The sciNOME package provides an ultra-fast, lightweight,
and robust pipeline for processing Single-Cell RNA-seq data. This
vignette demonstrates the complete standard workflow, from creating the
RNA object to trajectory inference, using a lightweight
simulated dataset.
To ensure this vignette compiles rapidly without requiring external downloads, we simulate a small single-cell expression matrix (50 genes x 50 cells) and matching metadata.
We artificially increase the expression of specific genes in the first 25 cells to simulate distinct cell populations.
set.seed(42)
# Simulate a 50x50 count matrix
mock_counts <- matrix(rpois(2500, lambda = 5), nrow = 50, ncol = 50)
# Inject artificial differential expression for distinct clusters
mock_counts[1:10, 1:25] <- mock_counts[1:10, 1:25] + 15
rownames(mock_counts) <- c("MT-ND1", "MT-ND2", paste0("Gene_", 3:50))
colnames(mock_counts) <- paste0("Cell_", 1:50)
# Simulate basic metadata
mock_meta <- data.frame(
Cell_ID = paste0("Cell_", 1:50),
Batch = rep(c("Batch_1", "Batch_2"), times = 25)
)We build the RNA object and perform quality control (QC)
and normalization in just two steps. We set the filtering thresholds to
0 here to accommodate the tiny mock dataset.
# Build the object
rna_obj <- Build_RNAObject(
expr_mat = mock_counts,
meta_data = mock_meta,
meta_id_col = "Cell_ID",
min_cells = 0,
min_features = 0,
mt_pattern = "^MT-"
)
# Process QC and LogNormalization (Skip scaling for speed in this example)
rna_obj <- ProcessQC_RNA(
obj = rna_obj,
mt_pattern = "^MT-",
min_nCount = 0,
min_nFeature = 0,
max_mt = 100,
norm_method = "LogNormalize",
do_scale = FALSE
)
# Check processed metadata
head(rna_obj$filter_meta.data, 3)
#> nCount_RNA nFeature_RNA percent_mt Batch
#> Cell_1 440 49 10.68182 Batch_1
#> Cell_2 376 48 10.10638 Batch_2
#> Cell_3 415 49 9.39759 Batch_1Next, we identify Highly Variable Genes (HVGs) and perform Principal Component Analysis (PCA).
rna_obj <- RunDimReduction_RNA(
obj = rna_obj,
method = "PCA",
layer_name = "data",
n_hvg = 20, # Use top 20 HVGs for this tiny dataset
pca_rank = 5 # Compute top 5 PCs
)
# Preview PCA coordinates
head(rna_obj$reductions$pca, 3)
#> PC_1 PC_2 PC_3 PC_4 PC_5
#> Cell_1 0.2637178 -1.1234396 3.2673977 1.8825843 0.9826002
#> Cell_2 5.0227921 0.3069669 1.8647245 -4.5375292 0.9639599
#> Cell_3 2.0648461 1.8569399 0.1649815 -0.9681444 -2.4084489We use hierarchical clustering (which is extremely fast and stable for small data) to identify cell clusters based on the PCA space.
We perform an extremely fast Wilcoxon Rank Sum Test to find marker genes between the identified clusters.
dea_res <- RunDEA_RNA(
obj = rna_obj,
layer_name = "data",
group_col = "Auto_Cluster",
ident_1 = "Cluster_1",
ident_2 = "Cluster_2",
min_pct = 0, # Relaxed for mock data
logfc_thresh = 0 # Relaxed for mock data
)
# Display top 5 differentially expressed genes
head(dea_res, 5)
#> gene p_val avg_log2FC pct.1 pct.2 p_val_adj cluster
#> 1 MT-ND1 1.298735e-05 -0.8493024 1 1 0.0003077857 Cluster_1
#> 2 Gene_7 1.418637e-05 -0.9819662 1 1 0.0003077857 Cluster_1
#> 3 Gene_3 1.846714e-05 -0.8751727 1 1 0.0003077857 Cluster_1
#> 4 Gene_8 2.505259e-05 -0.8310393 1 1 0.0003131574 Cluster_1
#> 5 Gene_4 5.144484e-05 -0.7676672 1 1 0.0004287070 Cluster_1
#> comparison
#> 1 Cluster_1 vs Cluster_2
#> 2 Cluster_1 vs Cluster_2
#> 3 Cluster_1 vs Cluster_2
#> 4 Cluster_1 vs Cluster_2
#> 5 Cluster_1 vs Cluster_2Finally, we infer the developmental trajectory (Pseudotime) using the
PCA coordinates, starting from Cluster_1.
# We wrap this in a check to ensure 'princurve' is available
if (requireNamespace("princurve", quietly = TRUE)) {
rna_obj <- RunPseudotime_RNA(
obj = rna_obj,
reduction = "pca",
group_col = "Auto_Cluster",
start_clus = "Cluster_1",
algorithm = "cluster"
)
# Check assigned pseudotime values
head(rna_obj$filter_meta.data$Pseudotime)
}
#> [1] 0.4990516 1.0000000 0.7495275 0.5378360 0.5072763 0.5969383sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.3 dplyr_1.2.1 data.table_1.18.4 sciNOME_0.99.0
#> [5] BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 xfun_0.58 bslib_0.11.0
#> [4] ggrepel_0.9.8 rstatix_0.7.3 lattice_0.22-9
#> [7] vctrs_0.7.3 tools_4.6.0 generics_0.1.4
#> [10] stats4_4.6.0 parallel_4.6.0 tibble_3.3.1
#> [13] pkgconfig_2.0.3 Matrix_1.7-5 RColorBrewer_1.1-3
#> [16] S7_0.2.2 S4Vectors_0.51.3 ggridges_0.5.7
#> [19] lifecycle_1.0.5 compiler_4.6.0 farver_2.1.2
#> [22] Seqinfo_1.3.0 carData_3.0-6 htmltools_0.5.9
#> [25] sys_3.4.3 buildtools_1.0.0 sass_0.4.10
#> [28] yaml_2.3.12 Formula_1.2-5 pillar_1.11.1
#> [31] car_3.1-5 ggpubr_0.6.3 jquerylib_0.1.4
#> [34] tidyr_1.3.2 MASS_7.3-65 cachem_1.1.0
#> [37] abind_1.4-8 princurve_2.1.6 nlme_3.1-169
#> [40] tidyselect_1.2.1 digest_0.6.39 purrr_1.2.2
#> [43] splines_4.6.0 maketools_1.3.2 labeling_0.4.3
#> [46] fastmap_1.2.0 grid_4.6.0 cli_3.6.6
#> [49] magrittr_2.0.5 patchwork_1.3.2 broom_1.0.13
#> [52] withr_3.0.2 scales_1.4.0 backports_1.5.1
#> [55] rmarkdown_2.31 igraph_2.3.2 otel_0.2.0
#> [58] ggsignif_0.6.4 pbapply_1.7-4 evaluate_1.0.5
#> [61] knitr_1.51 GenomicRanges_1.65.0 IRanges_2.47.2
#> [64] irlba_2.3.7 viridisLite_0.4.3 mgcv_1.9-4
#> [67] rlang_1.2.0 Rcpp_1.1.1-1.1 glue_1.8.1
#> [70] BiocManager_1.30.27 BiocGenerics_0.59.7 jsonlite_2.0.0
#> [73] R6_2.6.1