rvarsim simulates all possible single nucleotide
variants (SNVs) across MANE Select transcripts and outputs them in HGVS
notation. It also provides a comprehensive toolkit for parsing,
validating, normalizing, converting, transcribing, translating, and
lifting over HGVS variant descriptions.
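To make the idea of exhaustive SNV simulation concrete, here is a minimal base-R sketch: every position of a sequence receives the three bases that differ from the reference, and each result is labelled as an HGVS c. substitution. The helper `enumerate_snvs()` and the toy three-base input are illustrative assumptions and are not part of the rvarsim API, which works from transcript structures and a reference genome rather than a literal string.

```r
# Hypothetical helper (NOT part of rvarsim): enumerate every SNV of a short
# coding sequence and label each one in HGVS c. substitution syntax.
enumerate_snvs <- function(cds, accession) {
  bases <- c("A", "C", "G", "T")
  ref <- strsplit(toupper(cds), "")[[1]]
  stopifnot(all(ref %in% bases))

  # Each reference base has exactly three alternative bases.
  pos <- rep(seq_along(ref), each = 3)
  ref_rep <- rep(ref, each = 3)
  alt <- unlist(lapply(ref, function(b) setdiff(bases, b)))

  data.frame(
    pos = pos,
    ref = ref_rep,
    alt = alt,
    hgvs_c = sprintf("%s:c.%d%s>%s", accession, pos, ref_rep, alt),
    stringsAsFactors = FALSE
  )
}

# Toy input: a three-base sequence yields 3 x 3 = 9 variants.
snvs <- enumerate_snvs("ATG", "NM_000546.6")
nrow(snvs)
snvs$hgvs_c[1]
```

A sequence of length n always yields 3n substitutions, so the size of the output grows linearly with transcript length.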
The four-step pipeline generates all possible SNVs from a reference transcript:
library(rvarsim)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
# Fetch MANE Select transcripts
mane <- fetch_mane_txdb(EnsDb.Hsapiens.v86)
# Get transcript structure
struct <- get_transcript_structure(mane, "ENST00000357654")
# Generate variants
vars <- generate_variants(struct, BSgenome.Hsapiens.UCSC.hg38)
# Add HGVS notation
hgvs <- format_hgvs(vars)
head(hgvs[, c("region", "genomic_ref", "genomic_alt", "hgvs_c")])
Or use the all-in-one wrapper:
library(rvarsim)
# Parse HGVS strings into structured objects
variant <- parse_hgvs("NM_000546.6:c.215C>G")[[1]]
variant$type # "substitution"
## [1] "substitution"
## [1] "C"
## [1] "G"
## [1] 215
## [1] TRUE
## [1] FALSE
##       CHROM    POS ID REF ALT QUAL FILTER                          INFO
## 1 NC_000001 123456  .   A   G    .      . HGVS=NC_000001.11:g.123456A>G
## NC_000001.11:123455:A:G
## [1] "NM_000546.6:c.5G>A" "NM_000546.6:c.9G>C"
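The SPDI line in the output above differs from the HGVS description by one position because SPDI coordinates are 0-based while HGVS coordinates are 1-based. The following base-R sketch shows that relationship for simple substitutions only. The helpers `parse_substitution()` and `to_spdi()` are hypothetical names introduced for illustration; they are not rvarsim functions and do not reflect how the package parses or converts variants internally.

```r
# Hypothetical helper (NOT rvarsim's parser): split a simple HGVS substitution
# such as "NM_000546.6:c.215C>G" into its components with a regular expression.
parse_substitution <- function(x) {
  pattern <- "^([^:]+):([cgmn])\\.([0-9]+)([ACGT])>([ACGT])$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) stop("not a simple substitution: ", x)
  list(
    accession  = m[2],
    coord_type = m[3],
    position   = as.integer(m[4]),
    ref        = m[5],
    alt        = m[6]
  )
}

# Hypothetical helper: SPDI positions are 0-based, so a substitution sits at
# the HGVS position minus one.
to_spdi <- function(v) {
  paste(v$accession, v$position - 1L, v$ref, v$alt, sep = ":")
}

v <- parse_substitution("NC_000001.11:g.123456A>G")
to_spdi(v)
```

The regular expression deliberately rejects anything other than a single-base substitution with a plain integer position, so deletions, insertions, and intronic offsets such as `c.215+1G>A` raise an error instead of being converted incorrectly.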
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rvarsim_0.99.1 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.53.0 SummarizedExperiment_1.43.0
## [3] rjson_0.2.23 xfun_0.58
## [5] bslib_0.11.0 Biobase_2.73.1
## [7] lattice_0.22-9 vctrs_0.7.3
## [9] tools_4.6.0 bitops_1.0-9
## [11] generics_0.1.4 stats4_4.6.0
## [13] curl_7.1.0 parallel_4.6.0
## [15] AnnotationDbi_1.75.0 RSQLite_3.53.1
## [17] blob_1.3.0 BiocBaseUtils_1.15.1
## [19] Matrix_1.7-5 BSgenome_1.81.0
## [21] S4Vectors_0.51.3 cigarillo_1.3.0
## [23] lifecycle_1.0.5 compiler_4.6.0
## [25] Rsamtools_2.29.0 Biostrings_2.81.3
## [27] Seqinfo_1.3.0 codetools_0.2-20
## [29] GenomeInfoDb_1.49.1 htmltools_0.5.9
## [31] sys_3.4.3 buildtools_1.0.0
## [33] sass_0.4.10 lazyeval_0.2.3
## [35] RCurl_1.98-1.19 yaml_2.3.12
## [37] crayon_1.5.3 jquerylib_0.1.4
## [39] BiocParallel_1.47.0 cachem_1.1.0
## [41] DelayedArray_0.39.3 abind_1.4-8
## [43] digest_0.6.39 restfulr_0.0.16
## [45] maketools_1.3.2 fastmap_1.2.0
## [47] grid_4.6.0 cli_3.6.6
## [49] SparseArray_1.13.2 S4Arrays_1.13.0
## [51] GenomicFeatures_1.65.0 XML_3.99-0.23
## [53] UCSC.utils_1.9.0 bit64_4.8.2
## [55] rmarkdown_2.31 XVector_0.53.0
## [57] httr_1.4.8 matrixStats_1.5.0
## [59] bit_4.6.0 otel_0.2.0
## [61] png_0.1-9 memoise_2.0.1
## [63] evaluate_1.0.5 knitr_1.51
## [65] GenomicRanges_1.65.0 IRanges_2.47.2
## [67] BiocIO_1.23.3 rtracklayer_1.73.0
## [69] rlang_1.2.0 DBI_1.3.0
## [71] ensembldb_2.37.3 BiocManager_1.30.27
## [73] BiocGenerics_0.59.7 jsonlite_2.0.0
## [75] AnnotationFilter_1.37.0 R6_2.6.1
## [77] ProtGenerics_1.45.0 MatrixGenerics_1.25.0
## [79] GenomicAlignments_1.49.0