Scaling reglScatterplot to millions of points

What works at what scale

reglScatterplot() was designed for the size of typical single-cell and spatial datasets, but it can push well past that.

Point count | Status | Notes
--- | --- | ---
1 - 500 000 | Flawless | Below the auto performance-mode threshold; full interactivity
500 k - 5 M | Smooth | performanceMode kicks in automatically
5 M - 20 M | Usable | Use pointSize = 1, opacity = 1, drop pointLabels
20 M - 100 M | Standalone HTML reaches RAM ceiling | Tile-based architectures (e.g. deepscatter) start to win
> 100 M | Out of reach in-browser | Server-side rendering / WebGPU territory
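
The tiers above can be turned into a quick lookup, for instance to warn users before they build a widget that is too large. This is a minimal sketch in base R; `scale_tier()` is a hypothetical helper written for this article, not a function exported by reglScatterplotR, and the breakpoints are simply copied from the table.

```r
# Hypothetical helper (not part of reglScatterplotR): map a point count
# to the status tier listed in the table above.
scale_tier <- function(n) {
    # Upper bounds of the first four tiers, taken from the table.
    breaks <- c(5e5, 5e6, 20e6, 100e6)
    tiers <- c(
        "Flawless",
        "Smooth",
        "Usable",
        "Standalone HTML reaches RAM ceiling",
        "Out of reach in-browser"
    )
    # left.open = TRUE makes each upper bound inclusive, so exactly
    # 500 000 points still counts as "Flawless".
    tiers[findInterval(n, breaks, left.open = TRUE) + 1]
}

scale_tier(c(1e4, 1e6, 1e7, 5e7, 2e8))
```

Calling it with a vector returns one tier per input, so it can be applied directly to something like `nrow(df)`.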

How the wire format works

To keep large datasets shippable inside a standalone htmlwidget, every numeric channel is binary-encoded and base64-wrapped before transit:

Channel | Encoder | Precision | Bytes / point
--- | --- | --- | ---
X / Y (normalised) | .toBase64U16() | 1 / 32 767 | 2
Continuous color z | .toBase64U16Unit() | 1 / 65 535 | 2
Categorical color z | .toBase64U16Int() | exact (< 65 536) | 2
Filter ranges | toBase64() (Float32) | full f32 | 4

At 10 M points the resulting HTML file is around 80 - 90 MB: large, but manageable. The same data encoded as Float32 throughout would be ~150 MB.
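
The size figures follow from simple arithmetic: base64 emits 4 characters for every 3 raw bytes, so a 2-byte channel costs about 2.7 characters per point. The sketch below illustrates the idea in base R. `quantise_u16()`, `base64_chars()` and `payload_mib()` are hypothetical helpers written for this article; they are not the package's internal encoders, whose exact scaling and byte order are not shown here and may differ.

```r
# Hypothetical helper: rescale a numeric vector to 0..65535 and pack each
# value as two bytes (little-endian is an assumption made for this sketch).
quantise_u16 <- function(v) {
    rng <- range(v)
    span <- rng[2] - rng[1]
    if (span == 0) span <- 1 # avoid dividing by zero for constant input
    q <- round((v - rng[1]) / span * 65535)
    lo <- q %% 256 # low byte
    hi <- q %/% 256 # high byte
    as.raw(as.vector(rbind(lo, hi))) # interleave lo, hi for each value
}

# Base64 turns every 3 bytes into 4 characters, padded to a multiple of 4.
base64_chars <- function(n_bytes) 4 * ceiling(n_bytes / 3)

# Estimated base64 payload in MiB for n points at a given raw byte cost.
payload_mib <- function(n, bytes_per_point) {
    base64_chars(n * bytes_per_point) / 1024^2
}

bytes <- quantise_u16(c(0, 0.5, 1))
length(bytes) # 6 bytes: two per value

payload_mib(10e6, 3 * 2) # x, y and one colour channel as Uint16: about 76
payload_mib(10e6, 3 * 4) # the same three channels as Float32: about 153
```

The Uint16 estimate covers only the encoded data for three channels; the bundled JavaScript and any extra channels presumably account for the rest of the file size quoted above.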

A benchmark you can run yourself

library(reglScatterplotR)

bench_sizes <- c(1e4, 1e5, 1e6, 5e6)
for (n in bench_sizes) {
    df <- data.frame(x = rnorm(n), y = rnorm(n), v = runif(n))
    t0 <- Sys.time()
    w <- reglScatterplot(df,
        x = "x", y = "y", colorBy = "v",
        height = 600
    )
    payload <- htmlwidgets:::toJSON(w$x)
    cat(sprintf(
        "n = %s : build = %.2fs, payload = %.1f MB\n",
        format(n, big.mark = ",", scientific = FALSE),
        as.numeric(Sys.time() - t0, units = "secs"),
        nchar(payload) / 1024 / 1024
    ))
    rm(df, w, payload)
    gc(verbose = FALSE)
}
#> n = 10,000 : build = 0.02s, payload = 0.1 MB
#> n = 100,000 : build = 0.04s, payload = 0.8 MB
#> n = 1,000,000 : build = 0.43s, payload = 7.8 MB
#> n = 5,000,000 : build = 1.34s, payload = 39.2 MB

On a 2020-era laptop with an RTX 2060, 5 M points takes ~1.5s on the R side and another ~2s for the browser to parse and upload to the GPU; pan/zoom then runs at 60 fps.
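
As a sanity check, the payload sizes printed by the benchmark can be compared with what the wire format predicts. The short sketch below uses only the numbers from the output above; it assumes the three channels in the benchmark (x, y and a continuous colour) each cost 2 raw bytes per point before base64.

```r
# Point counts and payload sizes copied from the benchmark output above.
n <- c(1e4, 1e5, 1e6, 5e6)
payload_mb <- c(0.1, 0.8, 7.8, 39.2)

# Observed JSON characters per point.
chars_per_point <- payload_mb * 1024^2 / n
round(chars_per_point, 1)

# Predicted: 3 channels x 2 bytes, inflated by 4/3 for base64.
expected <- 3 * 2 * 4 / 3
expected
```

At 1 M and 5 M points the observed cost settles close to the predicted 8 characters per point. The smallest run looks more expensive mainly because its payload is rounded to one decimal place and fixed overhead is not yet negligible.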

Sizing inside the host viewport

The widget honours an explicit pixel height verbatim. If the value exceeds the height of the host window (small browser tab, RStudio Viewer in a tiling WM, narrow Jupyter notebook column, etc.), the bottom of the canvas is clipped by the host - not by reglScatterplot.

# Bad in small viewports: a 500 px tall widget overflows a 450 px window.
reglScatterplot(df, x = "x", y = "y", height = 500)

# Good: fill whatever vertical space is available.
reglScatterplot(df, x = "x", y = "y", height = "100%")

# Also good: omit `height` entirely - the sizingPolicy fills the viewer pane.
reglScatterplot(df, x = "x", y = "y")

Knitting to HTML produces a full-page document where the widget can take as much height as you give it, so the same code that clips in the Viewer pane prints cleanly in a knit report. This is purely a viewport effect.

Memory levers for very large data

When you really want to push past 5 M, every per-point byte counts. Suggested defaults for huge inputs:

reglScatterplot(huge_df,
    x = "x", y = "y",
    pointSize = 1, # one pixel per point
    opacity = 1, # no blending math
    showAxes = FALSE, # drops the D3 axis layer
    showTooltip = FALSE, # frees per-point hit-test work
    enableDownload = FALSE, # no html2canvas / jsPDF download
    pointLabels = NULL # don't ship gene names
)

Things you might think help but don’t:

  • Reducing vmin / vmax clip range - colour scale only, not memory.
  • Setting legendPosition = "bottom-left" - cosmetic, no perf impact.

Comparison with other R packages

reglScatterplot is one of several credible options for large scatterplots in R. The packages below are not all doing the same thing:

Package | Interactive? | Best at | Limit
--- | --- | --- | ---
reglScatterplot | Yes | 1 - 20 M points in HTML / Shiny | Browser RAM / VRAM
plotly (+ toWebGL()) | Yes | < 500 k points, broad feature set | JSON payload bloats past 1 M
scattermore | No (static) | Quickly rasterising 10 M+ to a PNG | No pan / zoom interactivity
ggplot2 | No (static) | Publication graphics, small data | Practical ceiling ~50 k pts

The right choice depends on what you need:

  • Want a printable figure? ggplot2 or scattermore.
  • Want to embed an interactive plot in an HTML report? reglScatterplot.
  • Need brush, click, and faceted layouts more than scale? plotly.

Where the next jump comes from

For genuinely huge data (multi-modal CosMx slides, whole-atlas integrations beyond ~50 M cells), no standalone in-browser widget is the right answer today. The viable paths are:

  1. Tile-based architectures - precompute spatial tiles on disk, only load what the viewport needs. See deepscatter (Apache Arrow + Parquet tiles). Requires a server or a static tile directory.
  2. Server-side rendering - send camera state to a Python / Julia backend that renders frames; stream them as images. Lower fidelity but independent of the client.
  3. WebGPU - browser support is maturing; offers compute shaders that would let us do GPU-side filtering and density binning. Currently a two-year horizon.
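
To make the tile-based idea concrete, here is a minimal sketch in base R of the principle "only load what the viewport needs". It uses a fixed 8 x 8 grid over the unit square, which is an assumption chosen for simplicity and not a description of how deepscatter organises its tiles. `tile_of()` and `tiles_in_view()` are hypothetical helpers.

```r
set.seed(1)
n <- 1e5
pts <- data.frame(x = runif(n), y = runif(n))

# Assign each point to a cell of a fixed k x k grid over the unit square.
k <- 8
tile_of <- function(v, k) pmin(floor(v * k), k - 1) # 0-based tile index
pts$tx <- tile_of(pts$x, k)
pts$ty <- tile_of(pts$y, k)

# One data frame per non-empty tile, keyed as "tx.ty".
tiles <- split(pts[c("x", "y")], list(pts$tx, pts$ty), drop = TRUE)

# Keys of the tiles that overlap a rectangular viewport.
tiles_in_view <- function(xmin, xmax, ymin, ymax, k) {
    tx <- tile_of(xmin, k):tile_of(xmax, k)
    ty <- tile_of(ymin, k):tile_of(ymax, k)
    as.vector(outer(tx, ty, paste, sep = "."))
}

# A viewport covering roughly the lower-left corner touches 9 of 64 tiles.
view <- tiles_in_view(0.10, 0.30, 0.10, 0.30, k)
loaded <- do.call(rbind, tiles[intersect(view, names(tiles))])
nrow(loaded) / n # fraction of points that had to be loaded
```

In a real deployment the tiles would live on disk or behind a server and would usually be hierarchical, so that zoomed-out views load a coarse sample instead of every point.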

For now, reglScatterplot covers the typical single-cell, spatial and fold-change use cases comfortably. If you find yourself loading the same 10 M+ dataset repeatedly, the right next step is to switch to a tile server, not a faster scatterplot.

Session info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] reglScatterplotR_0.99.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     xfun_0.58        
#>  [5] maketools_1.3.2   cachem_1.1.0      knitr_1.51        htmltools_0.5.9  
#>  [9] rmarkdown_2.31    buildtools_1.0.0  lifecycle_1.0.5   cli_3.6.6        
#> [13] viridisLite_0.4.3 sass_0.4.10       jquerylib_0.1.4   compiler_4.6.0   
#> [17] sys_3.4.3         tools_4.6.0       evaluate_1.0.5    bslib_0.11.0     
#> [21] yaml_2.3.12       otel_0.2.0        htmlwidgets_1.6.4 jsonlite_2.0.0   
#> [25] rlang_1.2.0       crosstalk_1.2.2