The bHIVE package implements an Artificial Immune Network (AI-Net) algorithm for clustering and classification tasks. Inspired by biological immune systems, bHIVE uses principles like clonal selection, mutation, and suppression to analyze and model data.
This vignette demonstrates how to:

1. Perform clustering and classification using bHIVE.
2. Tune hyperparameters using swarmbHIVE.
3. Use the caret wrapper for easy integration with machine learning workflows.
4. Use multilayered immune networks with honeycombHIVE.
5. Visualize results with ggplot2.
The behavior of the bHIVE() function can be fine-tuned using a range of hyperparameters. Below is a description of the key parameters:
| Parameter | Description |
|---|---|
| X | A numeric matrix or data frame of input features (rows are observations, columns are features). |
| y | (Optional) Factor target vector for classification. If NULL, clustering is performed. |
| task | Specifies the task: "clustering" or "classification". |
| nAntibodies | Number of initial antibodies in the population. Larger values increase diversity but add computational cost. |
| beta | Clone multiplier. Determines how many clones are generated for top-matching antibodies. |
| epsilon | Suppression threshold. Antibodies closer than epsilon are considered redundant and removed. |
| maxIter | Maximum number of iterations for the algorithm. |
| affinityFunc | Affinity (similarity) function. Options include "gaussian", "laplace", "polynomial", "cosine", and "hamming". |
| distFunc | Distance function for clustering and suppression. Options include "euclidean", "manhattan", "cosine", "minkowski", and "hamming". |
| affinityParams | A list of optional parameters for the affinity/distance functions. |
| mutationDecay | Factor controlling how the mutation rate decays over iterations. Default is 1.0 (no decay). |
| mutationMin | Minimum mutation rate to avoid zero mutation. |
| maxClones | Maximum number of clones per antibody. Default is unlimited (Inf). |
| stopTolerance | Tolerance for stopping the algorithm if the antibody population size stabilizes. |
| noImprovementLimit | Number of iterations without improvement before early stopping. |
| initMethod | Method for initializing antibodies. Options: "sample" (randomly selects rows from X), "random" (Gaussian noise), "random_uniform" (samples uniformly in [min, max] of each column), or "kmeans++" (kmeans++-like initialization for coverage). |
| k | Number of top-matching antibodies to consider during cloning. |
| seed | Random seed for reproducibility. |
| verbose | Logical. If TRUE, prints progress messages for each iteration. |
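The cloning, mutation, and suppression parameters above follow the general aiNet pattern. As a rough illustration only, the base-R sketch below runs a few iterations of that pattern on iris. It assumes a Gaussian affinity with bandwidth sigma, Euclidean suppression at epsilon, and a mutation step that shrinks as affinity grows. The helper names (gaussian_affinity, suppress, ainet_iteration) and the mutation_rate and sigma arguments are hypothetical; they are not part of the bHIVE API and do not reproduce its internal implementation.

```r
# Illustrative aiNet-style loop in base R; NOT bHIVE's internal implementation.
# Assumed: Gaussian affinity, Euclidean suppression, affinity-scaled mutation.

# Gaussian affinity between one observation x and every antibody (row of A)
gaussian_affinity <- function(x, A, sigma = 1) {
  exp(-rowSums(sweep(A, 2, x)^2) / (2 * sigma^2))
}

# Suppression: keep an antibody only if it lies at least epsilon away
# (Euclidean) from every antibody already kept
suppress <- function(A, epsilon) {
  keep <- 1L
  for (i in seq_len(nrow(A))[-1]) {
    d <- sqrt(rowSums(sweep(A[keep, , drop = FALSE], 2, A[i, ])^2))
    if (all(d >= epsilon)) keep <- c(keep, i)
  }
  A[keep, , drop = FALSE]
}

# One iteration: clonal selection, mutation, then network suppression
ainet_iteration <- function(X, A, k = 3, beta = 5, epsilon = 0.05,
                            mutation_rate = 0.5, sigma = 1) {
  memory <- list()
  for (r in seq_len(nrow(X))) {
    x <- X[r, ]
    aff <- gaussian_affinity(x, A, sigma)
    top <- order(aff, decreasing = TRUE)[seq_len(min(k, nrow(A)))]
    for (j in top) {
      n_clones <- max(1L, round(beta * aff[j]))   # clone count grows with affinity
      clones <- matrix(A[j, ], n_clones, ncol(A), byrow = TRUE)
      step <- mutation_rate * (1 - aff[j])        # weaker match, larger mutation
      clones <- clones - step * sweep(clones, 2, x) +   # pull clones toward x
        matrix(rnorm(length(clones), sd = 0.1 * step), n_clones)
      best <- which.max(gaussian_affinity(x, clones, sigma))
      memory[[length(memory) + 1L]] <- clones[best, ]
    }
  }
  suppress(rbind(A, do.call(rbind, memory)), epsilon)
}

# Usage on iris with "sample"-style initialization
set.seed(42)
X <- as.matrix(iris[, 1:4])
A <- X[sample(nrow(X), 10), ]
for (iter in 1:3) A <- ainet_iteration(X, A, k = 3, beta = 5, epsilon = 0.5)
nrow(A)   # surviving antibodies after suppression
```

In this sketch, suppression is what bounds the network size: raising epsilon removes more near-duplicate antibodies after each round of cloning.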
The following example demonstrates how to configure the bHIVE function for clustering, emphasizing the impact of key parameters:
library(bHIVE)
# Load the Iris dataset
data(iris)
X <- as.matrix(iris[, 1:4])
# Configure bHIVE parameters for clustering
set.seed(42)
res <- bHIVE(
X = X, # Input data
task = "clustering", # Task type
nAntibodies = 20, # Number of antibodies
beta = 5, # Clone multiplier
epsilon = 0.01, # Suppression threshold
maxIter = 20, # Maximum iterations
affinityFunc = "gaussian", # Affinity function
distFunc = "euclidean", # Distance function
verbose = TRUE # Print progress
)

Clustering is an unsupervised learning task where we group similar data points based on their features. In this example, we cluster the numeric features of the Iris dataset using bHIVE.
# Load Iris dataset
data(iris)
X <- as.matrix(iris[, 1:4])
# Perform clustering
set.seed(42)
res <- bHIVE(X = X,
task = "clustering",
nAntibodies = 10,
beta = 5,
epsilon = 0.05,
maxIter = 20,
k = 3,
verbose = FALSE)
# Add cluster assignments to the data
iris$Cluster <- as.factor(res$assignments)
library(ggplot2)
library(viridis)  # provides scale_color_viridis()
# Visualize clusters
ggplot(iris, aes(x = Sepal.Length,
y = Sepal.Width,
color = Cluster)) +
geom_point(size = 3) +
labs(title = "Clustering Results with bHIVE",
x = "Sepal Length",
y = "Sepal Width") +
scale_color_viridis(discrete = TRUE) +
theme_minimal()

Classification is a supervised learning task where data points are assigned to predefined categories based on their features. Here, we classify the species of Iris flowers using bHIVE.
# Classification setup
y <- iris$Species
# Perform classification
set.seed(42)
res <- bHIVE(X = X,
y = y,
task = "classification",
nAntibodies = 100,
beta = 5,
epsilon = 0.05,
initMethod = "random",
k = 4,
verbose = FALSE)
# Visualize classification results
iris$Predicted <- res$assignments
ggplot(iris, aes(x = Sepal.Length,
y = Sepal.Width,
color = Predicted,
shape = Species)) +
geom_point(size = 3) +
labs(title = "Classification Results with bHIVE",
x = "Sepal Length",
y = "Sepal Width") +
scale_color_viridis(discrete = TRUE) +
theme_minimal()

Comparing predicted vs. actual species:
## Actual
## Predicted setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 37 1
## virginica 0 13 49
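To summarize this table as numbers, the counts printed above can be re-entered by hand and reduced to overall accuracy and per-class recall and precision. This is a minimal base-R sketch; in a live session you would pass the table object itself rather than retyping the counts.

```r
# Counts copied from the confusion matrix printed above
# (rows = predicted class, columns = actual class).
species <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(50,  0,  0,
                0, 37,  1,
                0, 13, 49),
             nrow = 3, byrow = TRUE,
             dimnames = list(Predicted = species, Actual = species))

accuracy  <- sum(diag(cm)) / sum(cm)   # correct predictions / all predictions
recall    <- diag(cm) / colSums(cm)    # share of each actual class recovered
precision <- diag(cm) / rowSums(cm)    # share of each predicted class that is correct

round(accuracy, 3)    # 0.907
round(recall, 3)      # setosa 1.00, versicolor 0.74, virginica 0.98
round(precision, 3)
```

Nearly all of the errors come from versicolor flowers predicted as virginica (13 of the 14 misclassifications), which is why versicolor recall is the weakest entry.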
Tuning hyperparameters is crucial for optimizing the performance of
machine learning algorithms. In this example, we perform hyperparameter
tuning for clustering using swarmbHIVE.
grid <- expand.grid(
nAntibodies = c(10, 20),
beta = c(3, 5),
epsilon = c(0.01, 0.05)
)
# Perform tuning
set.seed(42)
tuning_results <- swarmbHIVE(X = X,
task = "clustering",
grid = grid,
metric = "silhouette",
verbose = FALSE)
# Visualize tuning results
ggplot(tuning_results$results, aes(x = nAntibodies,
y = metric_value,
color = factor(beta))) +
geom_line() +
geom_point(aes(shape = as.factor(epsilon)),
size = 3) +
labs(title = "Hyperparameter Tuning Results",
x = "Number of Antibodies",
y = "Silhouette Score",
color = "Beta",
shape = "Epsilon") +
scale_color_viridis(discrete = TRUE) +
theme_minimal()

Best parameters:
## nAntibodies beta epsilon metric_value
## 5 10 3 0.05 0.2404069
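The tuning above ranks grid rows by metric = "silhouette". The base-R sketch below computes the standard mean silhouette width so the score is easier to interpret: for each point, a is its mean distance to the rest of its own cluster, b is its mean distance to the nearest other cluster, and s = (b - a) / max(a, b). It assumes Euclidean distance; whether swarmbHIVE uses the same distance and averaging is an assumption, and mean_silhouette is a hypothetical helper. k-means labels stand in for bHIVE assignments.

```r
# Illustrative mean silhouette width in base R (hypothetical helper).
mean_silhouette <- function(X, labels) {
  D <- as.matrix(dist(X))                          # Euclidean distances
  labels <- as.integer(factor(labels))
  if (length(unique(labels)) < 2) stop("need at least two clusters")
  s <- numeric(nrow(D))
  for (i in seq_len(nrow(D))) {
    same <- labels == labels[i]
    if (sum(same) == 1) next                       # singleton cluster: s(i) = 0
    a <- sum(D[i, same]) / (sum(same) - 1)         # mean distance within own cluster
    b <- min(vapply(setdiff(unique(labels), labels[i]),
                    function(k) mean(D[i, labels == k]),
                    numeric(1)))                   # mean distance to nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Stand-in labels from k-means (not bHIVE)
set.seed(42)
X <- as.matrix(iris[, 1:4])
km <- kmeans(X, centers = 3, nstart = 10)
sil <- mean_silhouette(X, km$cluster)
sil
```

Values near 1 indicate compact, well-separated clusters and values near 0 indicate overlapping ones, so the best tuned score of about 0.24 points to fairly overlapping clusters.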
The bHIVE package provides a caret wrapper (bHIVEmodel) for seamless integration with the caret framework, allowing for easy cross-validation and hyperparameter tuning. Here, we demonstrate classification on the iris dataset.
library(caret)
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris$Species
# Split into training (70%) and validation (30%) sets
set.seed(42)
sample.idx <- sample(nrow(X), nrow(X)*0.7)
x_train <- X[sample.idx,]
x_val <- X[-sample.idx,]
y_train <- y[sample.idx]
y_val <- y[-sample.idx]
# Evaluate each tuning-grid row with 2-fold cross-validation
train_control <- trainControl(method = "cv", number = 2)
set.seed(42)
model <- train(x = x_train,
y = y_train,
method = bHIVEmodel,
trControl = train_control,
tuneGrid = expand.grid(
nAntibodies = c(10, 20, 30),
beta = c(3, 5, 10),
epsilon = c(0.01, 0.05, 0.1)),
verbose = FALSE)
# Visualize caret results
ggplot(model) +
labs(title = "Caret Tuning Results for bHIVE",
x = "Hyperparameter Combination",
y = "Performance Metric") +
scale_color_viridis(discrete = TRUE) +
theme_minimal()

To use the best-performing model from above, we simply call predict() on the held-out validation set (x_val).
## Actual
## Predicted setosa versicolor virginica
## setosa 11 0 13
## versicolor 0 14 5
## virginica 1 1 0
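The tuning plot reports resampled performance: trainControl(method = "cv", number = 2) asks caret to split the training set into two folds, fit each tuning-grid row on one fold, score it on the other, and average the two scores. The base-R sketch below illustrates that loop with a nearest-class-centroid classifier standing in for bHIVE. The helper names are hypothetical, and unlike this sketch, which draws plain random folds, caret builds its folds with createFolds(), which balances classes across folds.

```r
# Illustrative k-fold cross-validation in base R (NOT caret's internals).
# Stand-in model: a nearest-class-centroid classifier (hypothetical helpers).

fit_centroids <- function(X, y) {
  # One row per class: the mean feature vector of that class
  t(vapply(split(seq_len(nrow(X)), y),
           function(idx) colMeans(X[idx, , drop = FALSE]),
           numeric(ncol(X))))
}

predict_centroids <- function(C, X) {
  # Squared Euclidean distance from every row of X to every centroid
  d <- vapply(seq_len(nrow(C)),
              function(k) rowSums(sweep(X, 2, C[k, ])^2),
              numeric(nrow(X)))
  d <- matrix(d, nrow = nrow(X))               # keep matrix shape if nrow(X) == 1
  factor(rownames(C)[max.col(-d, ties.method = "first")], levels = rownames(C))
}

cv_accuracy <- function(X, y, folds = 2) {
  fold_id <- sample(rep(seq_len(folds), length.out = nrow(X)))  # random folds
  acc <- numeric(folds)
  for (f in seq_len(folds)) {
    held_out <- fold_id == f
    C <- fit_centroids(X[!held_out, , drop = FALSE], y[!held_out])
    acc[f] <- mean(predict_centroids(C, X[held_out, , drop = FALSE]) == y[held_out])
  }
  mean(acc)   # average held-out accuracy across folds
}

set.seed(42)
X <- as.matrix(iris[, 1:4])
cv_acc <- cv_accuracy(X, iris$Species, folds = 2)
cv_acc
```

With only two folds, each model is trained on half of the training data, so cross-validated estimates can be noticeably noisier than with the more common 5 or 10 folds.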
In honeycombHIVE, clustering proceeds hierarchically
across multiple layers:
honeycombHIVE iteratively builds and refines a network
of “antibodies” (prototypes) to capture complex patterns in data.
Initially, the bHIVE algorithm creates a set of prototypes,
and then each layer can be fine-tuned using gradient-based updates (via
the refineB() function) with flexible optimizers like SGD,
Adam, and RMSProp.
For each layer, a collapse step aggregates the refined prototypes (by computing a centroid, medoid, or another statistic), compressing the data into a new, smaller representation in which each prototype becomes an observation. This new representation serves as the input for the next layer, much like the stacked layers of a neural network.
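To make the collapse step concrete, the base-R sketch below reduces observations to one row per prototype using either the centroid (column means) or the medoid (the member with the smallest total distance to the others). It is an illustration only: collapse_prototypes is a hypothetical helper, and k-means labels stand in for the per-layer memberships that honeycombHIVE would produce.

```r
# Illustrative collapse step in base R (hypothetical helper, not bHIVE's code).
collapse_prototypes <- function(X, membership, method = c("centroid", "medoid")) {
  method <- match.arg(method)
  groups <- split(seq_len(nrow(X)), membership)   # row indices per prototype
  rows <- lapply(groups, function(idx) {
    sub <- X[idx, , drop = FALSE]
    if (method == "centroid") return(colMeans(sub))
    D <- as.matrix(dist(sub))
    sub[which.min(rowSums(D)), ]                  # medoid: smallest total distance
  })
  do.call(rbind, rows)                            # one row per prototype
}

set.seed(42)
X <- as.matrix(iris[, 1:4])
membership <- kmeans(X, centers = 6, nstart = 5)$cluster  # stand-in layer assignments
layer_centroids <- collapse_prototypes(X, membership, "centroid")
layer_medoids   <- collapse_prototypes(X, membership, "medoid")
dim(layer_centroids)   # fewer rows than X, same feature columns
```

The number of rows shrinks from 150 observations to one per prototype while the feature columns are unchanged, which is the sense in which each layer works on a more compact version of the data.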
# Load the Iris dataset
data(iris)
X <- as.matrix(iris[, 1:4])
# Run honeycombHIVE for clustering
res <- honeycombHIVE(X = X,
task = "clustering",
epsilon = 0.05,
layers = 3,
nAntibodies = 30,
beta = 5,
maxIter = 10,
verbose = FALSE)
# Visualize results from each layer
for (i in seq_along(res)) {
# Create a data frame for plotting; add original Sepal.Length and Sepal.Width
plot_df <- data.frame(
Sepal.Length = iris$Sepal.Length,
Sepal.Width = iris$Sepal.Width,
Cluster = factor(res[[i]]$membership) # cluster labels from layer i
)
# Basic ggplot scatter plot
plot <- ggplot(plot_df, aes(x = Sepal.Length,
y = Sepal.Width,
color = Cluster)) +
geom_point(size = 3) +
labs(
title = paste("honeycombHIVE Clustering - Layer", i),
x = "Sepal Length",
y = "Sepal Width"
) +
theme_minimal() +
scale_color_viridis(discrete = TRUE)
print(plot)
}

Note: If you use task = "classification", honeycombHIVE will generate predictions for each layer. You can compare each layer's predictions against the true labels to see whether performance improves or whether the data become too collapsed.
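One way to make that per-layer comparison is a small helper that scores every layer at once. The sketch assumes the result is a list with one element per layer, each holding a predictions vector aligned with the original observations (the field that the refinement example later in this vignette reads from res_class_refine[[3]]). The helper layer_accuracy and the toy result are hypothetical and use made-up labels purely to show the shape of the computation.

```r
# Hypothetical helper: accuracy of each layer's predictions against true labels.
# Assumes `res` is a list of layers, each with a `predictions` vector.
layer_accuracy <- function(res, y) {
  vapply(res, function(layer) {
    mean(as.character(layer$predictions) == as.character(y))
  }, numeric(1))
}

# Toy stand-in for a two-layer result (made-up labels, illustration only)
y_toy <- factor(c("a", "a", "b", "b"))
res_toy <- list(list(predictions = factor(c("a", "a", "b", "b"))),
                list(predictions = factor(c("a", "a", "a", "b"))))
layer_accuracy(res_toy, y_toy)   # 1.00 for layer 1, 0.75 for layer 2
```

With a real classification result, the same call (for example, layer_accuracy(res_class, y)) would show whether accuracy drops as the layers collapse the data.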
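The refineB() function takes the prototypes produced by the bHIVE() algorithm and fine-tunes them using gradient-based updates. In addition to the basic parameters (loss, steps, and lr), the call in this example sets optimizer-specific parameters: beta1 and beta2 (Adam's moment decay rates) and rmsprop_decay (RMSProp's moving-average decay).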
# Prepare the Iris dataset
X <- as.matrix(iris[, 1:4])
y <- iris$Species
# Run bHIVE to obtain initial antibody prototypes.
set.seed(42)
res <- bHIVE(X = X,
y = y,
task = "classification",
nAntibodies = 10,
beta = 5,
epsilon = 0.05,
initMethod = "random",
k = 4,
verbose = FALSE)
Ab <- res$antibodies
colnames(Ab) <- colnames(X)
assignments <- res$assignments
# Ensure assignments are numeric indices.
assignments <- as.integer(factor(assignments, levels = unique(assignments)))
# PCA of the Iris data for visualization in 2D
pca <- prcomp(X, scale. = TRUE)
X_pca <- pca$x[, 1:2]
colnames(X_pca) <- c("PC1", "PC2")
# Transform initial prototypes into PCA space.
A_bhive_pca <- predict(pca, Ab)
# Run refinement using several optimizers.
optimizers <- c("sgd", "adam", "rmsprop")
refined_list <- lapply(optimizers, function(opt) {
Ab_refined <- refineB(A = Ab,
X = X,
y = y,
assignments = assignments,
loss = "categorical_crossentropy",
task = "classification",
steps = 5,
lr = 0.01,
verbose = FALSE,
optimizer = opt,
beta1 = 0.9,
beta2 = 0.999,
rmsprop_decay = 0.9)
colnames(Ab_refined) <- colnames(X)
A_refined_pca <- predict(pca, Ab_refined)
data.frame(optimizer = opt,
PC1_after = A_refined_pca[,1],
PC2_after = A_refined_pca[,2])
})
refined_df <- do.call(rbind, refined_list)
refined_df$optimizer <- factor(refined_df$optimizer, levels = optimizers)

Interpretation:

* Data points are shown with light transparency, colored by their cluster assignments (as determined by the original bHIVE algorithm).
* Initial prototypes are shown in dark purple. These are the prototypes (antibodies) before refinement.
* Refined prototypes are shown in bright yellow. The arrows indicate the direction and magnitude of the update from the original positions to the refined positions.
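The optimizer names and parameters passed to refineB() (lr, beta1, beta2, rmsprop_decay) usually refer to the textbook SGD, Adam, and RMSProp update rules. The base-R sketch below implements those rules for a single prototype. It assumes a simple stand-in loss, half the squared distance to a target point, instead of categorical cross-entropy, and the helpers optimizer_step and refine_prototype are hypothetical; how refineB() itself computes gradients is not shown in this vignette.

```r
# Textbook update rules (illustrative sketch, not refineB()'s internals).
optimizer_step <- function(theta, grad, state, optimizer = c("sgd", "adam", "rmsprop"),
                           lr = 0.01, beta1 = 0.9, beta2 = 0.999,
                           rmsprop_decay = 0.9, eps = 1e-8) {
  optimizer <- match.arg(optimizer)
  if (optimizer == "sgd") {
    theta <- theta - lr * grad                            # plain gradient step
  } else if (optimizer == "rmsprop") {
    state$v <- rmsprop_decay * state$v + (1 - rmsprop_decay) * grad^2
    theta <- theta - lr * grad / (sqrt(state$v) + eps)    # scale by RMS of gradients
  } else {
    state$t <- state$t + 1
    state$m <- beta1 * state$m + (1 - beta1) * grad       # first-moment estimate
    state$v <- beta2 * state$v + (1 - beta2) * grad^2     # second-moment estimate
    m_hat <- state$m / (1 - beta1^state$t)                # bias corrections
    v_hat <- state$v / (1 - beta2^state$t)
    theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)
  }
  list(theta = theta, state = state)
}

refine_prototype <- function(theta, target, optimizer, steps = 5, lr = 0.01) {
  state <- list(m = 0 * theta, v = 0 * theta, t = 0)
  for (s in seq_len(steps)) {
    grad <- theta - target          # gradient of 0.5 * ||theta - target||^2
    out <- optimizer_step(theta, grad, state, optimizer, lr = lr)
    theta <- out$theta
    state <- out$state
  }
  theta
}

X <- as.matrix(iris[, 1:4])
target <- colMeans(X[iris$Species == "setosa", ])   # stand-in for assigned points
start  <- colMeans(X)                               # deliberately off-target prototype
before <- sqrt(sum((start - target)^2))
after  <- sapply(c("sgd", "adam", "rmsprop"), function(opt)
  sqrt(sum((refine_prototype(start, target, opt, steps = 5, lr = 0.01) - target)^2)))
before
after
```

With the same learning rate, SGD's step scales with the gradient's size, while Adam and RMSProp normalize by a running gradient magnitude, so their per-coordinate steps stay close to lr in this example regardless of how far the prototype starts from its target.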
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris$Species
res_class <- honeycombHIVE(X = X,
y = y,
layers = 3,
task = "classification",
nAntibodies = 30,
beta = 5,
epsilon = 0.01,
verbose = FALSE)
res_class_refine <- honeycombHIVE(X = X,
y = y,
task = "classification",
layers = 3,
nAntibodies = 30,
beta = 5,
epsilon = 0.01,
refine = TRUE,
refineOptimizer = "adam",
refineLoss = "categorical_crossentropy",
refineSteps = 3,
refineLR = 0.01,
verbose = FALSE)
table(Refined = res_class_refine[[3]]$predictions,
Actual = y)
## Actual
## Refined setosa versicolor virginica
## setosa 50 50 50
The two examples below demonstrate how to use the visualizeHIVE() function for different tasks and plot types.
In this example we run honeycombHIVE() on the Iris
dataset for a clustering task. The scatterplot is generated using a PCA
transformation to reduce the feature space to two dimensions. Data
points are colored by cluster membership (treated as discrete) and
prototypes are overlaid as black points.
data(iris)
X <- as.matrix(iris[, 1:4])
set.seed(42)
res <- honeycombHIVE(X = X,
task = "clustering",
epsilon = 0.05,
layers = 3,
nAntibodies = 30,
beta = 5,
maxIter = 10,
verbose = FALSE)
visualizeHIVE(result = res,
X = iris[, 1:4],
plot_type = "scatter",
title = "Layer 2: Scatterplot (Clustering)",
layer = 2,
task = "clustering",
transform = TRUE,
transformation_method = "PCA")

This scatterplot shows the data projected onto the first two principal components. Each data point is colored according to its cluster (discrete grouping), and the prototypes computed by honeycombHIVE are displayed as large black points. Faceting by layer is applied if multiple layers are selected.
For a classification task, assuming the result stores class predictions, the following example generates a violin plot of the “Sepal.Width” feature from layer 1. The discrete grouping (class labels) is used to color the violins, and prototype values are overlaid as black points.
set.seed(42)
res_class <- honeycombHIVE(X = X,
y = iris$Species,
task = "classification",
layers = 2,
nAntibodies = 15,
beta = 5,
maxIter = 10,
verbose = FALSE)
visualizeHIVE(result = res_class,
X = iris[, 1:4],
plot_type = "violin",
feature = "Sepal.Width",
title = "Violin Plot: Sepal.Width by Group",
layer = 1,
task = "classification")

The violin plot shows the distribution of Sepal.Width for each predicted class. Discrete color scales ensure that class labels are clearly distinguished, and the black markers indicate the prototype (group summary) for each group.
The bHIVE package is a versatile tool for clustering and
classification tasks. Its integration with caret simplifies
hyperparameter tuning and cross-validation, making it suitable for a
variety of datasets and use cases. If you have any questions, comments,
or suggestions, please visit the GitHub repository.
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] caret_7.0-1 lattice_0.22-9 viridis_0.6.5 viridisLite_0.4.3
## [5] ggplot2_4.0.3 bHIVE_0.99.4 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 timeDate_4052.112 dplyr_1.2.1
## [4] farver_2.1.2 S7_0.2.2 fastmap_1.2.0
## [7] pROC_1.19.0.1 digest_0.6.39 rpart_4.1.27
## [10] timechange_0.4.0 lifecycle_1.0.5 cluster_2.1.8.2
## [13] survival_3.8-6 magrittr_2.0.5 compiler_4.6.0
## [16] rlang_1.2.0 sass_0.4.10 tools_4.6.0
## [19] yaml_2.3.12 data.table_1.18.4 knitr_1.51
## [22] labeling_0.4.3 askpass_1.2.1 reticulate_1.46.0
## [25] plyr_1.8.9 RColorBrewer_1.1-3 BiocParallel_1.47.0
## [28] Rtsne_0.17 purrr_1.2.2 withr_3.0.2
## [31] sys_3.4.3 stats4_4.6.0 nnet_7.3-20
## [34] grid_4.6.0 e1071_1.7-17 future_1.70.0
## [37] globals_0.19.1 scales_1.4.0 iterators_1.0.14
## [40] MASS_7.3-65 cli_3.6.6 rmarkdown_2.31
## [43] generics_0.1.4 umap_0.2.10.0 otel_0.2.0
## [46] future.apply_1.20.2 RSpectra_0.16-2 reshape2_1.4.5
## [49] proxy_0.4-29 cachem_1.1.0 stringr_1.6.0
## [52] splines_4.6.0 parallel_4.6.0 BiocManager_1.30.27
## [55] vctrs_0.7.3 hardhat_1.4.3 Matrix_1.7-5
## [58] jsonlite_2.0.0 listenv_0.10.1 maketools_1.3.2
## [61] clusterCrit_1.3.0 foreach_1.5.2 gower_1.0.2
## [64] jquerylib_0.1.4 recipes_1.3.3 glue_1.8.1
## [67] parallelly_1.47.0 codetools_0.2-20 stringi_1.8.7
## [70] lubridate_1.9.5 gtable_0.3.6 tibble_3.3.1
## [73] pillar_1.11.1 htmltools_0.5.9 ipred_0.9-15
## [76] openssl_2.4.1 lava_1.9.1 R6_2.6.1
## [79] evaluate_1.0.5 png_0.1-9 bslib_0.11.0
## [82] class_7.3-23 Rcpp_1.1.1-1.1 gridExtra_2.3
## [85] nlme_3.1-169 prodlim_2026.03.11 xfun_0.58
## [88] ModelMetrics_1.2.2.2 buildtools_1.0.0 pkgconfig_2.0.3