uberonpeek – a look at UBERON ontology, etc., with ontoProc2

Introduction

The ontoProc2 package gives convenient access to the ontologies that the INCATools project has transformed to “semantic SQL”.

We’ll start by retrieving the current UBERON ontology and examining some tables and “statements”.

library(ontoProc2)
library(DBI)
library(dplyr)
ubss <- semsql_connect(ontology = "uberon")
report(ubss)
## 
## ============================================================ 
## SemsqlConn Object
## ============================================================ 
## 
## Connection Details:
## ---------------------------------------- 
##   Database path:    /github/home/.cache/R/BiocFileCache/11d4cd2265b_uberon.db 
##   Ontology prefix:  UBERON 
##   Status:           ✓   Connected 
## 
## Database Statistics:
## ---------------------------------------- 
##   Labeled terms:    28,764 
##   Direct edges:     80,942 
##   Entailed edges:   6,421,630 
##   Definitions:      23,007 
## 
## Terms by Prefix (top 5):
## ---------------------------------------- 
##   UBERON:          16,067
##   GO:              7,433
##   CL:              1,477
##   _:               1,260
##   CHEBI:           917
## 
## Key Tables Available:
## ---------------------------------------- 
##   ✓  rdfs_label_statement 
##   ✓  has_text_definition_statement 
##   ✓  edge 
##   ✓  entailed_edge 
##   ✓  rdfs_subclass_of_statement 
##   ✓  owl_some_values_from 
##   ✓  has_oio_synonym_statement 
## 
## ============================================================ 
## Use methods like search_labels(), get_ancestors(), etc.
## Run ?SemsqlConn for documentation.
## ============================================================
ubcon <- ubss@con
head(dbListTables(ubcon))
## [1] "all_problems"                    "annotation_property_node"       
## [3] "anonymous_class_expression"      "anonymous_expression"           
## [5] "anonymous_individual_expression" "anonymous_property_expression"
tbl(ubcon, "statements")
## # A query:  ?? x 8
## # Database: sqlite 3.53.2 [/github/home/.cache/R/BiocFileCache/11d4cd2265b_uberon.db]
##    stanza         subject        predicate  object value datatype language graph
##    <chr>          <chr>          <chr>      <chr>  <chr> <chr>    <chr>    <chr>
##  1 obo:uberon.owl obo:uberon.owl foaf:home… <NA>   http… xsd:any… <NA>     <NA> 
##  2 obo:uberon.owl obo:uberon.owl rdfs:comm… <NA>   Aure… <NA>     <NA>     <NA> 
##  3 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   ZFS … <NA>     <NA>     <NA> 
##  4 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   ZFA … <NA>     <NA>     <NA> 
##  5 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   XAO … <NA>     <NA>     <NA> 
##  6 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   WBls… <NA>     <NA>     <NA> 
##  7 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   WBbt… <NA>     <NA>     <NA> 
##  8 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   TGMA… <NA>     <NA>     <NA> 
##  9 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   TAO … <NA>     <NA>     <NA> 
## 10 obo:uberon.owl obo:uberon.owl oio:treat… <NA>   TADS… <NA>     <NA>     <NA> 
## # ℹ more rows
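The same table can be queried with plain SQL through DBI. The following is a minimal sketch that runs against a small in-memory SQLite table rather than uberon.db: the toy rows are invented for illustration, and only the eight column names are taken from the output above. It assumes the RSQLite package is installed.

```r
library(DBI)

# Toy stand-in for the semantic SQL "statements" table. The rows are
# illustrative only; they are not copied from uberon.db.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy <- data.frame(
  stanza    = c("UBERON:0000948", "UBERON:0000948", "UBERON:0002113"),
  subject   = c("UBERON:0000948", "UBERON:0000948", "UBERON:0002113"),
  predicate = c("rdfs:label", "rdf:type", "rdfs:label"),
  object    = c(NA, "owl:Class", NA),
  value     = c("heart", NA, "kidney"),
  datatype  = NA_character_,
  language  = NA_character_,
  graph     = NA_character_
)
dbWriteTable(con, "statements", toy)

# Literal values such as labels sit in `value`; references to other
# entities sit in `object`. The predicate is passed as a bound parameter.
labels <- dbGetQuery(
  con,
  "SELECT subject, value FROM statements WHERE predicate = ? ORDER BY subject",
  params = list("rdfs:label")
)
dbDisconnect(con)
labels
```

Against the real database, the same query should work with ubcon in place of con, though it would return far more rows.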

Parent-child relations

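CRAN’s ontologyIndex package provides a familiar representation of parent-child relations that simplifies visualization. We convert the semantic SQL connection to an ontologyIndex object with semsql_to_oi.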

uboi <- semsql_to_oi(ubcon)
## Warning in ontologyIndex::ontology_index(name = nn, parents = pl): Some parent
## terms not found: BFO:0000001, COB:0000502, CARO:0000000 (4 more)
uboi
## Ontology with 25854 terms
## 
## Properties:
##  id: character
##  name: list
##  parents: list
##  children: list
##  ancestors: list
##  obsolete: logical
## Roots:
##  CHEBI:24432 - biological role
##  CHEBI:51086 - chemical role
##  CHEBI:33232 - application
##  CHEBI:23367 - molecular entity
##  BFO:0000003 - occurrent
##  CHEBI:24433 - group
##  CHEBI:33250 - atom
##  BFO:0000002 - continuant
##  UBERON:0035943 - life cycle temporal boundary
##  CARO:0000007 - immaterial anatomical entity
##  ... 3 more
uboi$name[10364:10370]
## $`NCBITaxon:9935`
## [1] "Ovis"
## 
## $`NCBITaxon:9963`
## [1] "Caprinae"
## 
## $`NCBITaxon:9971`
## [1] "Pholidota"
## 
## $`NCBITaxon:9972`
## [1] "Manidae"
## 
## $`NCBITaxon:9975`
## [1] "Lagomorpha"
## 
## $`NCBITaxon:9989`
## [1] "Rodentia"
## 
## $`PATO:0000001`
## [1] "quality"

Tabulating the tag prefixes gives a sense of the variety of ontologies that are cross-referenced.

prefs <- gsub(":.*", "", names(uboi$name))
table(prefs)
## prefs
##       BFO      BSPO      CARO     CHEBI        CL       COB        GO       IAO 
##        14        12         5       915      1474         5      7428         5 
##       NBO NCBITaxon      PATO        PR        RO    UBERON 
##        37       474       159       353         1     14972
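To make the prefix tally easier to scan, the counts can be sorted. This is a small sketch on a hypothetical vector of tags (the vector toy_tags is made up for illustration); with the real data one would start from names(uboi$name) instead.

```r
# Hypothetical tags standing in for names(uboi$name)
toy_tags <- c("UBERON:0000948", "UBERON:0002113", "GO:0008150",
              "CL:0000000", "UBERON:0002189", "GO:0003674")

# Drop everything from the first colon onward, leaving only the prefix
toy_prefs <- sub(":.*", "", toy_tags)

# Tabulate and list the most frequent prefixes first
toy_counts <- sort(table(toy_prefs), decreasing = TRUE)
toy_counts
```

Because each tag contains a single colon, sub and gsub give the same result here.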

The ancestors component gives a view of is-a relations (presumably derived from rdfs:subClassOf predicates). As terminal tags we have chosen those for heart, kidney, and cortex of kidney; the plot shows the union of their ancestor sets.

onto_plot2(
  uboi,
  unlist(uboi$ancestors[c(
    "UBERON:0002189",
    "UBERON:0002113", "UBERON:0000948"
  )])
)
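The set of terms passed to the plot is the union of the ancestor sets of the chosen tags. The sketch below shows, on a hypothetical toy hierarchy, how such ancestor sets follow from a list of direct parents. The function ancestors_of and the term names are invented for illustration and are not part of ontoProc2 or ontologyIndex. We assume, as we understand the ontologyIndex convention to be, that a term counts among its own ancestors.

```r
# Hypothetical toy hierarchy: each element lists the direct parents of a term
toy_parents <- list(
  entity = character(0),
  organ  = "entity",
  heart  = "organ",
  kidney = "organ"
)

# Walk upward from a term, collecting every term reached (including itself)
ancestors_of <- function(term, parents) {
  seen <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      queue <- c(queue, parents[[current]])
    }
  }
  seen
}

toy_anc <- lapply(c(heart = "heart", kidney = "kidney"),
                  ancestors_of, parents = toy_parents)

# Shared ancestors appear once in the union, as they would in the plot
toy_union <- unique(unlist(toy_anc))
toy_union
```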

EFO and NCI thesaurus

On cursory inspection, EFO has considerable information about anatomical locations of diseases.

We’ll use the entailed_edge table in EFO to find all statements that have ‘heart’ (UBERON:0000948) as object.

eforef <- semsql_connect(ontology = "efo") # 240 MB
## Connected to SemanticSQL database: /github/home/.cache/R/BiocFileCache/11d4355bc84c_efo.db
## Primary ontology prefix: EFO
# nciref = semsql_connect("ncit")  # > 500 MB download, so not run here
htab <- tbl(eforef@con, "entailed_edge") |>
  filter(object == "UBERON:0000948") |>
  as.data.frame()
head(htab)
##          subject       predicate         object
## 1 UBERON:0000948 rdfs:subClassOf UBERON:0000948
## 2    EFO:0009285     IAO:0000136 UBERON:0000948
## 3    EFO:0600032     IAO:0000136 UBERON:0000948
## 4    EFO:0008398     IAO:0000136 UBERON:0000948
## 5    EFO:0009291     IAO:0000136 UBERON:0000948
## 6    EFO:0009290     IAO:0000136 UBERON:0000948

Reading these formal tags is tedious, so we have assembled a simple named character vector, ncit_map, that maps many tags to readable labels.

data(ncit_map)
head(ncit_map)
##           IAO:0000112           IAO:0000114           IAO:0000115 
##    "example of usage" "has curation status"          "definition" 
##           IAO:0000116           IAO:0000117           IAO:0000232 
##         "editor note"         "term editor"        "curator note"

Which predicates occur in the heart table above? Tags that are absent from the map decode to NA.

ncit_map[unique(htab$predicate)]
##                            <NA>                     IAO:0000136 
##                              NA                      "is_about" 
##                            <NA>                      RO:0002502 
##                              NA                    "depends on" 
##                     BFO:0000066                      RO:0002131 
##                     "occurs in"                      "overlaps" 
##                      RO:0001025                      RO:0002314 
##                    "located_in"            "inheres in part of" 
##                     BFO:0000050                     EFO:0000784 
##                       "part_of"          "has_disease_location" 
##                      RO:0000052                      RO:0004027 
##                    "inheres_in" "disease has inflammation site"
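Indexing a named character vector by a tag that it does not contain yields NA, which is why some entries above are undecoded. A minimal sketch of one way to handle this, falling back to the raw tag, follows. The vector toy_map is a hypothetical two-entry stand-in for ncit_map.

```r
# Hypothetical two-entry stand-in for ncit_map
toy_map <- c("IAO:0000136" = "is_about", "BFO:0000050" = "part_of")

toy_preds <- c("rdfs:subClassOf", "IAO:0000136", "BFO:0000050")

# Look up each predicate; tags missing from the map come back as NA
decoded <- unname(toy_map[toy_preds])

# Keep the original tag wherever no label was found
unmapped <- is.na(decoded)
decoded[unmapped] <- toy_preds[unmapped]
decoded
```

Falling back to the tag keeps every predicate identifiable, at the cost of mixing labels and tags in one vector.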

To enumerate and decode the terms whose disease location (EFO:0000784) is heart, we use:

library(dplyr)
library(DT)
# Subjects of has_disease_location edges pointing at heart, decoded via ncit_map
hdis_tags <- htab |>
  dplyr::filter(predicate == "EFO:0000784") |>
  dplyr::pull(subject)
hdis <- ncit_map[hdis_tags]
datatable(data.frame(tag = names(hdis), value = as.character(hdis)))
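An alternative to a lookup vector is to join the filtered edges to a table of labels, such as the rdfs_label_statement table listed among the key tables earlier. The sketch below uses two hypothetical data frames with invented EX: tags and labels. The assumption that labels are held in subject and value columns follows the statements layout shown earlier, and should be checked against the actual table.

```r
# Hypothetical edges pointing at heart; the EX: tags are invented
toy_edges <- data.frame(
  subject   = c("EX:0001", "EX:0002", "EX:0003"),
  predicate = c("EFO:0000784", "IAO:0000136", "EFO:0000784"),
  object    = "UBERON:0000948"
)

# Hypothetical label table with the assumed subject/value layout
toy_labels <- data.frame(
  subject = c("EX:0001", "EX:0003"),
  value   = c("toy disease A", "toy disease B")
)

# Keep only disease-location edges, then attach labels by subject;
# all.x = TRUE retains edges whose subject has no label
located <- toy_edges[toy_edges$predicate == "EFO:0000784", ]
toy_decoded <- merge(located, toy_labels, by = "subject", all.x = TRUE)
toy_decoded[, c("subject", "value")]
```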

Session information

sessionInfo()
## R version 4.6.1 (2026-06-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 26.04 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] S7_0.2.2          DBI_1.3.0         dplyr_1.2.1       DT_0.34.0        
## [5] ontoProc2_0.99.24 BiocStyle_2.41.0 
## 
## loaded via a namespace (and not attached):
##  [1] utf8_1.2.6          rappdirs_0.3.4      sass_0.4.10        
##  [4] generics_0.1.4      xml2_1.6.0          RSQLite_3.53.2     
##  [7] digest_0.6.39       magrittr_2.0.5      evaluate_1.0.5     
## [10] grid_4.6.1          fastmap_1.2.0       blob_1.3.0         
## [13] R.oo_1.27.1         jsonlite_2.0.0      ontologyIndex_2.12 
## [16] R.utils_2.13.0      ontologyPlot_1.7    graph_1.91.0       
## [19] BiocManager_1.30.27 purrr_1.2.2         crosstalk_1.2.2    
## [22] Rgraphviz_2.57.0    codetools_0.2-20    httr2_1.2.3        
## [25] jquerylib_0.1.4     paintmap_1.0        cli_3.6.6          
## [28] rlang_1.2.0         dbplyr_2.6.0        R.methodsS3_1.8.2  
## [31] bit64_4.8.2         withr_3.0.3         cachem_1.1.0       
## [34] yaml_2.3.12         otel_0.2.0          tools_4.6.1        
## [37] memoise_2.0.1       filelock_1.0.3      BiocGenerics_0.59.7
## [40] curl_7.1.0          buildtools_1.0.0    vctrs_0.7.3        
## [43] R6_2.6.1            stats4_4.6.1        BiocFileCache_3.3.0
## [46] lifecycle_1.0.5     htmlwidgets_1.6.4   bit_4.6.0          
## [49] pkgconfig_2.0.3     pillar_1.11.1       bslib_0.11.0       
## [52] glue_1.8.1          xfun_0.59           tibble_3.3.1       
## [55] tidyselect_1.2.1    sys_3.4.3           knitr_1.51         
## [58] htmltools_0.5.9     rmarkdown_2.31      maketools_1.3.2    
## [61] compiler_4.6.1