---
title: "celltoprotein -- connecting Cell and Protein Ontologies"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{celltoprotein -- connecting Cell and Protein Ontologies}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    highlight: pygments
    number_sections: yes
    theme: united
    toc: yes
---

# Introduction

In a pair of papers from the Venter Institute,
[Bakken et al.](https://link.springer.com/article/10.1186/s12859-017-1977-1)
and [Aevermann et al.](https://academic.oup.com/hmg/article/27/R1/R40/4953379)
discuss ontological implications of single-cell transcriptomics.
These papers introduce a process of cell type definition via
"necessary and sufficient marker gene" enumeration.

In this vignette we indicate how the Cell Ontology (CL), Relation Ontology (RO),
and Protein Ontology (PR) can be connected to assess formal relationships
between declared cell types and plasma membrane features that
can play a role in cell type definition.

# Given a cell type, what proteins are noted as parts of its plasma membrane?

Connect to the Relation Ontology and search for CURIEs
related to "plasma membrane".

```{r lkon1,message=FALSE}
library(ontoProc2)
library(DT)
# open the Semantic SQL representation of the Relation Ontology
ro <- semsql_connect(ontology = "ro")
search_labels(ro, "plasma membrane")
```

We have a helper resource for finding exact Cell Ontology names
of cell types.

```{r lkct1}
data("tag2cn", package = "ontoProc2")
# keep the entries whose cell type name matches CD8-positive regulatory cells
cd8reg <- grep("CD8-positive.*regulatory", tag2cn, value = TRUE)
cd8reg
```

With these cell type identifiers, we can search for the
proteins identified as "part of plasma membrane". We need to use the
CURIEs, not the names, for precision.

This chunk is not evaluated until a subset of PR data is available
for illustration, because the PR downloads are too slow.

```{r lkct2, message=FALSE,eval=FALSE}
# names(cd8reg) supplies the CURIEs of the selected cell types
prtab <- get_present_pmp(names(cd8reg))
datatable(prtab)
```

# Given a protein, what cell types are asserted to possess it as a membrane part?

We pick two proteins and look for associated cell types.
This chunk is also not evaluated, because it requires a large download of
the Protein Ontology.

```{r lkpr1, message=FALSE,eval=FALSE}
prs <- c("PR:000001094", "PR:000001380")
# try() keeps the vignette building if the Protein Ontology is unavailable
clk <- try(cells_with_pmp(prs))
if (inherits(clk, "try-error")) {
  message("it is necessary to allow a large download of Protein Ontology for this chunk to run")
} else {
  datatable(clk)
}
```

# Some details

The "entailed edge" table of the Semantic SQL representation
of the Cell Ontology includes all assertions that are derivable from
the base axioms of the ontology.

```{r lkentedg, message=FALSE}
cl <- semsql_connect(ontology = "cl")
cl
library(dplyr)
# preview the table, then count its rows
tbl(cl@con, "entailed_edge")
tbl(cl@con, "entailed_edge") |> count()
```

We can look for statements that have "RO:0002104" (has plasma membrane part)
as predicate and a Protein Ontology term as object:

```{r domo}
tbl(cl@con, "entailed_edge") |>
  filter(predicate == "RO:0002104") |>
  as.data.frame() |>            # pull the matching rows into R
  filter(grepl("PR:", object)) |>  # keep edges whose object is a PR term
  arrange(subject) |>
  datatable()
```

Disconnect from the databases.

```{r dodisc}
disconnect(cl)
disconnect(ro)
```

# Session information

```{r lksess}
sessionInfo()
```