RareQ operates on any Seurat object with a neighborhood graph and can integrate multimodal data. This tutorial demonstrates how to identify rare cell populations in CITE-seq data using RareQ, within Seurat’s multimodal analysis framework.
The example dataset (bmcite) contains paired RNA and ADT
profiles from human bone marrow mononuclear cells. The data and
preprocessing steps can be found at https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.
library(Seurat)
library(RareQ) # core method for rare-cell detection
library(dplyr)
library(ggplot2)
# Read example data
# Please replace the path with your local "Tutorial_example" directory
bm = readRDS('Tutorial_example/data/bmcite.RDS')
Each modality is normalized and dimensionally reduced independently. RNA features capture transcriptional programs, while ADT features reflect surface protein abundance.
Assays(bm)
## [1] "RNA" "ADT"
# RNA processing
DefaultAssay(bm) <- 'RNA'
bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA()
# ADT processing
DefaultAssay(bm) <- 'ADT'
VariableFeatures(bm) <- rownames(bm[["ADT"]]) # use all ADT features for dimensional reduction
bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>%
ScaleData() %>%
RunPCA(reduction.name = 'apca') # run PCA for ADT modality
The Weighted Nearest Neighbor (WNN) is an unsupervised framework for combining multiple modalities measured within the same cell. It assigns cell-specific weights to each modality according to its relative information content and generated a unified low-dimensional embedding that retains modality-specific signals. Here, we use WNN to integrate RNA and protein modalities in CITE-seq data. Note that it can also be applied to joint analysis of scRNA-seq and scATAC-seq data.
bm <- FindMultiModalNeighbors(
bm,
reduction.list = list("pca", "apca"), # use PCA results from RNA and ADT
dims.list = list(1:30, 1:18), # specify number of PCs per modality
modality.weight.name = "RNA.weight"
)
# Run UMAP based on the integrated WNN graph
bm <- RunUMAP(bm, nn.name = "weighted.nn",
reduction.name = "wnn.umap", reduction.key ="wnnUMAP_")
RareQ requires a structure-preserving embedding in which local distances reflect multimodal cell relationships. We obtain this by applying supervised PCA (sPCA) based on the WNN graph: sPCA learns a linear projection of a single modality (here RNA) whose principal components reproduce the neighbor topology defined by the multimodal WNN graph.
DefaultAssay(bm) = 'RNA'
# Compute sPCA embedding guided by the WNN graph
bm <- RunSPCA(bm, assay = 'RNA', graph = 'wsnn')
Then, compute cell neighborhoods. The k.param (default
20) nearest neighbors for each cell were determined based on Euclidean
distances in the low-dimensional sPCA space.
# These will serve as the input graph for RareQ
bm <- FindNeighbors(object = bm,
k.param = 20,
compute.SNN = FALSE, # RareQ relies on directed kNN, not SNN
prune.SNN = 0,
reduction = "spca",
dims = 1:50,
force.recalc = FALSE,
return.neighbor = TRUE) # returned neighbor list is essential for Q propagation
Now we use RareQ::FindRare() to derive both major and
rare cell clusters, and attach the identified cluster labels to Seurat
metadata.
cluster = FindRare(sc_object = bm)
table(cluster)
## cluster
## 12 267 684 885 1035 1094 1620 2207 2468 3361 3991 5432 5800
## 431 328 8 36 11 106 59 257 32 44 9776 218 20
## 5962 7588 7767 8267 8315 8648 9694 9903 10753 11227 12456 12951 14331
## 122 411 258 93 63 63 1411 133 27 507 2105 1323 97
## 14399 15196 15331 15815 16454 16991 17242 17251 19921 21930 22152 24065 24218
## 129 46 315 120 25 515 6958 528 51 29 45 519 57
## 25007 29474
## 205 3191
bm$cluster = cluster
# Label cluster according to cluster size
cluster.cnt <- sort(table(bm$cluster))
bm$cluster_sort = factor(as.character(bm$cluster), levels=names(cluster.cnt),
labels = 1:length(cluster.cnt), ordered = T)
# Define color palette for visualization
cols <- c("#532C8A","#c19f70","#f9decf","#c9a997","#B51D8D","#9e6762","#3F84AA","#F397C0",
"#C594BF","#DFCDE4","#eda450","#635547","#C72228","#EF4E22","#f77b59","#989898",
"#7F6874","#8870ad","#65A83E","#EF5A9D","#647a4f","#FBBE92","#354E23","#139992",
"#C3C388","#8EC792","#0F4A9C","#8DB5CE","#1A1A1A","#FACB12","#C9EBFB","#DABE99",
"#ed8f84","#005579","#CDE088","#BBDCA8","#F6BFCB"
)
getPalette = colorRampPalette(cols[c(2,3,1,5,6,7,8,9,11,12,13,14,16,18,19,20,22,23,24,27,28,29,30,31,34)])
# UMAP visualization of RareQ-identified clusters based on WNN embedding
DimPlot(bm, group.by='cluster_sort', reduction = 'wnn.umap') +
scale_color_manual(values = getPalette(length(unique(bm$cluster_sort))))