RareQ is a network-propagation-based algorithm designed to identify both major and rare cell populations across diverse modalities, including RNA, ADT, ATAC, and spatial data. It operates on the k-nearest neighbor (kNN) graph, quantifying local topological structure via a neighborhood connectivity metric (Q).

This tutorial demonstrates the use of RareQ on a standard single-cell RNA-seq dataset.

Load the data

library(Seurat)
library(RareQ)    # core method for rare-cell detection
library(ggplot2)

# Read example data
# Please replace the path with your local "Tutorial_example" directory
obj = readRDS('Tutorial_example/data/Jurkat.RDS')
counts = obj@assays$RNA@counts

Preprocess the scRNA-seq data

RareQ operates on the kNN graph, therefore standard preprocessing is required. We use Seurat for quality control (QC), normalization, highly variable gene (HVG) selection, scaling, PCA and kNN graph construction.

# Create Seurat object
sc_object <- CreateSeuratObject(count=counts, project = "sc_object", min.cells = 3)

# Basic QC
sc_object$percent.mt <- PercentageFeatureSet(sc_object, pattern = "^MT-")
sc_object <- subset(sc_object, percent.mt<20)

# Normalize, find HVGs and scale data
sc_object <- NormalizeData(sc_object)
sc_object <- FindVariableFeatures(sc_object, nfeatures=2000)
sc_object <- ScaleData(sc_object)

# Dimensionality reduction (50 PCs recommended to retain rare-cell structure)
sc_object <- RunPCA(sc_object, npcs=50)
sc_object <- RunUMAP(sc_object, dims=1:50)

# Build the kNN graph
sc_object <- FindNeighbors(object = sc_object,
                           k.param = 20,              # k = 20 typical in scRNA workflows
                           compute.SNN = FALSE,       # RareQ relies on directed kNN, not SNN
                           prune.SNN = 0,
                           reduction = "pca",
                           dims = 1:50,
                           force.recalc = FALSE,
                           return.neighbor = TRUE)    # returned neighbor list is essential for Q propagation

Run RareQ

RareQ is executed using the FindRare() function with the processed Seurat object as input. The function returns the cluster label (major + rare clusters) for each cell.

cluster <- FindRare(sc_object)
table(cluster)
## cluster
##   33 1125 1549 
## 1526   14   16

The detailed steps of the RareQ algorithm involve the following:

The function ComputeQ() can be used independently to visualize Q values for each cell. This is useful for understanding the landscape of rare vs major cell states.

ComputeQ(
  sc_object,
  assay='RNA',
  k=6
)

Attach the identified cluster labels to Seurat metadata.

sc_object$cluster = cluster

Visualization

# Label cluster according to cluster size
cluster.cnt <- sort(table(sc_object$cluster))
sc_object$cluster_sort = factor(as.character(sc_object$cluster), levels = names(cluster.cnt), 
                                labels = 1:length(cluster.cnt), ordered = TRUE)

# Define color palette for visualization
cols <- c("#532C8A","#c19f70","#f9decf","#c9a997","#B51D8D","#9e6762","#3F84AA","#F397C0",
          "#C594BF","#DFCDE4","#eda450","#635547","#C72228","#EF4E22","#f77b59","#989898",
          "#7F6874","#8870ad","#65A83E","#EF5A9D","#647a4f","#FBBE92","#354E23","#139992",
          "#C3C388","#8EC792","#0F4A9C","#8DB5CE","#1A1A1A","#FACB12","#C9EBFB","#DABE99",
          "#ed8f84","#005579","#CDE088","#BBDCA8","#F6BFCB"
)

getPalette = colorRampPalette(cols[c(2,3,1,5,6,7,8,9,11,12,13,14,16,18,19,20,22,23,24,27,28,29,30,31,34)])

# UMAP visualization of RareQ-identified clusters
DimPlot(sc_object, group.by='cluster_sort') +
  scale_color_manual(values = getPalette(length(unique(sc_object$cluster_sort))))