The CLIC package provides a tool for feature selection when integrating unpaired scRNA-seq and scATAC-seq data. Unlike traditional feature selection methods, which consider only a gene’s expression variability, CLIC prioritizes genes whose expression shows high empirical correlation with nearby chromatin accessibility, treating them as “high-confidence links”.
CLIC scores, which serve as each feature’s confidence score, are computed from diverse single-cell multiome data from ENCODE. The script used for computing CLIC scores is available here.
For more details, please refer to our paper describing the method. PREPRINT LINK PENDING
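To make the idea of a correlation-based confidence score concrete, here is a minimal sketch in base R on simulated data. It is not the package's scoring script: the gene names, the simulated matrices, and the use of a plain per-gene Pearson correlation are all assumptions made for illustration.

```r
set.seed(1)

n_cells <- 200
genes   <- c("linked1", "linked2", "unlinked1", "unlinked2")  # hypothetical genes

# Simulated paired (multiome-like) measurements: rows are genes, columns are cells.
expr <- matrix(rnorm(length(genes) * n_cells), nrow = length(genes),
               dimnames = list(genes, NULL))
access <- matrix(rnorm(length(genes) * n_cells), nrow = length(genes),
                 dimnames = list(genes, NULL))

# Make nearby accessibility track expression for the "linked" genes only.
access[1:2, ] <- expr[1:2, ] + rnorm(2 * n_cells, sd = 0.3)

# Per-gene Pearson correlation between expression and accessibility across cells.
link_score <- sapply(genes, function(g) cor(expr[g, ], access[g, ]))

# Rank genes by score and keep the top ones as candidate integration features.
ranked       <- sort(link_score, decreasing = TRUE)
top_features <- names(ranked)[1:2]
top_features
```

In this toy setting the two genes whose accessibility follows their expression receive scores close to 1, while the unlinked genes score near 0, so a variability-only criterion and a link-based criterion can select different features.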
To install the most up-to-date version of CLIC, install it from the GitHub repository:
The main function, FindCLICFeatures(), accepts scRNA-seq data in either:
The user must also provide a species name to the species parameter, which loads the corresponding CLIC score file (e.g., human.csv) from the package’s inst/extdata/ folder.
For convenience, the processed data for human BMMC is available for download here in .Rdata format.
Note that this dataset is originally multiome data, meaning that both modalities were measured simultaneously in the same cells. This is not the setting in which CLIC would be used in practice: users should apply CLIC to integrate separately sampled scRNA-seq and scATAC-seq data.
Here, we treat the two modalities as originating from different experiments for demonstration and evaluation purposes only. We load the RNA and ATAC data separately, pretending that we don’t know their ground-truth pairing information.
We demonstrate how CLIC can be integrated into the Seurat workflow. However, CLIC is applicable to any integration framework that aligns genes and scATAC-seq peaks via the proximity assumption.
First, load the scRNA-seq data and scATAC-seq data. Gene activity scores are pre-computed by summing the number of fragments overlapping the gene and its promoter region.
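As a rough illustration of what such a gene activity score is, the following base R sketch counts, per cell, the fragments overlapping each gene body extended upstream to cover the promoter. The toy fragments, the gene coordinates, and the 2000 bp promoter window are assumptions made for this example; the demo data already ships with these scores, and in practice a dedicated function such as Signac's GeneActivity() would be used instead.

```r
# Toy fragments: one row per fragment, with the cell barcode it came from.
fragments <- data.frame(
  cell  = c("c1", "c1", "c1", "c2", "c2", "c3"),
  start = c(900, 1500, 9000, 2500, 100, 5200),
  end   = c(1100, 1800, 9300, 2700, 300, 5400)
)

# Hypothetical gene bodies on a single chromosome.
genes <- data.frame(
  gene   = c("geneA", "geneB"),
  start  = c(3000, 8000),
  end    = c(6000, 9500),
  strand = c("+", "-")
)

# Extend each gene body upstream of its TSS to cover the promoter.
# The 2000 bp default is an assumption made for this sketch.
extend_upstream <- function(genes, upstream = 2000) {
  plus <- genes$strand == "+"
  genes$start[plus] <- pmax(genes$start[plus] - upstream, 0)
  genes$end[!plus]  <- genes$end[!plus] + upstream
  genes
}

regions <- extend_upstream(genes)

# Count the fragments overlapping each extended region, separately for each cell.
cells <- sort(unique(fragments$cell))
activity <- sapply(cells, function(cl) {
  fr <- fragments[fragments$cell == cl, ]
  sapply(seq_len(nrow(regions)), function(i) {
    sum(fr$start <= regions$end[i] & fr$end >= regions$start[i])
  })
})
rownames(activity) <- regions$gene
activity
```

The result is a gene-by-cell count matrix, which has the same shape as the "ACTIVITY" assay used in the workflow.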
We follow the standard Seurat preprocessing pipeline and, in the process, find the CLIC features to use for integration.
library(CLIC)
library(Seurat)
library(Signac)
library(ggplot2)
# replace with local path to data
load("data/BMMC-s4d8.Rdata")
# RNA preprocessing
rna <- NormalizeData(rna)
out <- FindCLICFeatures(rna, score_name='human-signac-pearson')
use_features <- out$use_features
head(use_features)
rna <- ScaleData(rna, features=use_features)
# ATAC preprocessing
atac <- RunTFIDF(atac)
atac <- FindTopFeatures(atac, min.cutoff = "q0")
atac <- RunSVD(atac)
DefaultAssay(atac) <- "ACTIVITY"
atac <- NormalizeData(atac)
atac <- ScaleData(atac, features = use_features)

Using the features found above, we run the standard Seurat workflow for integrating scRNA-seq and scATAC-seq data.
# identify anchors for cca
transfer.anchors <- FindTransferAnchors(reference = rna, query = atac,
                                        features = use_features,
                                        reference.assay = "RNA",
                                        query.assay = "ACTIVITY",
                                        reduction = "cca")
# coembed into shared latent space
refdata <- GetAssayData(rna, assay = "RNA", layer = "data")
imputation <- TransferData(anchorset = transfer.anchors, refdata = refdata,
                           weight.reduction = atac[["lsi"]], dims = 2:30)
atac[["RNA"]] <- imputation
coembed <- merge(x = rna, y = atac, add.cell.ids = c("RNA", "ATAC"))
coembed <- ScaleData(coembed, features = use_features, do.scale = FALSE)
coembed <- RunPCA(coembed, features = use_features, verbose = FALSE)
coembed <- RunUMAP(coembed, dims = 1:30)
embedding_filtered <- Embeddings(coembed, reduction = "pca")
p <- DimPlot(coembed, group.by = c("orig.ident"))
p  # display the co-embedding, colored by modality of origin

We see that the scRNA-seq and scATAC-seq cells are well mixed in the latent space.
For a more comprehensive benchmark and a full description of the method, please see our paper:
PREPRINT LINK PENDING