ANS: Adjusted Neighborhood Scoring

In single-cell RNA sequencing (scRNA-seq), gene signature scoring is integral to identifying and characterizing distinct cell populations. However, it is challenging to keep scores robust and comparable across gene signatures, batches, and conditions. To address these challenges, we evaluated the stability of established methods such as Scanpy, UCell, and JASMINE when scoring cells of different types and states. We also introduced a new scoring method, Adjusted Neighborhood Scoring (ANS), which builds on the traditional Scanpy method and improves the handling of control gene sets. We further demonstrated the usability of ANS by differentiating between cancer-associated fibroblasts and malignant cells undergoing epithelial-mesenchymal transition (EMT) in four cancer types, achieving excellent classification performance (AUCPR train: 0.95-0.99, AUCPR test: 0.91-0.99). In summary, ANS is a robust and deterministic scoring approach that enables the comparison of diverse gene signatures. These results contribute to the development of more accurate and reliable methods for analyzing scRNA-seq data.

Performance comparison of ANS and other scRNA-seq scoring methods on cell type annotation.

Note

A preprint describing ANS and presenting the results of the signature scoring methods benchmark is now available.

Getting started

Installation

The package targets Python 3.8 and later. To install it, run:

pip install git+https://github.com/lciernik/ANS_signature_scoring.git

Disclaimer: The implementations of all scoring methods based on Tirosh et al. 2016 largely follow the implementation of the score_genes() method in Scanpy.

Basic usage in Python

The package is fully compatible with Scanpy, the Python toolbox for scRNA-seq analysis. The scoring methods are applied to preprocessed (log-normalized) scRNA-seq data stored in an AnnData object.
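For readers unsure what "log-normalized" means here, the following is a minimal sketch of the usual recipe: scale each cell to a common total count, then apply log(1 + x). The function name log_normalize and the default target_sum of 1e4 are assumptions chosen for illustration and are not part of this package. In a Scanpy workflow the same step is commonly done with sc.pp.normalize_total followed by sc.pp.log1p.

```python
import math


def log_normalize(counts, target_sum=1e4):
    """Scale every cell to `target_sum` total counts, then apply log1p.

    counts: list of cells, each a list of raw counts (one entry per gene).
    This is an illustrative helper, not a function of signaturescoring.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells without any counts stay at zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Toy input: two cells, two genes; the second cell is empty.
raw_counts = [[1, 3], [0, 0]]
normalized = log_normalize(raw_counts)
```

With input prepared along these lines, the scoring call looks as follows.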

import signaturescoring as ssc

ssc.score_signature(
    adata=adata,                            # preprocessed (log-normalized) gene expression data in an AnnData object
    gene_list=gene_signature,               # gene expression signature, type list
    method='adjusted_neighborhood_scoring',
    ctrl_size=100,
    score_name='scores',                    # scores stored in adata.obs column defined by score_name
)

print(adata.obs['scores'].describe())
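To give some intuition for what a score of this family expresses, here is a minimal pure-Python sketch of the general idea behind Tirosh-style scoring: a cell's score is the mean expression of the signature genes minus the mean expression of a set of control genes with similar average expression. This is not the package's implementation. The function name neighborhood_score, the toy data, and the rule for picking controls (the non-signature genes closest in average expression, ties broken by gene order) are simplifying assumptions made for illustration.

```python
def neighborhood_score(expr, genes, signature, ctrl_size=1):
    """Toy score: mean signature expression minus mean control expression.

    expr      : list of cells, each a list of log-normalized values per gene
    genes     : gene names, in the same order as the values in each cell
    signature : gene names to score
    ctrl_size : control genes picked per signature gene (assumed rule below)
    """
    n_cells = len(expr)
    # Average expression of each gene across all cells.
    avg = [sum(cell[j] for cell in expr) / n_cells for j in range(len(genes))]
    sig_idx = {genes.index(g) for g in signature}
    candidates = [j for j in range(len(genes)) if j not in sig_idx]

    # Assumption: for each signature gene, take the `ctrl_size` non-signature
    # genes whose average expression is closest to it (deterministic ties).
    control = set()
    for s in sig_idx:
        nearest = sorted(candidates, key=lambda j: (abs(avg[j] - avg[s]), j))
        control.update(nearest[:ctrl_size])
    if not control:
        raise ValueError("no control genes available")

    scores = []
    for cell in expr:
        sig_mean = sum(cell[j] for j in sig_idx) / len(sig_idx)
        ctrl_mean = sum(cell[j] for j in control) / len(control)
        scores.append(sig_mean - ctrl_mean)
    return scores


genes = ["SIG1", "SIG2", "G3", "G4", "G5"]
expr = [
    [3.0, 2.0, 1.0, 1.0, 0.0],  # cell with high signature expression
    [1.0, 0.0, 2.0, 1.0, 0.0],  # cell with low signature expression
]
scores = neighborhood_score(expr, genes, ["SIG1", "SIG2"], ctrl_size=1)
print(scores)  # [1.5, -1.0]
```

Because the control genes are chosen by a fixed rule rather than sampled at random, repeated runs of this sketch give identical scores.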

Other values of method select the other scoring methods implemented in the package.

Basic usage in R

The repository contains an R implementation of ANS in the file src_R/adjusted_neighborhood_scoring.R. Download the file and source it in your script to load the method.

Disclaimer: The code is largely based on the implementation of the AddModuleScore method of the Seurat package.

Note: ANS for R should be used on Seurat objects. Source the file in your script and use it identically to AddModuleScore.

Example:

source('MT/ANS_signature_scoring/src_R/adjusted_neighborhood_scoring.R')

# Initialize the Seurat object with the log-normalized data.
# e.g. Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics
pbmc <- "..."

# List of signatures
markers <- list(markers = gene_list)

# score data
pbmc <- AdjustedNeighborhoodScoring(pbmc, features = markers)