A general and flexible method for signal extraction from single-cell RNA-seq data


Single-cell RNA-sequencing (scRNA-seq) is a powerful high-throughput technique that enables researchers to measure genome-wide transcription levels at the resolution of single cells. Because of the low amount of RNA present in a single cell, some genes may fail to be detected even though they are expressed; these genes are usually referred to as dropouts.

Weill Cornell Medicine researchers present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data that account for zero inflation (dropouts), over-dispersion, and the count nature of the data. They demonstrate, with simulated and real data, that the model and its associated estimation procedure are able to give a more stable and accurate low-dimensional representation of the data than principal component analysis (PCA) and zero-inflated factor analysis (ZIFA), without the need for a preliminary normalization step.

Schematic view of the ZINB-WaVE model


Given n cells and J genes, let $Y_{ij}$ denote the count of gene j (j = 1,…, J) for cell i (i = 1,…, n) and $Z_{ij}$ an unobserved indicator variable, equal to one if gene j is a dropout in cell i and zero otherwise. Then, $\mu_{ij} = E[Y_{ij} \mid Z_{ij} = 0, X, V, W]$ and $\pi_{ij} = \Pr(Z_{ij} = 1 \mid X, V, W)$. We model $\ln(\mu)$ and $\operatorname{logit}(\pi)$ with the regression specified in the figure. Note that the model allows for different covariates to be specified in the two regressions.
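To make the roles of the mean μij and the dropout probability πij concrete, here is a minimal Python sketch of a zero-inflated negative binomial probability for a single count. It assumes the standard ZINB mixture form (a point mass at zero with weight π, plus a negative binomial with mean μ). The dispersion parameter `theta` and all function names are illustrative assumptions; they do not come from the text above or from the zinbwave package.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parametrization)."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y, mu, theta, pi):
    """Probability of count y under a zero-inflated negative binomial:
    with probability pi the count is a dropout (forced to zero),
    otherwise it is drawn from the negative binomial."""
    point_mass_at_zero = 1.0 if y == 0 else 0.0
    return pi * point_mass_at_zero + (1.0 - pi) * math.exp(nb_log_pmf(y, mu, theta))


def inv_logit(x):
    """Inverse of the logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical linear predictors for one (cell, gene) pair.
log_mu = 1.0      # value of ln(mu_ij)
logit_pi = -1.0   # value of logit(pi_ij)

mu = math.exp(log_mu)
pi = inv_logit(logit_pi)

# Zeros are more likely under the ZINB than under the plain negative binomial.
p_zero_zinb = zinb_pmf(0, mu, theta=2.0, pi=pi)
p_zero_nb = zinb_pmf(0, mu, theta=2.0, pi=0.0)
```

Setting `pi` to zero recovers the ordinary negative binomial, which shows how the dropout indicator only adds probability mass at zero.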

Availability – The approach is implemented in the open-source R package zinbwave, publicly available through the Bioconductor Project (https://bioconductor.org/packages/zinbwave).

Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. (2018) A general and flexible method for signal extraction from single-cell RNA-seq data. Nat Commun 9(1):284.
