This tutorial walks through using DeMixSC to deconvolve a benchmark dataset consisting of paired bulk and sc/snRNA-seq data. The steps below show what each DeMixSC function does and how to run a deconvolution from installation to evaluation.
To run DeMixSC, make sure your R version is ≥ 4.2.1 and install the following required R packages:
# Check and install required packages.
# You may see additional messages the first time you install these packages.
# Install BiocManager if necessary.
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# Install required packages.
BiocManager::install("sva")
BiocManager::install("preprocessCore")
install.packages("doParallel")
install.packages("nnls")
install.packages('Seurat')
install.packages('Metrics')
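Before installing DeMixSC itself, it can help to confirm that the prerequisites are in place. The sketch below is a convenience check and not part of DeMixSC: the helper name `missing_packages` is hypothetical, and it relies only on the base R functions `getRversion()` and `requireNamespace()`.

```r
# Hypothetical helper (not part of DeMixSC): return the packages in `pkgs`
# that cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  is_available <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  pkgs[!is_available]
}

# R version requirement stated in this tutorial.
r_version_ok <- getRversion() >= "4.2.1"
message("R version is at least 4.2.1: ", r_version_ok)

# Packages installed in the step above.
required <- c("sva", "preprocessCore", "doParallel", "nnls", "Seurat", "Metrics")
message("Packages still missing: ",
        paste(missing_packages(required), collapse = ", "))
```

An empty list after "Packages still missing:" suggests that all prerequisites were installed successfully.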
Install the latest version of DeMixSC from our GitHub repository:
# Install devtools if necessary.
if (!"devtools" %in% rownames(installed.packages())) {
  install.packages('devtools')
}
# Install our DeMixSC package using the following command.
devtools::install_github('wwylab/DeMixSC')
# Load required packages.
library(DeMixSC)
## Loading required package: nnls
## Loading required package: preprocessCore
## Loading required package: doParallel
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
## Loading required package: sva
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-42. For overview type 'help("mgcv-package")'.
## Loading required package: genefilter
## Loading required package: BiocParallel
library(Seurat)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
## Attaching SeuratObject
library(nnls)
library(preprocessCore)
library(doParallel)
library(sva)
# Before loading the tutorial data, change your working directory to the folder where you saved the data.
# Note: Currently, our data is only open to reviewers.
# To reviewers: Click the 'Data Download Link' under the 'DeMixSC benchmark data' section and use the password we shared to access our data.
# To reproduce the results in this tutorial, you only need to download 'benchmark_data1.RData' and 'reference_data1.RData' (benchmark data in batch-1), and 'ground_truth.RData' (ground truth of the cell-type proportions).
# To run DeMixSC on batch-2, change the file names accordingly.
load("./benchmark_data1.RData")
load("./reference_data1.RData")
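`load()` restores objects under the names they were saved with, so it is worth checking which objects each file actually provides before using them. A minimal base R sketch follows; the helper name `list_rdata_objects` is hypothetical, and the example call assumes the file sits in the working directory.

```r
# Hypothetical helper: load an .RData file into a scratch environment and
# report the name and (first) class of every object it contains, without
# touching the global workspace.
list_rdata_objects <- function(path) {
  scratch <- new.env()
  object_names <- load(path, envir = scratch)  # load() returns the object names
  data.frame(
    object = object_names,
    class  = vapply(object_names,
                    function(n) class(get(n, envir = scratch))[1],
                    character(1),
                    USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Example call, guarded so the sketch also runs when the file is absent.
if (file.exists("./benchmark_data1.RData")) {
  print(list_rdata_objects("./benchmark_data1.RData"))
}
```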
reference
: A sc/snRNA-seq Seurat object for creating the reference matrix. Cell-type information is stored in the “Annotation” column within this Seurat object.

bulk
: Matrix representing the bulk data within the benchmark dataset.

pseudobulk
: Matrix representing the pseudo-bulk data within the benchmark dataset.

truth.prop
: The true cell-type proportions.

Matched Data
: The dataset should contain matched bulk and sc/snRNA-seq data with similar cell-type proportions.

Pseudo-bulk Data
: This is created by summing up the corresponding sc/snRNA-seq data across all cells. DeMixSC uses matched bulk and pseudo-bulk data to identify and adjust for technological discrepancies.

Reference Matrix
: sc/snRNA-seq data are used to generate a cell-type-specific reference matrix.

Input = DeMixSC.prep.benchmark(Seurat.obj = reference,
                               annotation = "Annotation",
                               mat.a = bulk_data1,
                               mat.b = pseudobulk_data1,
                               cutoff = 0.05,
                               scale.factor = 1e6)
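To make the pseudo-bulk and reference inputs described above concrete, the sketch below builds both from a toy count matrix. It illustrates the general idea only and is not DeMixSC's internal procedure: all object names (`counts`, `sample_id`, `cell_type`) and values are invented, and averaging within each cell type is an assumed simplification of how a reference matrix can be formed.

```r
# Toy single-cell count matrix: 3 genes x 6 cells (values are invented).
# matrix() fills by column, so each row of numbers below is one cell.
counts <- matrix(
  c(5, 0, 1,
    4, 1, 0,
    0, 6, 2,
    1, 7, 1,
    6, 0, 0,
    0, 5, 3),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)
sample_id <- c("S1", "S1", "S1", "S2", "S2", "S2")    # sample of origin per cell
cell_type <- c("Rod", "Rod", "BC", "BC", "Rod", "BC") # annotation per cell

# Pseudo-bulk: sum counts over all cells from the same sample.
# rowsum() sums rows by group, so transpose to group cells, then transpose back.
pseudobulk <- t(rowsum(t(counts), group = sample_id))

# Simple reference (assumed simplification): mean expression of each gene
# within each cell type.
type_sums <- t(rowsum(t(counts), group = cell_type))
cells_per_type <- as.vector(table(cell_type)[colnames(type_sums)])
reference_toy <- sweep(type_sums, 2, cells_per_type, "/")

print(pseudobulk)
print(reference_toy)
```

Summing within samples keeps the total counts of each gene unchanged, which is why `rowSums(pseudobulk)` equals `rowSums(counts)` in this toy example.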
DeMixSC.prep.benchmark returns a list (stored here as Input) with the following elements:

mat.a.adj
: Adjusted relative expression matrix ‘a’, with genes on the rows and samples on the columns.

mat.b
: Unadjusted relative expression matrix ‘b’, with genes on the rows and samples on the columns.

reference.adj
: Adjusted reference matrix used for deconvolving mat.a.adj.

reference
: Unadjusted reference matrix used for deconvolving mat.b.

theta.adj
: Adjusted relative abundance matrix of reference expression.

cell.size
: Mean cell size for each cell type.

discrepancy.genes
: Genes significantly influenced by technological discrepancies.

non.discrepancy.genes
: Genes minimally impacted by technological discrepancies.

res_bulk = DeMixSC.deconvolution(bulk.mat = Input$mat.a.adj,
                                 ref = Input$reference.adj, print.sample = NULL)
## Note: DeMixSC defaults to using (max.cores-1). Be cautious to avoid system overload. Consider adjusting `nthread` if needed.
## Starting parallel computation...
## Parallel computation completed.
## Deconvolution tasks completed!
# DeMixSC uses parallel computing, so you will see the above messages if everything works correctly.
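The message above notes that DeMixSC uses all but one core by default. On a shared machine you may prefer a smaller number. The sketch below only computes a conservative thread count with `parallel::detectCores()`. Passing it on through an `nthread` argument is an assumption based on the message text, so check `?DeMixSC.deconvolution` for the actual signature. The helper name `pick_threads` is hypothetical.

```r
library(parallel)

# Hypothetical helper: choose a thread count that leaves `reserve` cores free
# and never exceeds `max_threads`. detectCores() can return NA on some
# platforms, in which case we fall back to a single thread.
pick_threads <- function(reserve = 1, max_threads = 4) {
  available <- detectCores()
  if (is.na(available)) {
    return(1L)
  }
  as.integer(max(1, min(available - reserve, max_threads)))
}

n_threads <- pick_threads()
message("Threads to use: ", n_threads)

# Assumed usage (argument name taken from the package message, not verified):
# res_bulk <- DeMixSC.deconvolution(bulk.mat = Input$mat.a.adj,
#                                   ref = Input$reference.adj,
#                                   nthread = n_threads,
#                                   print.sample = NULL)
```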
res_pseudobulk = DeMixSC.deconvolution(bulk.mat = Input$mat.b,
                                       ref = Input$reference, print.sample = NULL)
## Note: DeMixSC defaults to using (max.cores-1). Be cautious to avoid system overload. Consider adjusting `nthread` if needed.
## Starting parallel computation...
## Parallel computation completed.
## Deconvolution tasks completed!
DeMixSC.deconvolution returns a list with the following elements:

cell.type.proportions
: A matrix of cell-type proportions, with cell types on the rows and samples on the columns.

converge
: A vector indicating the number of iterations until convergence for each sample.

weights
: A matrix of gene weights with genes on the rows and samples on the columns.

# Save the estimated cell-type proportions of the bulk data.
prop.est_bulk = res_bulk$cell.type.proportions
# Save the estimated cell-type proportions of the pseudo-bulk data.
prop.est_pseudobulk = res_pseudobulk$cell.type.proportions
# Here are the estimated cell-type proportions of the bulk data.
round(prop.est_bulk, 2)
##      Macular_19_D013 Macular_19_D014 Macular_19_D015 Macular_19_D016
## MG              0.04            0.05            0.07            0.04
## Rod             0.66            0.50            0.38            0.47
## AC              0.06            0.07            0.08            0.09
## BC              0.17            0.27            0.32            0.22
## Cone            0.00            0.01            0.02            0.00
## HC              0.06            0.09            0.12            0.08
## RGC             0.00            0.01            0.02            0.10
# Here are the estimated cell-type proportions of the pseudo-bulk data.
round(prop.est_pseudobulk, 2)
##      Macular_19_D013 Macular_19_D014 Macular_19_D015 Macular_19_D016
## MG              0.03            0.07            0.03            0.06
## Rod             0.64            0.54            0.43            0.37
## AC              0.05            0.05            0.08            0.08
## BC              0.27            0.34            0.43            0.41
## Cone            0.00            0.00            0.01            0.01
## HC              0.00            0.00            0.01            0.00
## RGC             0.00            0.00            0.01            0.07
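A quick sanity check on any deconvolution output is that each sample's proportions sum to roughly one. The sketch below re-enters the rounded bulk estimates printed above as a plain matrix so that it runs on its own; with real results, the same calls can be applied to `prop.est_bulk` directly. Because the values are rounded to two decimals, the column sums can deviate slightly from 1.

```r
# Rounded bulk estimates copied from the output above (cell types x samples).
# matrix() fills by column, so each row of numbers below is one sample.
prop_bulk_rounded <- matrix(
  c(0.04, 0.66, 0.06, 0.17, 0.00, 0.06, 0.00,
    0.05, 0.50, 0.07, 0.27, 0.01, 0.09, 0.01,
    0.07, 0.38, 0.08, 0.32, 0.02, 0.12, 0.02,
    0.04, 0.47, 0.09, 0.22, 0.00, 0.08, 0.10),
  nrow = 7,
  dimnames = list(
    c("MG", "Rod", "AC", "BC", "Cone", "HC", "RGC"),
    paste0("Macular_19_D0", 13:16)
  )
)

# Each column should sum to about 1 (up to rounding error).
col_totals <- colSums(prop_bulk_rounded)
print(col_totals)
stopifnot(all(abs(col_totals - 1) < 0.02))

# Most abundant cell type in each sample.
dominant <- rownames(prop_bulk_rounded)[apply(prop_bulk_rounded, 2, which.max)]
names(dominant) <- colnames(prop_bulk_rounded)
print(dominant)
```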
# Here, we compare the estimated proportions with their ground truth.
# We use the root mean squared error (RMSE) to evaluate DeMixSC's performance in deconvolving benchmark data in batch-1.
library(Metrics)
# load the ground truth.
load("./ground_truth.RData")
cell.order = c("AC","BC","Cone","HC","MG","RGC","Rod")
# The RMSE values in the bulk data (one for each sample).
for (i in colnames(prop.est_bulk)) {
  print(rmse(Truth_proportion[cell.order,i], prop.est_bulk[cell.order,i]))
}
## [1] 0.0169058
## [1] 0.05131309
## [1] 0.044452
## [1] 0.04472796
# The RMSE values in the pseudo-bulk data (one for each sample).
for (i in colnames(prop.est_pseudobulk)) {
  print(rmse(Truth_proportion[cell.order,i], prop.est_pseudobulk[cell.order,i]))
}
## [1] 0.03557157
## [1] 0.05490345
## [1] 0.05613007
## [1] 0.05937674
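For reference, the RMSE used above is the square root of the mean squared difference between true and estimated proportions across cell types. The per-sample loop can also be written as one vectorized call. The sketch below uses base R only and invented toy matrices (`truth_toy`, `est_toy`); the helper name `rmse_per_sample` is hypothetical.

```r
# Hypothetical helper: RMSE between two matrices, computed per column (sample).
# Rows and columns are matched by name, so a different row order is harmless.
rmse_per_sample <- function(truth, estimate) {
  stopifnot(setequal(rownames(truth), rownames(estimate)))
  estimate <- estimate[rownames(truth), colnames(truth), drop = FALSE]
  sqrt(colMeans((truth - estimate)^2))
}

# Toy proportions (invented values): 3 cell types x 2 samples.
truth_toy <- matrix(c(0.5, 0.3, 0.2,
                      0.6, 0.3, 0.1),
                    nrow = 3,
                    dimnames = list(c("Rod", "BC", "AC"), c("S1", "S2")))
# Same cell types, deliberately listed in a different row order.
est_toy <- matrix(c(0.2, 0.4, 0.4,
                    0.1, 0.3, 0.6),
                  nrow = 3,
                  dimnames = list(c("AC", "BC", "Rod"), c("S1", "S2")))

print(rmse_per_sample(truth_toy, est_toy))
```

With the tutorial objects, a call along the lines of `rmse_per_sample(Truth_proportion[cell.order, colnames(prop.est_bulk)], prop.est_bulk)` should give the same per-sample values as the loop, assuming both matrices carry the same sample and cell-type names.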
GNU Affero General Public License v3.0
Please contact Shuai Guo (SGuo3@mdanderson.org), Xiaoqian Liu (XLiu31@mdanderson.org), and Wenyi Wang (wwang7@mdanderson.org) if you encounter any issues while working through this tutorial.
Guo, S., Liu, X., Cheng, X., Jiang, Y., Ji, S., Liang, Q., …
& Wang, W. (2023). DeMixSC: a deconvolution framework that uses
single-cell sequencing plus a small benchmark dataset for improved
analysis of cell-type ratios in complex tissue samples. bioRxiv,
2023-11.
https://doi.org/10.1101/2023.10.10.561733