Overview

This tutorial walks through using DeMixSC to deconvolve a benchmark dataset consisting of paired bulk and sc/snRNA-seq data. By following the steps below, users will learn the main functions of DeMixSC and how to run a complete deconvolution.

Preparation

To run DeMixSC, make sure your R version is ≥ 4.2.1 and install the following R packages:

# Check and install required packages.
# You may see additional messages the first time you install these packages.

# Install BiocManager if necessary.
  if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")    
      
# Install required packages.
  BiocManager::install("sva")
  BiocManager::install("preprocessCore")
  install.packages("doParallel")
  install.packages("nnls")
  install.packages('Seurat')
  install.packages('Metrics')
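
Before moving on, it can help to confirm that the R version requirement is met and that every package actually installed. The snippet below is a minimal sketch in base R; `missing_packages` is a hypothetical helper written for illustration and is not part of DeMixSC.

```r
# Hypothetical helper (not part of DeMixSC): return the packages from `pkgs`
# that cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# TRUE when the running R version satisfies the stated requirement.
r_version_ok <- getRversion() >= "4.2.1"

# The packages installed in the step above.
required <- c("sva", "preprocessCore", "doParallel", "nnls", "Seurat", "Metrics")
not_installed <- missing_packages(required)

if (!r_version_ok) {
  message("R >= 4.2.1 is required; found ", as.character(getRversion()))
}
if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

An empty `not_installed` vector means all listed packages can be loaded.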

Install the latest version of DeMixSC from our GitHub repository:

# Install devtools if necessary.
  if (!"devtools" %in% rownames(installed.packages())) {
    install.packages('devtools')
  }

# Install our DeMixSC package using the following command.
  devtools::install_github('wwylab/DeMixSC')

Load required packages and the tutorial data

# Load required packages.
  library(DeMixSC)
## Loading required package: nnls
## Loading required package: preprocessCore
## Loading required package: doParallel
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
## Loading required package: sva
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-42. For overview type 'help("mgcv-package")'.
## Loading required package: genefilter
## Loading required package: BiocParallel
  library(Seurat)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
##      (status 2 uses the sf package in place of rgdal)
## Attaching SeuratObject
  library(nnls)
  library(preprocessCore)
  library(doParallel)
  library(sva)

# Before loading the tutorial data, set your working directory to the folder where you saved the data.
# Note: Currently, our data are only open to reviewers.
# To reviewers: Use the 'Data Download Link' under the 'DeMixSC benchmark data' section and the password we shared to access the data. To reproduce the results in this tutorial, you only need 'benchmark_data1.RData' and 'reference_data1.RData' (benchmark data in batch-1) and 'ground_truth.RData' (ground truth of the cell-type proportions).
# To run DeMixSC on batch-2, change the file names accordingly.
  load("./benchmark_data1.RData")
  load("./reference_data1.RData")
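
`load()` invisibly returns the names of the objects it restores, which offers a quick way to confirm that an `.RData` file contained what later steps expect. The sketch below demonstrates this on a temporary file with toy objects. The object names mirror those used in Step-1 (`bulk_data1`, `pseudobulk_data1`), on the assumption that they come from 'benchmark_data1.RData'; the values are made up.

```r
# Toy stand-ins for the tutorial objects (values are made up for illustration).
bulk_data1 <- matrix(1:4, nrow = 2,
                     dimnames = list(c("g1", "g2"), c("s1", "s2")))
pseudobulk_data1 <- bulk_data1 + 1L

# Save them to a temporary .RData file, then load it into a clean environment.
tmp <- tempfile(fileext = ".RData")
save(bulk_data1, pseudobulk_data1, file = tmp)

env <- new.env()
loaded <- load(tmp, envir = env)  # character vector of restored object names
print(loaded)

# Report any expected object that the file did not provide.
expected <- c("bulk_data1", "pseudobulk_data1")
missing_objects <- setdiff(expected, loaded)
if (length(missing_objects) > 0) {
  stop("Missing objects: ", paste(missing_objects, collapse = ", "))
}
unlink(tmp)
```

The same pattern applies to the real files: capture the return value of `load()` and compare it against the object names you plan to use.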

Deconvolution with DeMixSC

Step-1: Adjust the technological discrepancy using the benchmark dataset

  Input = DeMixSC.prep.benchmark(Seurat.obj = reference,  
                                 annotation = "Annotation",
                                 mat.a = bulk_data1,  
                                 mat.b = pseudobulk_data1,
                                 cutoff = 0.05, 
                                 scale.factor = 1e6)
  • DeMixSC Preprocessing Outputs:
    • mat.a.adj: Adjusted relative expression matrix ‘a’, with genes on the rows and samples on the columns.
    • mat.b: Unadjusted relative expression matrix ‘b’, with genes on the rows and samples on the columns.
    • reference.adj: Adjusted reference matrix used for deconvolving mat.a.adj.
    • reference: Unadjusted reference matrix used for deconvolving mat.b.
    • theta.adj: Adjusted relative abundance matrix of reference expression.
    • cell.size: Mean cell size for each cell type.
    • discrepancy.genes: Genes significantly influenced by technological discrepancies.
    • non.discrepancy.genes: Genes minimally impacted by technological discrepancies.
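
The outputs above are described as relative expression matrices, and `scale.factor = 1e6` suggests a counts-per-million style scaling. The sketch below illustrates that idea on a toy count matrix in base R. It is an assumption about what the scaling means, not DeMixSC's internal code; `to_relative` and the toy counts are hypothetical.

```r
# Hypothetical helper: convert raw counts (genes x samples) to relative
# expression so that each sample (column) sums to `scale.factor`.
to_relative <- function(counts, scale.factor = 1e6) {
  sweep(counts, 2, colSums(counts), "/") * scale.factor
}

# Toy counts: matrix() fills column by column, so sample1 = (10, 30, 60)
# and sample2 = (5, 5, 90).
counts <- matrix(c(10, 30, 60,
                   5,  5, 90),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))

rel <- to_relative(counts)
print(rel)
print(colSums(rel))  # every column total equals scale.factor
```

Scaling each sample to a common total makes expression values comparable across samples with different sequencing depths.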

Step-2: Deconvolve the bulk RNA-seq data after adjusting for the technological discrepancy

  res_bulk = DeMixSC.deconvolution(bulk.mat = Input$mat.a.adj, 
                                   ref = Input$reference.adj, print.sample = NULL)
## Note: DeMixSC defaults to using (max.cores-1). Be cautious to avoid system overload. Consider adjusting `nthread` if needed.
## Starting parallel computation...
## Parallel computation completed.
## Deconvolution tasks completed!
# DeMixSC uses parallel computing, so you will see the above messages if everything works correctly.
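
The startup message notes that DeMixSC uses all but one core by default and mentions an `nthread` setting. A minimal sketch for choosing a more conservative thread count with the base `parallel` package follows. Passing the result as `nthread` to `DeMixSC.deconvolution` is an assumption based on that message, so check the function's help page before relying on it.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1.
available <- detectCores()
if (is.na(available)) available <- 1L

# Use at most half of the available cores, and never fewer than one.
n_threads <- max(1L, available %/% 2L)
print(n_threads)

# Assumed usage (not run here; `nthread` is inferred from the message above):
# res_bulk = DeMixSC.deconvolution(bulk.mat = Input$mat.a.adj,
#                                  ref = Input$reference.adj,
#                                  nthread = n_threads, print.sample = NULL)
```

Halving the core count is an arbitrary choice for shared machines; any value between 1 and the number of available cores is reasonable.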

Step-3: Deconvolve the pseudobulk RNA-seq data using the unadjusted expression matrix and reference

  res_pseudobulk = DeMixSC.deconvolution(bulk.mat = Input$mat.b, 
                                         ref = Input$reference, print.sample = NULL)
## Note: DeMixSC defaults to using (max.cores-1). Be cautious to avoid system overload. Consider adjusting `nthread` if needed.
## Starting parallel computation...
## Parallel computation completed.
## Deconvolution tasks completed!
  • DeMixSC Deconvolution Outputs:
    • cell.type.proportions: A matrix of cell-type proportions, with cell types on the rows and samples on the columns.
    • converge: A vector indicating the number of iterations until convergence for each sample.
    • weights: A matrix of gene weights with genes on the rows and samples on the columns.
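
Because `cell.type.proportions` holds proportions, a quick sanity check is that every entry is non-negative and each sample's column sums to about 1. The sketch below runs that check on the rounded bulk estimates printed in Step-4 of this tutorial, typed in by hand so the block runs without DeMixSC; with real results you would pass `res_bulk$cell.type.proportions` instead.

```r
# Rounded bulk estimates copied from the Step-4 output of this tutorial.
prop <- matrix(
  c(0.04, 0.66, 0.06, 0.17, 0.00, 0.06, 0.00,
    0.05, 0.50, 0.07, 0.27, 0.01, 0.09, 0.01,
    0.07, 0.38, 0.08, 0.32, 0.02, 0.12, 0.02,
    0.04, 0.47, 0.09, 0.22, 0.00, 0.08, 0.10),
  nrow = 7,
  dimnames = list(c("MG", "Rod", "AC", "BC", "Cone", "HC", "RGC"),
                  c("Macular_19_D013", "Macular_19_D014",
                    "Macular_19_D015", "Macular_19_D016"))
)

# Proportions should be non-negative and sum to about 1 per sample;
# the tolerance allows for the two-decimal rounding of the printed values.
stopifnot(all(prop >= 0))
col_totals <- colSums(prop)
print(col_totals)
stopifnot(all(abs(col_totals - 1) < 0.02))

# Most abundant cell type in each sample.
dominant <- rownames(prop)[apply(prop, 2, which.max)]
print(dominant)
```

With unrounded results a much tighter tolerance is appropriate.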

Step-4: Extract the estimated cell-type proportions

# Save the estimated cell-type proportions of the bulk data.
  prop.est_bulk = res_bulk$cell.type.proportions
# Save the estimated cell-type proportions of the pseudobulk data.
  prop.est_pseudobulk = res_pseudobulk$cell.type.proportions
# Here are the estimated cell-type proportions of the bulk data.
  round(prop.est_bulk, 2)
##      Macular_19_D013 Macular_19_D014 Macular_19_D015 Macular_19_D016
## MG              0.04            0.05            0.07            0.04
## Rod             0.66            0.50            0.38            0.47
## AC              0.06            0.07            0.08            0.09
## BC              0.17            0.27            0.32            0.22
## Cone            0.00            0.01            0.02            0.00
## HC              0.06            0.09            0.12            0.08
## RGC             0.00            0.01            0.02            0.10
# Here are the estimated cell-type proportions of the pseudobulk data.
  round(prop.est_pseudobulk, 2) 
##      Macular_19_D013 Macular_19_D014 Macular_19_D015 Macular_19_D016
## MG              0.03            0.07            0.03            0.06
## Rod             0.64            0.54            0.43            0.37
## AC              0.05            0.05            0.08            0.08
## BC              0.27            0.34            0.43            0.41
## Cone            0.00            0.00            0.01            0.01
## HC              0.00            0.00            0.01            0.00
## RGC             0.00            0.00            0.01            0.07
# Here, we compare the estimated proportions with their ground truth.
# We use the root mean squared error (RMSE) to evaluate DeMixSC's performance in deconvolving benchmark data in batch-1.
  library(Metrics)
# Load the ground truth.
  load("./ground_truth.RData") 
  cell.order = c("AC","BC","Cone","HC","MG","RGC","Rod")
# The RMSE values in the bulk data (one for each sample).
  for (i in colnames(prop.est_bulk)) {
    print(rmse(Truth_proportion[cell.order,i], prop.est_bulk[cell.order,i]))
  }
## [1] 0.0169058
## [1] 0.05131309
## [1] 0.044452
## [1] 0.04472796
# The RMSE values in the pseudobulk data (one for each sample).
  for (i in colnames(prop.est_pseudobulk)) {
    print(rmse(Truth_proportion[cell.order,i], prop.est_pseudobulk[cell.order,i]))
  }
## [1] 0.03557157
## [1] 0.05490345
## [1] 0.05613007
## [1] 0.05937674
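
For readers who want to see what the RMSE comparison computes, the sketch below reproduces it in base R without the Metrics package, using the standard definition: the square root of the mean squared difference. It also shows why both matrices are indexed by the same `cell.order`: the rows of the truth and the estimate need not be stored in the same order. The toy matrices and the name `rmse_base` are hypothetical and are not the tutorial data.

```r
# Root mean squared error between two numeric vectors of equal length.
rmse_base <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Hypothetical toy matrices (cell types x samples); not the tutorial data.
truth <- matrix(c(0.5, 0.3, 0.2,
                  0.6, 0.1, 0.3),
                nrow = 3,
                dimnames = list(c("AC", "BC", "Rod"), c("s1", "s2")))
# The estimate lists the cell types in a different row order.
est <- matrix(c(0.1, 0.4, 0.5,
                0.3, 0.2, 0.5),
              nrow = 3,
              dimnames = list(c("Rod", "BC", "AC"), c("s1", "s2")))

# Index both matrices by the same cell-type order before comparing.
cell.order <- rownames(truth)
rmse_per_sample <- sapply(colnames(truth), function(s) {
  rmse_base(truth[cell.order, s], est[cell.order, s])
})
print(rmse_per_sample)
```

Indexing by row name rather than position guards against silently comparing mismatched cell types.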

License

GNU Affero General Public License v3.0

Connect with us

Please contact Shuai Guo, Xiaoqian Liu, or Wenyi Wang if you encounter any issues while working through this tutorial.

Reference

Guo, S., Liu, X., Cheng, X., Jiang, Y., Ji, S., Liang, Q., … & Wang, W. (2023). DeMixSC: a deconvolution framework that uses single-cell sequencing plus a small benchmark dataset for improved analysis of cell-type ratios in complex tissue samples. bioRxiv, 2023-11.
https://doi.org/10.1101/2023.10.10.561733