‘Batch Effect Correction of RNA-seq Data through Sample Distance Matrix Adjustment’

Paper sharing

Posted by Xuan on February 18, 2020

scBatch: Batch Effect Correction of RNA-seq Data through Sample Distance Matrix Adjustment

Paper info: 2020; published in Bioinformatics

Novelty

  • (unfinished)
  • not sequencing error correction; the method corrects gene expression counts

Brief introduction

1.Batch effect: a "batch" can arise from a number of factors (including different laboratories, different sample preparation batches, different sequencing…)

2.Existing Methods:

  • establish linear models of gene expression with biological groups (e.g. disease and control groups)
    • DE gene analysis package (limma).
      2017: scPLS
  • empirical Bayes algorithm (e.g. ComBat)
    • an improved version, ComBat-seq (2020), based on negative binomial regression
  • consensus clustering method (SC3: conducts clustering analysis based on multiple distance metrics)
    • BatchEffectRemoval.

3.Limitation of linear methods: they require knowing the biological group of each observation, which is rarely feasible in practice.
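
To make the linear-model idea and its limitation concrete, here is a minimal sketch for a single gene. It is not the implementation used by limma or any other package listed above. The function names, the 0/1 coding of group and batch, and the toy numbers are all assumptions for illustration. The point to notice is that the `group` labels must be passed in, which is exactly the knowledge that is often missing.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def remove_batch_effect(expr, group, batch):
    """Fit expr ~ intercept + group + batch by least squares for one gene,
    then subtract the fitted batch term.

    expr  : log-scale expression values, one per sample (hypothetical input)
    group : 0/1 biological group label per sample (must be known)
    batch : 0/1 batch label per sample
    """
    design = [[1.0, float(g), float(b)] for g, b in zip(group, batch)]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, expr)) for i in range(k)]
    _intercept, _group_coef, batch_coef = solve_linear_system(xtx, xty)
    corrected = [y - batch_coef * b for y, b in zip(expr, batch)]
    return corrected, batch_coef


# Toy, noise-free data: expr = 5 + 2 * group + 3 * batch
group = [0, 0, 1, 1, 0, 0, 1, 1]
batch = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [5.0, 5.0, 7.0, 7.0, 8.0, 8.0, 10.0, 10.0]
corrected, batch_coef = remove_batch_effect(expr, group, batch)
print(batch_coef)   # approximately 3.0 on this toy data
print(corrected)    # the group difference is kept, the batch shift is removed
```

If group and batch are fully confounded (every sample of one group sits in one batch), the normal equations become singular and this sketch fails with a division by zero, so the design has to mix groups across batches.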

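4.Spike-in genes?

The spike-in approach adds internal controls, whose sequences and absolute quantities are known in advance, during RNA-seq library preparation. When the libraries are sequenced, the spike-in amounts measured in each sample can be used to build a standard curve, which allows expression levels to be calibrated very accurately across samples.
A commonly used spike-in type: ERCC Control RNA (External RNA Controls Consortium).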
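
The standard-curve idea can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the curve is a straight line fitted on the log2 scale with a pseudocount of 1, and the spike-in amounts and counts are made-up toy numbers (not real ERCC concentrations), chosen so that the second sample simply has twice the sequencing depth of the first.

```python
import math


def fit_standard_curve(known_amounts, spike_counts):
    """Least-squares line: log2(count + 1) = slope * log2(amount) + intercept."""
    xs = [math.log2(a) for a in known_amounts]
    ys = [math.log2(c + 1) for c in spike_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def counts_to_amounts(counts, slope, intercept):
    """Invert the sample's standard curve to estimate amounts from counts."""
    return [2 ** ((math.log2(c + 1) - intercept) / slope) for c in counts]


# Hypothetical spike-in amounts, identical in every sample by construction
known_amounts = [1.0, 4.0, 16.0, 64.0]

# Toy spike-in counts: sample B was sequenced about twice as deeply as sample A
spikes_a = [9, 39, 159, 639]
spikes_b = [19, 79, 319, 1279]

curve_a = fit_standard_curve(known_amounts, spikes_a)
curve_b = fit_standard_curve(known_amounts, spikes_b)

# The same endogenous gene has raw counts 79 (A) and 159 (B); after
# calibration against each sample's own curve the estimates agree closely.
print(counts_to_amounts([79], *curve_a))
print(counts_to_amounts([159], *curve_b))
```

Because each sample is calibrated against its own curve, differences in depth or capture efficiency that affect spike-ins and endogenous genes alike cancel out, without needing biological group labels.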

5.Workflow

  • input: gene expression counts matrix
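
Since the paper's title refers to a sample distance matrix, here is a minimal sketch of how one can be computed from a count matrix. It is not the scBatch algorithm and does not perform any correction. The genes-by-samples layout, the log2(count + 1) transform, and the choice of 1 minus Pearson correlation as the distance are assumptions for illustration, and the counts are toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_distance_matrix(counts):
    """counts: list of rows (genes), each row holding one count per sample.

    Returns an n_samples x n_samples matrix of 1 - Pearson correlation,
    computed on log2(count + 1) values.
    """
    n_samples = len(counts[0])
    logged = [[math.log2(c + 1) for c in row] for row in counts]
    columns = [[row[j] for row in logged] for j in range(n_samples)]
    return [[1.0 - pearson(columns[i], columns[j]) for j in range(n_samples)]
            for i in range(n_samples)]


# Toy count matrix: 4 genes (rows) x 3 samples (columns)
counts = [
    [10, 12, 100],
    [0, 1, 50],
    [200, 180, 5],
    [30, 25, 40],
]
dist = sample_distance_matrix(counts)
for row in dist:
    print([round(v, 3) for v in row])
```

In this toy matrix the first two samples have similar profiles and end up close together, while the third is far from both. A batch effect would show up in such a matrix as samples grouping by batch rather than by biology.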

6.Dataset

  • simulated data (PROPER is used for data simulation)
  • real data