10 Differential Gene Expression

Author

Affiliation

Dr Randy Johnson

Hood College

Published

September 24, 2025

Workflow Overview

Section	Description	Key Tools/Methods
Data Wrangling	Import raw gene counts, sample information, and gene annotations.	`edgeR::readDGE`, `DGEList` object
Pre-processing	Filter out non-expressed genes and normalize sample distributions.	Filtering (Low Counts), TMM Normalization
Exploratory Analysis	Visualize sample relationships and check for batch effects.	MDS Plots (PCA)
Differential Expression	Model the data, estimate the mean-variance relationship, and fit linear models.	`limma::voom`, `limma::lmFit`, Empirical Bayes
Interpretation	Summarize results and visualize top findings.	`limma::topTable`, `Glimma` plots

Filtering genes with consistently low expression across all samples
- Focuses on tests with sufficient statistical power
- Reduces the multiple testing burden

Normalize data to account for
- Differences in sequencing depth (library size)
- Compositional biases (e.g. a few highly expressed genes skewing the total count)

Trimmed Mean of M-values (TMM)
- calculates scale factors
- Result: log-fold-changes between samples are centered at zero on average
- edgeR::calcNormFactors

Multi-Dimensional Scaling (MDS) Plot (similar to PCA)
- Visualize the distances/relationships between samples using log2 count per million values
- Samples should cluster tightly by biological group
- Samples should not cluster by batch

Design Matrix: A matrix that codes varialbes to be modelled:
- Experimental groups/conditions (e.g., Basal, LP, ML)
- Any confounding factors (e.g., Batch/Lane)
Contrasts: Specific comparisons of interest, for example:
- “LP vs. Basal”
- “ML vs. LP”

RNA-seq counts tend to be heteroscedastic
- Variance of a gene’s expression level depends on its mean
- e.g. higher mean → higher variance
Linear models assume normally distributed data and constant variance (homoscedasticity)

limma::voom
- Models the mean-variance relationship of the log-counts
- Uses these precision weights in the linear modeling step
- Effectively removes heteroscedasticity and
- Make the data suitable for linear modeling

limma::lmFit
- Fits a linear model for every gene based on the design matrix and voom weights
limma::eBayes
- Moderates the gene-wise variance estimates
- More stable and reliable p-value calculations across all tests

Mean-Difference (MD) Plot
- Plots logFC vs. Average Expression (AveExpr/log-CPM)
- Shows the overall pattern of DE and highlights significant genes
Interactive MD Plot
- Allows users to click, hover, and search for specific genes on the MD plot

Heatmaps
- Visualize the expression levels of the top DE genes across all samples
- Allows visualization of sample clustering
- Shows expression patterns

Law, Charity, Monther Alhamdoosh, Shian Su, Xueyi Dong, Luyi Tian, Gordon K Smyth, and Matthew E Ritchie. 2018. “RNA-Seq Analysis Is Easy as 1-2-3 with Limma, Glimma and edgeR.” Vignette. RNA-Seq Analysis Is Easy as 1-2-3 with Limma, Glimma and edgeR. https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html.