23  Proteomic analysis with DEP

Author
Affiliation

Dr Randy Johnson

Hood College

Published

October 15, 2025

Today we will be exploring a proteomics dataset characterizing Ubiquitin-protein interactors (Cox and Mann 2008; Zhang et al. 2018).

# data analysis follows tutorial at
# https://bioconductor.org/packages/release/bioc/vignettes/DEP/inst/doc/DEP.html
library(DEP)
library(dplyr)

# The data is provided with the package
data <- UbiLength

# We filter for contaminant proteins and decoy database hits, which are indicated by "+" in the columns "Potential.contaminants" and "Reverse", respectively. 
data <- filter(data, Reverse != "+", Potential.contaminant != "+")

# Are there any duplicated gene names?
data$Gene.names %>% duplicated() %>% any()
[1] TRUE
# Make a table of duplicated gene names
data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% 
  arrange(desc(frequency)) %>% filter(frequency > 1)
# A tibble: 51 × 2
   Gene.names frequency
   <chr>          <int>
 1 ""                 7
 2 "ATXN2"            4
 3 "ATXN2L"           4
 4 "SF1"              4
 5 "HSPA8"            3
 6 "RBM33"            3
 7 "UGP2"             3
 8 "ACTL6A"           2
 9 "BCLAF1"           2
10 "BRAP"             2
# ℹ 41 more rows
# Make unique names using the annotation in the "Gene.names" column as primary names and the annotation in "Protein.IDs" as name for those that do not have an gene name.
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

# Are there any duplicated names?
data$name %>% duplicated() %>% any()
[1] FALSE
# Generate a SummarizedExperiment object using an experimental design
LFQ_columns <- grep("LFQ.", colnames(data_unique)) # get LFQ column numbers
experimental_design <- UbiLength_ExpDesign
data_se <- make_se(data_unique, LFQ_columns, experimental_design)

# Generate a SummarizedExperiment object by parsing condition information from the column names
LFQ_columns <- grep("LFQ.", colnames(data_unique)) # get LFQ column numbers
data_se_parsed <- make_se_parse(data_unique, LFQ_columns)

# Filter for proteins that are identified in all replicates of at least one condition
data_filt <- filter_missval(data_se, thr = 0)

# Less stringent filtering:
# Filter for proteins that are identified in 2 out of 3 replicates of at least one condition
data_filt2 <- filter_missval(data_se, thr = 1)

plot_numbers(data_filt)

plot_coverage(data_filt)

# Normalize the data
data_norm <- normalize_vsn(data_filt)

# Visualize normalization by boxplots for all samples before and after normalization
plot_normalization(data_filt, data_norm)

# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)
[1] 0.2608395
# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)

Differential enrichment analysis based on linear models and empherical Bayes statistics

# Test every sample versus control
data_diff <- test_diff(data_imp, type = "control", control = "Ctrl")

# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all")

# Test manually defined comparisons
data_diff_manual <- test_diff(data_imp, type = "manual", 
                              test = c("Ubi4_vs_Ctrl", "Ubi6_vs_Ctrl"))

# Denote significant proteins based on user defined cutoffs
dep <- add_rejections(data_diff, alpha = 0.05, lfc = log2(1.5))

# Plot the first and second principal components
plot_pca(dep, x = 1, y = 2, n = 500, point_size = 4)

# Plot the Pearson correlation matrix
plot_cor(dep, significant = TRUE, lower = 0, upper = 1, pal = "Reds")

# Plot a heatmap of all significant proteins with the data centered per protein
plot_heatmap(dep, type = "centered", kmeans = TRUE, 
             k = 6, col_limit = 4, show_row_names = FALSE,
             indicate = c("condition", "replicate"))

# Plot a volcano plot for the contrast "Ubi6 vs Ctrl""
plot_volcano(dep, contrast = "Ubi6_vs_Ctrl", label_size = 2, add_names = TRUE)

Make your own volcano plot

# save results
get_results(dep) |>
    write.csv(file = 'data/prot/UbiLength.csv')
  • Download the results from our differential enrichment analysis here.
  • Load the VolcaNoseR Shiny app.
  • Explore the app and make your own volcano plot.
  • Homework: submit your volcano plot on Blackboard.
Cox, Jürgen, and Matthias Mann. 2008. MaxQuant Enables High Peptide Identification Rates, Individualized p.p.b.-Range Mass Accuracies and Proteome-Wide Protein Quantification.” Nature Biotechnology 26 (12): 1367–72. https://doi.org/10.1038/nbt.1511.
Zhang, Xiaofei, Arne H Smits, Gabrielle Ba Van Tilburg, Huib Ovaa, Wolfgang Huber, and Michiel Vermeulen. 2018. “Proteome-Wide Identification of Ubiquitin Interactions Using UbIA-MS.” Nature Protocols 13 (3): 530–50. https://doi.org/10.1038/nprot.2017.147.