Group 4
Bhoomika Agarwal
Srinath Jayaraman
CS4260
Chapter 8
Part 1
Ideal scenario:
Sequence and count all molecules of interest in a sample
IRL:
We lose some molecules or intermediates in the process
Sequence and count a statistical sample
#Load the count data and examine it
fn = system.file("extdata", "pasilla_gene_counts.tsv",
package = "pasilla", mustWork = TRUE)
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))
dim(counts)
counts[ 2000+(0:3), ]
- The matrix tallies the number of reads seen for each gene in each sample. We call it the count table
- 14599 rows, corresponding to the genes, and 7 columns, corresponding to the samples
- raw counts of sequencing reads
These data are from an experiment on Drosophila melanogaster cell cultures that investigated the effect of RNAi knock-down of the splicing factor pasilla (Brooks et al. 2011) on the cells’ transcriptome
2 experimental conditions:
untreated - negative control
treated - siRNA against pasilla
#Load metadata of 7 sample files
annotationFile = system.file("extdata",
"pasilla_sample_annotation.csv",
package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno
Data Wrangling:
#Data wrangling
library("dplyr")
pasillaSampleAnno = mutate(pasillaSampleAnno,
condition = factor(condition, levels = c("untreated", "treated")),
type = factor(sub("-.*", "", type), levels = c("single", "paired")))
DESeqDataSet
# DESeqDataSet
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))
library("DESeq2")
pasilla = DESeqDataSetFromMatrix(
countData = counts,
colData = pasillaSampleAnno[mt, ],
design = ~ condition)
class(pasilla)
is(pasilla, "SummarizedExperiment")
#The DESeq2 method
pasilla = DESeq(pasilla)
res = results(pasilla)
res[order(res$padj), ] %>% head
the histogram of p-values
the MA plot
an ordination plot
a heatmap
#p-value histogram
ggplot(as(res, "data.frame"), aes(x = pvalue)) +
geom_histogram(binwidth = 0.01, fill = "Royalblue", boundary = 0)
#MA plot
plotMA(pasilla, ylim = c( -2, 2))
#PCA plot
pas_rlog = rlogTransformation(pasilla)
plotPCA(pas_rlog, intgroup=c("condition", "type")) + coord_fixed()
#Heatmap
library("pheatmap")
select = order(rowMeans(assay(pas_rlog)), decreasing = TRUE)[1:30]
pheatmap( assay(pas_rlog)[select, ],
scale = "row",
annotation_col = as.data.frame(
colData(pas_rlog)[, c("condition", "type")] ))
♡2020 by Bhoomika Agarwal. Copying is an act of love. Love is not subject to law. Please copy.