Differential Occupancy and
Differential Binding/Marking
Bioinformatics Core Team
Overview
- Why?
- Defining a high confidence peak set.
- The occupancy approach.
- Count based approach.
- Modelling counts with DESeq2.
- The future of differential ChIP.
Why?
- ChIP-seq is (very) noisy.
- Antibody specificity varies between users, and even in the same user's hands.
- Fragmentation efficiency/times may vary.
- Variable library complexities.
- Sample-specific peaks occur often.
- Peaks are considered overlapping if they share at least 1 bp (by default).
- High-confidence peaks occur in the majority of samples.
- With 2 replicates, a peak must be present in both.
- Group-specific peaks occur when the majority of replicates in the other groups do not contain an overlapping peak.
- This can be quite stringent.
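The majority rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular package: peaks are half-open `(start, end)` tuples on a single chromosome, and the helper names (`merge_intervals`, `high_confidence_peaks`, `group_specific_peaks`) are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals sharing at least 1 bp into non-redundant regions."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def high_confidence_peaks(samples: Dict[str, List[Interval]]) -> List[Interval]:
    """Keep merged regions overlapped by a peak in a strict majority of samples."""
    regions = merge_intervals([p for peaks in samples.values() for p in peaks])
    needed = len(samples) // 2 + 1  # with 2 replicates this means both
    kept = []
    for region in regions:
        hits = sum(any(overlaps(region, p) for p in peaks)
                   for peaks in samples.values())
        if hits >= needed:
            kept.append(region)
    return kept


def group_specific_peaks(group: Dict[str, List[Interval]],
                         other: Dict[str, List[Interval]]) -> List[Interval]:
    """High-confidence regions of `group` with no overlapping peak
    in a strict majority of the replicates of `other`."""
    specific = []
    for region in high_confidence_peaks(group):
        without = sum(not any(overlaps(region, p) for p in peaks)
                      for peaks in other.values())
        if 2 * without > len(other):
            specific.append(region)
    return specific


group_a = {"rep1": [(100, 200), (500, 600)], "rep2": [(150, 250), (520, 620)]}
group_b = {"rep1": [(510, 590)], "rep2": [(530, 560)]}
print(high_confidence_peaks(group_a))
print(group_specific_peaks(group_a, group_b))
```

Note how a single overlapping peak in one of two replicates of the other group is already enough to exclude a region, which is why this approach is stringent.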
Occupancy analysis
- Provides high confidence peaks for a group.
- Stringent when finding group specific peaks.
- Any presence of peak-like signal would exclude the region.
- Ignores peak-calling QC and any bias from input or batch effects.
- Differential occupancy analysis can work well when comparing the same antibody across different tissue types.
Refining peaks
- Redefining regions
- Redefining summits
Overview
Redefining region
- Overlapping peaks across multiple samples can artificially widen a peak.
- Resizing the region of peak can be performed after occupancy analysis.
Resizing a peak region
- Take the merged extent of all peaks.
- Centre is the geometric centre.
Option A
Resizing a peak region
- Take the geometric centre of all summits
- Resize around the geometric centre.
- Resize to size of expected motif.
Option B
Resizing a peak region
- Take the geometric centre of all summits
- Resize around the geometric centre.
- Resize to size of peaks.
Option C
Resizing a peak region
- Take the geometric centre of all summits
- Resize around the geometric centre.
- Resize to a defined fixed size.
Option D
Resizing a peak region
- Take the weighted mean of all summit positions.
- Peaks with smaller summits get less weight.
- Resize around the geometric centre.
- Resize to size of peaks.
- Myc/P53, for instance, have multiple classes of binding which vary in length.
Our Choice, Option-E
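The resizing options differ only in how the centre and the width are chosen. A minimal sketch, assuming summits are given as hypothetical `(position, height)` pairs and that summit height is used as the weight (the slides do not specify the weighting):

```python
from typing import List, Tuple

Summit = Tuple[int, float]  # (position, height); the height is the assumed weight


def geometric_centre(summits: List[Summit]) -> int:
    """Unweighted mean position of all summits."""
    return round(sum(pos for pos, _ in summits) / len(summits))


def weighted_centre(summits: List[Summit]) -> int:
    """Mean summit position weighted by height, so smaller summits count less."""
    total = sum(height for _, height in summits)
    if total <= 0:
        raise ValueError("summit heights must sum to a positive value")
    return round(sum(pos * height for pos, height in summits) / total)


def resize_around(centre: int, width: int) -> Tuple[int, int]:
    """Half-open region of the given width around the centre, clipped at 0."""
    start = max(0, centre - width // 2)
    return (start, start + width)


summits = [(1000, 10.0), (1040, 30.0)]
print(geometric_centre(summits), weighted_centre(summits))
print(resize_around(weighted_centre(summits), 200))
```

The `width` passed to `resize_around` is where the options diverge: an expected motif size, the size of the peaks, or a fixed size.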
Longer Marks (Pol2/H3K4me3)
- Marks are often known to occur at specific genomic locations.
- For longer marks, these locations (genes, promoters, enhancers) may be used as peaks.
- This can avoid noisy peak calling and speed up first-pass analysis.
Counting in Regions
- Once we have decided on our regions of interest:
- Remove blacklisted regions.
- Count reads from ChIP.
- Count reads from input.
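The counting step can be sketched as follows. This is a simplified illustration, not a replacement for a real read counter: each read is assumed to be reduced to a single position (for example its midpoint) on one chromosome, regions are half-open `(start, end)` tuples, and `count_reads` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end) on one chromosome


def overlaps_any(region: Interval, others: List[Interval]) -> bool:
    """True if the region shares at least 1 bp with any interval in `others`."""
    return any(region[0] < end and start < region[1] for start, end in others)


def count_reads(regions: List[Interval],
                read_positions: List[int],
                blacklist: List[Interval]) -> Dict[Interval, int]:
    """Count read positions in each region, skipping blacklisted regions."""
    positions = sorted(read_positions)
    counts: Dict[Interval, int] = {}
    for region in regions:
        if overlaps_any(region, blacklist):
            continue  # drop regions touching the blacklist
        start, end = region
        # number of positions p with start <= p < end
        counts[region] = bisect_left(positions, end) - bisect_left(positions, start)
    return counts


regions = [(100, 200), (300, 400), (1000, 1100)]
blacklist = [(1050, 2000)]
chip_counts = count_reads(regions, [110, 150, 199, 200, 310, 1060], blacklist)
input_counts = count_reads(regions, [120, 350, 360], blacklist)
print(chip_counts)
print(input_counts)
```

Running the same function on ChIP and input reads gives two count tables over an identical set of regions, which keeps the later comparison straightforward.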
Differential Binding/Marking Analysis
I would not use the input counts at all for differential binding. I would just compare treated counts vs untreated ChIP counts.
but I would also recommend to also take a look at DiffBind and csaw vignettes and workflows, at the least to understand the best practices they've set out.
DESeq2 author - Mike Love
These controls are mostly irrelevant when testing for DB between ChIP samples. However, they can be used to filter out windows where the average abundance across the ChIP samples is below the abundance of the control.
csaw author - Aaron Lun
What about inputs?
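The filtering idea in the second quote, keeping only regions where the average ChIP abundance exceeds the control abundance, can be sketched as below. This is a deliberately simple illustration and not csaw's own procedure: abundance is assumed to be counts per million of library size, and `keep_above_control` is a hypothetical helper.

```python
from typing import List


def cpm(count: float, library_size: int) -> float:
    """Counts per million reads in the library."""
    return count / library_size * 1e6


def keep_above_control(chip_counts: List[List[int]],
                       chip_library_sizes: List[int],
                       control_counts: List[int],
                       control_library_size: int) -> List[bool]:
    """chip_counts[i][j] is the count for region i in ChIP sample j.
    Returns True for regions whose mean ChIP CPM exceeds the control CPM."""
    keep = []
    for row, control in zip(chip_counts, control_counts):
        chip_cpms = [cpm(c, size) for c, size in zip(row, chip_library_sizes)]
        mean_chip = sum(chip_cpms) / len(chip_cpms)
        keep.append(mean_chip > cpm(control, control_library_size))
    return keep


chip = [[50, 70], [5, 3]]  # two regions, two ChIP samples
keep = keep_above_control(chip, [1_000_000, 2_000_000], [10, 8], 1_000_000)
print(keep)
```

The input is used here only to decide which regions are tested; the differential test itself still compares ChIP counts against ChIP counts.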
DESeq2 for ChIP-seq
- DESeq2 will identify changes in the proportions of counts in regions of interest.
- Normalisation for ChIP-seq typically uses the effective library size (reads counted in the regions).
- But this can make a big difference if you expect redistribution to repeats or other regions that are not counted.
- DESeq2 can account for batch and other categorical or continuous factors, if they are supplied.
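To see why DESeq2 detects changes in proportions rather than absolute amounts, it helps to look at median-of-ratios size factors, the normalisation DESeq2 applies by default. The pure-Python function below is a sketch of that idea only, not DESeq2's code; `size_factors` and the toy counts are hypothetical.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """counts[i][j] is the count for region i in sample j.
    For each sample, take the median over regions of the ratio between its
    count and the region's geometric mean across samples. Regions with any
    zero count are skipped."""
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if all(c > 0 for c in row):
            log_geo_mean = sum(math.log(c) for c in row) / n_samples
            for j, c in enumerate(row):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no region has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


# Sample 2 has exactly twice the counts of sample 1 in every region.
counts = [[100, 200], [50, 100], [30, 60]]
factors = size_factors(counts)
normalised = [[c / f for c, f in zip(row, factors)] for row in counts]
print(factors)
print(normalised)
```

After dividing by the size factors the two samples are identical, so a uniform global change across all counted regions is normalised away. This is the case where normalising to the full library size instead can give a very different answer.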
Enough theory, time for another practical!
Dfferential_Binding
By Tom Carroll