bamCoverage¶

  • Required arguments
  • Output
  • Optional arguments
  • Read coverage normalization options
  • Read processing options
  • Usage hints
  • Usage example for ChIP-seq
  • Usage examples for RNA-seq
    • Regular bigWig track
    • Split up tracks for each strand
      • Versions before two.2

../../_images/norm_IGVsnapshot_indFiles.png

If y'all are non familiar with BAM, bedGraph and bigWig formats, you lot can read up on that in our Glossary of NGS terms

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) every bit output. The coverage is calculated every bit the number of reads per bin, where bins are curt sequent counting windows of a defined size. Information technology is possible to extended the length of the reads to meliorate reflect the actual fragment length. bamCoverage offers normalization by scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), counts per million (CPM), bins per million mapped reads (BPM) and 1x depth (reads per genome coverage, RPGC).

                usage: An case usage is:$ bamCoverage -b reads.bam -o coverage.bw              

Required arguments¶

--bam, -b BAM file to process

Output¶

--outFileName, -o
Output file name.
--outFileFormat, -of

Possible choices: bigwig, bedgraph

Output file blazon. Either "bigwig" or "bedgraph".

Optional arguments¶

--scaleFactor The computed scaling factor (or ane, if not applicable) will be multiplied by this. (Default: 1.0)
--MNase Decide nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are divers by the two mate reads. Merely fragment lengthsbetween 130 - 200 bp are considered to avoid dinucleosomes or other artifacts. By default, whatever fragments smaller or larger than this are ignored. To over-ride this, use the –minFragmentLength and –maxFragmentLength options, which volition default to 130 and 200 if not otherwise specified in the presence of –MNase. Note: Requires paired-end data. A bin size of 1 is recommended.
--First Uses this offset inside of each read as the betoken. This is useful in cases like RiboSeq or GROseq, where the signal is 12, xv or 0 bases past the start of the read. This tin be paired with the –filterRNAstrand option. Note that negative values indicate offsets from the stop of each read. A value of 1 indicates the first base of the alignment (taking alignment orientation into business relationship). Too, a value of -one is the last base of the alignment. An beginning of 0 is not permitted. If 2 values are specified, so they will be used to specify a range of positions. Note that specifying something like –First 5 -one will result in the 5th through final position being used, which is equivalent to trimming four bases from the 5-prime stop of alignments. Note that if you specify –centerReads, the centering will be performed before the offset.
--filterRNAstrand

Possible choices: forwards, reverse

Selects RNA-seq reads (single-end or paired-finish) originating from genes on the given strand. This pick assumes a standard dUTP-based library grooming (that is, –filterRNAstrand=forward keeps minus-strand reads, which originally came from genes on the forrad strand using a dUTP-based method). Consider using –samExcludeFlag instead for filtering past strand in other contexts.

--version show program's version number and leave
--binSize, -bs Size of the bins, in bases, for the output of the bigwig/bedgraph file. (Default: 50)
--region, -r Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:finish, for example –region chr10 or –region chr10:456700:891000.
--blackListFileName, -bl
A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, and then the read/fragment might still be considered. Please note that yous should arrange the effective genome size, if relevant.
--numberOfProcessors, -p
Number of processors to use. Type "max/two" to employ one-half the maximum number of processors or "max" to utilise all available processors. (Default: 1)
--verbose, -5 Set to see processing letters.

Read coverage normalization options¶

--effectiveGenomeSize
The constructive genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to exist adjusted accordingly. A table of values is available here: http://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html .
--normalizeUsing

Possible choices: RPKM, CPM, BPM, RPGC, None

Use one of the entered methods to normalize the number of reads per bin. Past default, no normalization is performed. RPKM = Reads Per Kilobase per Million mapped reads; CPM = Counts Per Million mapped reads, same every bit CPM in RNA-seq; BPM = Bins Per Million mapped reads, aforementioned equally TPM in RNA-seq; RPGC = reads per genomic content (1x normalization); Mapped reads are considered later on blacklist filtering (if applied). RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). CPM (per bin) = number of reads per bin / number of mapped reads (in millions). BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions). RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. None = the default and equivalent to not setting this selection at all. This scaling factor, in plough, is determined from the sequencing depth: (full number of mapped reads * fragment length) / constructive genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires –effectiveGenomeSize. Each read is considered independently, if you want to but count one mate from a pair in paired-end data, then use the –samFlagInclude/–samFlagExclude options. (Default: None)

--exactScaling Instead of computing scaling factors based on a sampling of the reads, procedure all of the reads to determine the exact number that will exist used in the output. This requires significantly more time to compute, but will produce more accurate scaling factors in cases where alignments that are being filtered are rare and lumped together. In other words, this is only needed when region-based sampling is expected to produce incorrect results.
--ignoreForNormalization, -ignore
A list of space-delimited chromosome names containing those chromosomes that should be excluded for computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. An usage examples is –ignoreForNormalization chrX chrM.
--skipNonCoveredRegions, --skipNAs
This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zippo. The conclusion to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for instance, repetitive regions that should be skipped.
--smoothLength The smoothen length defines a window, larger than the binSize, to average the number of reads. For instance, if the –binSize is set to 20 and the –smoothLength is set to 60, then, for each bin, the average of the bin and its left and correct neighbors is considered. Whatever value smaller than –binSize will exist ignored and no smoothing volition exist applied.

Read processing options¶

--extendReads, -east
This parameter allows the extension of reads to fragment size. If set up, each read is extended, without exception. NOTE: This feature is mostly NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. Single-cease: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length volition non exist extended. Paired-end: Reads with mates are ever extended to friction match the fragment size defined by the two read mates. Unmated reads, mate reads that map likewise far apart (>4x fragment length) or even map to dissimilar chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, information technology is estimated from the data (hateful of the fragment size of all mate reads).
--ignoreDuplicates
If prepare, reads that take the same orientation and start position will exist considered only once. If reads are paired, the mate's position likewise has to coincide to ignore a read.
--minMappingQuality
If set, only reads that have a mapping quality score of at least this are considered.
--centerReads Past adding this option, reads are centered with respect to the fragment length. For paired-end information, the read is centered at the fragment length divers by the two ends of the fragment. For single-end data, the given fragment length is used. This pick is useful to become a sharper signal effectually enriched regions.
--samFlagInclude
Include reads based on the SAM flag. For instance, to get simply reads that are the commencement mate, apply a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will be also considered for the coverage. (Default: None)
--samFlagExclude
Exclude reads based on the SAM flag. For example, to go only reads that map to the forrard strand, use –samFlagExclude xvi, where 16 is the SAM flag for reads that map to the reverse strand. (Default: None)
--minFragmentLength
The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)
--maxFragmentLength
The maximum fragment length needed for read/pair inclusion. (Default: 0)

Usage hints¶

  • A smaller bin size value volition result in a college resolution of the coverage track but also in a larger file size.
  • The 1x normalization (RPGC) requires the input of a value for the effective genome size, which is the mappable office of the reference genome. Of course, this value is species-specific. The command line help of this tool offers suggestions for a number of model species.
  • It might be useful for some studies to exclude certain chromosomes in guild to avoid biases, east.g. chromosome X, every bit male mice contain a pair of each autosome, but usually only a single X chromosome.
  • Past default, the read length is Non extended! This is the preferred setting for spliced-read information like RNA-seq, where ane usually wants to rely on the detected read locations but. A read extension would neglect potential splice sites in the unmapped part of the fragment. Other data, eastward.g. Chip-seq, where fragments are known to map contiguously, should be candy with read extension ( --extendReads [INTEGER] ).
  • For paired-end data, the fragment length is generally defined by the two read mates. The user provided fragment length is only used as a fallback for singletons or mate reads that map too far apart (with a distance greater than four times the fragment length or are located on different chromosomes).

Warning

If you already normalized for GC bias using correctGCbias , you should admittedly NOT set the parameter --ignoreDuplicates !

Note

Like BAM files, bigWig files are compressed, binary files. If you would like to come across the coverage values, cull the bedGraph output via --outFileFormat .

Usage case for Chip-seq¶

This is an example for ChIP-seq data using additional options (smaller bin size for higher resolution, normalizing coverage to 1x mouse genome size, excluding chromosome X during the normalization step, and extending reads):

                                    bamCoverage                  --                  bam                  a                  .                  bam                  -                  o                  a                  .                  SeqDepthNorm                  .                  bw                  \                  --                  binSize                  10                  --                  normalizeUsing                  RPGC                  --                  effectiveGenomeSize                  2150570000                  --                  ignoreForNormalization                  chrX                  --                  extendReads                

If you had run the command with --outFileFormat bedgraph , you lot could easily peak into the resulting file.

                  $ head SeqDepthNorm_chr19.bedgraph 19  60150   60250   9.32 19  60250   60450   18.65 xix  60450   60650   27.97 xix  60650   60950   37.29 19  60950   61000   27.97 xix  61000   61050   18.65 19  61050   61150   27.97 19  61150   61200   18.65 19  61200   61300   9.32 nineteen  61300   61350   18.65                

As you can run across, each row corresponds to one region. If consecutive bins have the aforementioned number of reads overlapping, they will be merged.

Usage examples for RNA-seq¶

Annotation that some BAM files are filtered based on SAM flags (Explain SAM flags).

Regular bigWig track¶

                                        bamCoverage                    -                    b                    a                    .                    bam                    -                    o                    a                    .                    bw                  

Separate tracks for each strand¶

Sometimes it makes sense to generate two contained bigWig files for all reads on the forward and reverse strand, respectively. As of deepTools version 2.2, ane can simply utilize the --filterRNAstrand pick, such as --filterRNAstrand forward or --filterRNAstrand opposite . This handles paired-cease and single-end datasets. For older versions of deepTools, please see the instructions below.

Note

The --filterRNAstrand choice assumes the sequencing library generated from ILLUMINA dUTP/NSR/NNSR methods, which are the most commonly used method for library training, where Read ii (R2) is in the direction of RNA strand (opposite-stranded library). Withal other methods exist, which generate read R1 in the direction of RNA strand (see this review). For these libraries, --filterRNAstrand will take an opposite behavior, i.e. --filterRNAstrand forward will give you reverse strand signal and vice-versa.

Versions before 2.2¶

To follow the examples, you need to know that -f will tell samtools view to include reads with the indicated flag, while -F will pb to the exclusion of reads with the respective flag.

For a stranded `single-end` library

                                            # Forward strand                      bamCoverage                      -                      b                      a                      .                      bam                      -                      o                      a                      .                      fwd                      .                      bw                      --                      samFlagExclude                      16                      # Reverse strand                      bamCoverage                      -                      b                      a                      .                      bam                      -                      o                      a                      .                      rev                      .                      bw                      --                      samFlagInclude                      16                    

For a stranded `paired-end` library

Now, this gets a scrap cumbersome, but futurity releases of deepTools will make this more direct-forward. For at present, bear with us and perhaps read up on SAM flags, eastward.thousand. hither.

For paired-end samples, we assume that a proper pair should have the mates on opposing strands where the Illumina strand-specific protocol produces reads in a R2-R1 orientation. We basically follow the recipe given in this biostars tutorial.

To get the file for transcripts that originated from the forward strand:

                      # include reads that are second in a pair (128); # exclude reads that are mapped to the reverse strand (16) $ samtools view -b -f 128 -F sixteen a.bam > a.fwd1.bam  # exclude reads that are mapped to the reverse strand (sixteen) and # first in a pair (64): 64 + xvi = 80 $ samtools view -b -f fourscore a.bam > a.fwd2.bam  # combine the temporary files $ samtools merge -f fwd.bam a.fwd1.bam a.fwd2.bam  # index the filtered BAM file $ samtools index fwd.bam  # run bamCoverage $ bamCoverage -b fwd.bam -o a.fwd.bigWig  # remove the temporary files $ rm a.fwd*.bam                    

To go the file for transcripts that originated from the reverse strand:

                      # include reads that map to the opposite strand (128) # and are second in a pair (sixteen): 128 + xvi = 144 $ samtools view -b -f 144 a.bam > a.rev1.bam  # include reads that are showtime in a pair (64), only # exclude those ones that map to the contrary strand (16) $ samtools view -b -f 64 -F 16 a.bam > a.rev2.bam  # merge the temporary files $ samtools merge -f rev.bam a.rev1.bam a.rev2.bam  # index the merged, filtered BAM file $ samtools index rev.bam  # run bamCoverage $ bamCoverage -b rev.bam -o a.rev.bw  # remove temporary files $ rm a.rev*.bam                    
deepTools Galaxy. code @ github.