Preparing pileup

If you are looking for help loading your BAM file, see the short read alignment preparation page.

Please see the description of the pile-up track for more information on what can be done with it.

Three pileup formats are supported. The first is generated with a dedicated tool available from this page, the second can be generated by samtools, and the third is a simple tab-delimited file format. All three are explained below; links to samtools and tabix can be found at the bottom of this page.

Important: TDF files should not be indexed. The samtools pileup and tab-delimited formats MUST be indexed before GenomeView can read them.


TDF coverage plot (recommended, coverage only)

TDF is a tiled data format that contains the coverage plot as well as summaries at multiple resolutions, which allow fast retrieval at any scale.

Download the latest version of tdformat, a small program to generate TDF files from BAM files. The BAM file has to be indexed, i.e. there has to be a BAI file as well.

Once you've downloaded and extracted the program (you need at least the lib folder and the tdformat jar file), you can invoke it with the following command:

java -Xmx1g -jar tdformat-1576.jar <path to your BAM file>

Replace 1576 with the version number of the file you downloaded.

For large genomes, mammalian genomes for example, you may need to increase the memory allotment for the program:

java -Xmx4g -jar tdformat-1576.jar <path to your BAM file>

The TDF format does not have to be indexed.
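Because tdformat needs an indexed BAM file, it can help to check for the index before starting the conversion. The following is a minimal sketch, not part of tdformat: the function name has_bam_index is hypothetical, and it assumes the index is named either sorted.bam.bai or sorted.bai, next to the BAM file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a BAI index sits next to the given BAM file.
# Assumption: the index is named either <name>.bam.bai or <name>.bai.
has_bam_index() {
    bam=$1
    [ -f "$bam.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Usage (commented out because it needs Java, samtools and a real BAM file):
# if has_bam_index sorted.bam; then
#     java -Xmx1g -jar tdformat-1576.jar sorted.bam
# else
#     samtools index sorted.bam   # writes sorted.bam.bai
# fi
```

If the check fails, running samtools index on the sorted BAM file creates the missing BAI file.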


SAMtools pileup (includes diversity information, i.e. SNP track)

Note: the file name extension should contain .pileup

The first step in browsing a pileup is to generate one from your BAM file.

samtools pileup -f reference.fasta sorted.bam >sorted.pileup

As you run this command, you'll see that the generated file can be huge, even for small BAM files.

To browse it in GenomeView, you need to sort, compress, and index it with tabix, a tool that is also available from the SAMtools web page.


sort -k1,1 -k2,2n sorted.pileup | bgzip -c > compressed.pileup.bgz
tabix -s 1 -b 2 -e 2 compressed.pileup.bgz

Tab delimited pileup (extension should contain '.swig')

The file should be organized in four columns, holding the sequence identifier, the genomic position, and the number of reads on the forward and on the reverse strand:

  1. Identifier
  2. Genomic position (one-based)
  3. # forward reads
  4. # reverse reads

Example:

chr1 11 46 43
chr1 12 47 50
chr1 13 48 61
chr1 14 53 79

Note that the white space between the columns is a tab, with exactly one tab between each pair of columns.
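Since spaces and tabs look alike in most editors, it may be worth validating the file before indexing it. The sketch below is only an illustration: the function name check_pileup_table and the file name example.swig are hypothetical, and the rule that the position is at least 1 and the counts are non-negative integers is an assumption drawn from the column description above.

```shell
#!/bin/sh
# Hypothetical helper: report every line that does not have exactly four
# tab-separated fields, a position >= 1, and integer read counts.
# Exits with a non-zero status if any line is malformed.
check_pileup_table() {
    awk -F '\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $2 < 1 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Write two rows of the example with real tab characters, then validate them.
tmpdir=$(mktemp -d)
printf 'chr1\t11\t46\t43\nchr1\t12\t47\t50\n' > "$tmpdir/example.swig"
check_pileup_table "$tmpdir/example.swig" && echo "example.swig looks well-formed"
rm -rf "$tmpdir"
```

Using printf with \t is a simple way to make sure the separators really are tabs.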

Once you have such a file, you must sort, compress, and index it before GenomeView can read it. This also gives faster access and shorter download times.

sort -T . -k1,1 -k2,2n filename | bgzip -c > filename.bgz
tabix -s 1 -b 2 -e 2 filename.bgz
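After the tabix step, an index file with the .tbi suffix should sit next to the compressed file. A minimal sketch for checking this is shown here; the function name has_tabix_index is hypothetical, and the region in the commented usage simply reuses the positions from the example rows.

```shell
#!/bin/sh
# Hypothetical helper: succeed if both the compressed file and its tabix
# index (<file>.tbi, written by tabix next to its input) are present.
has_tabix_index() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}

# Usage (commented out because it needs tabix and a real data file):
# if has_tabix_index filename.bgz; then
#     tabix filename.bgz chr1:11-14   # print the rows for chr1, positions 11 to 14
# fi
```

Querying a small region with tabix is a quick way to confirm that the index works before loading the file.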

Resources

Download samtools
Download tabix