Hi,
We created a C program for generating bigwig files following the deprecation of Bio::DB::Sam. This has the added advantage that you can select which read flags are used in the coverage calculations:
https://github.com/cancerit/cgpBigWig#bam2bw
You can also use bam2bw in a per-chromosome mode (by sepecifying '--region') and then merge the data with bwjoin for faster turn around. This is simplified using bamTobw.pl<https://github.com/cancerit/PCAP-core/wiki/Scripts-Generic#bamtobwpl> (part of PCAP-core<https://github.com/cancerit/PCAP-core>) to handle the multiple threads and merging for you but that is a much larger install.
Regards,
Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute
***@sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 4983
Office: H104
On 01/06/2017, 00:20, "Timothy Parnell" <***@hci.utah.edu> wrote:
Hi Stefan,
If I navigate to the link you provided below, and dump your bam file, I noticed that virtually all of the alignments are marked as ânot primary alignmentâ (flag bit value of 256 or 272). In fact, only 36 alignments are marked as primary alignments. That can easily account for the sparse coverage in the coverage_xyplot graph.
Evidently the Bio::DB::Sam adapter coverage method employs a low level filter to skip non-primary alignments, which I never realized before. Low level as in this is not the Perl stuff but in either the XS code or the samtools C library.
In IGV, you can filter secondary alignments. It should recreate what you see in GBrowse. Considering Bio::DB::Sam has now been superseded by Bio::DB::HTS (which is not exactly a drop-in replacement), thereâs not much you can do about this except to use bigWig generated by a different program.
Tim
On May 31, 2017, at 10:01 AM, Stefan Pfeiffer <***@stud.uni-goettingen.de<mailto:***@stud.uni-goettingen.de>> wrote:
Hi Scott,
thank you for the suggestion. IGV does show the correct and expected coverage.
Zooming in GBrowse does not change anything.
You can see the result at https://vm19002.virt.gwdg.de/gb2/gbrowse/tribolium/?name=id:469064
If it is helpful, we could also upload the bam file.
samtools view wrong_coverage.bam returns lines like the following. The position/length integers in the 8th/9th column remain 0 throughout the entire file, which is not the case for correct_coverage.bam. Could this be the cause?
ILLUMINA-D0898A_0001:1:94:15808:4394#0/1 0 NW_015452213.1 1447 40 38M * 0 0 TGTGGTACTCGTGTACACAGCAGAAGAATTATCAGTAT bbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
ILLUMINA-D0898A_0001:1:2:16831:1638#0/1 0 NW_015452213.1 1449 40 38M * 0 0 TGGTACTCGTGTACACAGCAGAAGAATTATCAGTATAA ]abbbbbbabaabbbbbbbbbbbbabbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
Best Regards
Stefan
On 2017-05-10 16:51, Scott Cain wrote:
Hi Stefan,
I know that there can be problems associated with sampling the bam data when drawing coverage plots. Have you tried zooming in and out to see if the coverage plot changes to look more "right". Similarly, have you tried opening the bam file in other software like IGV to see if it looks right?
Scott
On Mon, May 8, 2017 at 11:59 AM, Stefan Pfeiffer <***@stud.uni-goettingen.de<mailto:***@stud.uni-goettingen.de>> wrote:
Hi,
I am working with two databases in my configuration file and need to display coverage tracks for both of them.
One works fine, the other does not. When using feature = match and glyph = segments however, the correct data is shown.
The track definitions for databases rnaseqall and ovarysam:
#This coverage track works
[READ_COVERAGE]
feature = coverage
glyph = wiggle_xyplot
database = rnaseqall
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage from RNA-Seq
citation = RNA-Seq reads combined from all developmental stages.
category = RNA-Seq
#This track does not work correctly. A scale with and a few non-matching peeks are shown.
[READ_COVERAGE_OVARY]
feature = coverage
glyph = wiggle_xyplot
database = ovarysam
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage ovary
citation = RNA-Seq reads combined from all developmental stages.
category = RNA-Seq
#Correct data is shown
[Reads]
feature = match
glyph = segments
database = ovarysam
draw_target = 1
show_mismatch = 1
mismatch_color = red
bgcolor = royalblue
fgcolor = black
height = 5
label density = 50
feature_limit = 5000
bump = fast
connector = solid
key = RNA-Seq reads match ovary
category = Experimental Data
Is there a method to verify the bam files are correct?
What could be causing the coverage track to be wrong?
Best Regards
Stefan Pfeiffer
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org<http://Slashdot.org>! http://sdm.link/slashdot
_______________________________________________
Gmod-gbrowse mailing list
Gmod-***@lists.sourceforge.net<mailto:Gmod-***@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org<http://Slashdot.org>! http://sdm.link/slashdot_______________________________________________
Gmod-gbrowse mailing list
Gmod-***@lists.sourceforge.net<mailto:Gmod-***@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gmod-gbrowse mailing list
Gmod-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.