[Gmod-gbrowse] Coverage track not working

Discussion:

Stefan Pfeiffer

2017-05-08 15:59:16 UTC

Hi,

I am working with two databases in my configuration file and need to
display coverage tracks for both of them.
One works fine, the other does not. When using feature = match and glyph
= segments however, the correct data is shown.

The track definitions for databases rnaseqall and ovarysam:

#This coverage track works
[READ_COVERAGE]
feature = coverage
glyph = wiggle_xyplot
database = rnaseqall
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage from RNA-Seq
citation = RNA-Seq reads combined from all developmental
stages.
category = RNA-Seq

#This track does not work correctly. A scale with and a few non-matching
peeks are shown.
[READ_COVERAGE_OVARY]
feature = coverage
glyph = wiggle_xyplot
database = ovarysam
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage ovary
citation = RNA-Seq reads combined from all developmental
stages.
category = RNA-Seq

#Correct data is shown
[Reads]
feature = match
glyph = segments
database = ovarysam
draw_target = 1
show_mismatch = 1
mismatch_color = red
bgcolor = royalblue
fgcolor = black
height = 5
label density = 50
feature_limit = 5000
bump = fast
connector = solid
key = RNA-Seq reads match ovary
category = Experimental Data

Is there a method to verify the bam files are correct?
What could be causing the coverage track to be wrong?

Best Regards
Stefan Pfeiffer

Scott Cain

2017-05-10 14:51:04 UTC

Permalink

Hi Stefan,

I know that there can be problems associated with sampling the bam data
when drawing coverage plots. Have you tried zooming in and out to see if
the coverage plot changes to look more "right". Similarly, have you tried
opening the bam file in other software like IGV to see if it looks right?

Scott

On Mon, May 8, 2017 at 11:59 AM, Stefan Pfeiffer <

Post by Stefan Pfeiffer
Hi,
I am working with two databases in my configuration file and need to
display coverage tracks for both of them.
One works fine, the other does not. When using feature = match and glyph =
segments however, the correct data is shown.
#This coverage track works
[READ_COVERAGE]
feature = coverage
glyph = wiggle_xyplot
database = rnaseqall
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage from RNA-Seq
citation = RNA-Seq reads combined from all developmental
stages.
category = RNA-Seq
#This track does not work correctly. A scale with and a few non-matching
peeks are shown.
[READ_COVERAGE_OVARY]
feature = coverage
glyph = wiggle_xyplot
database = ovarysam
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage ovary
citation = RNA-Seq reads combined from all developmental
stages.
category = RNA-Seq
#Correct data is shown
[Reads]
feature = match
glyph = segments
database = ovarysam
draw_target = 1
show_mismatch = 1
mismatch_color = red
bgcolor = royalblue
fgcolor = black
height = 5
label density = 50
feature_limit = 5000
bump = fast
connector = solid
key = RNA-Seq reads match ovary
category = Experimental Data
Is there a method to verify the bam files are correct?
What could be causing the coverage track to be wrong?
Best Regards
Stefan Pfeiffer
------------------------------------------------------------
------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gmod-gbrowse mailing list
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

Scott Cain

2017-05-31 17:00:48 UTC

Permalink

Hi Stefan,

Unfortunately, I don't have anything more to offer--my guess is that you
are just particularly unlucky :-/

Have you considered making a bigwig of the coverage data? In general,
bigwig data work better and are more compact (though you don't get
individual read data, so if you need that, then you'd have both the BAM and
the bigwig file).

Scott

On Wed, May 31, 2017 at 12:01 PM, Stefan Pfeiffer <

Hi Scott,
thank you for the suggestion. IGV does show the correct and expected
coverage.
Zooming in GBrowse does not change anything.
You can see the result at https://vm19002.virt.gwdg.de/g
b2/gbrowse/tribolium/?name=id:469064
If it is helpful, we could also upload the bam file.
samtools view wrong_coverage.bam returns lines like the following. The
position/length integers in the 8th/9th column remain 0 throughout the
entire file, which is not the case for correct_coverage.bam. Could this be
the cause?
ILLUMINA-D0898A_0001:1:94:15808:4394#0/1 0 NW_015452213.1
1447 40 38M * 0 0
TGTGGTACTCGTGTACACAGCAGAAGAATTATCAGTAT bbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbb
MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
ILLUMINA-D0898A_0001:1:2:16831:1638#0/1 0 NW_015452213.1 1449
40 38M * 0 0 TGGTACTCGTGTACACAGCAGAAGAATTATCAGTATAA
]abbbbbbabaabbbbbbbbbbbbabbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0
SM:i:40 XQ:i:40 X2:i:0
Best Regards
Stefan
Hi Stefan,
I know that there can be problems associated with sampling the bam data
when drawing coverage plots. Have you tried zooming in and out to see if
the coverage plot changes to look more "right". Similarly, have you tried
opening the bam file in other software like IGV to see if it looks right?
Scott
On Mon, May 8, 2017 at 11:59 AM, Stefan Pfeiffer <

Post by Stefan Pfeiffer
Hi,
I am working with two databases in my configuration file and need to
display coverage tracks for both of them.
One works fine, the other does not. When using feature = match and glyph
= segments however, the correct data is shown.
#This coverage track works
[READ_COVERAGE]
feature = coverage
glyph = wiggle_xyplot
database = rnaseqall
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage from RNA-Seq
citation = RNA-Seq reads combined from all developmental
stages.
category = RNA-Seq
#This track does not work correctly. A scale with and a few non-matching
peeks are shown.
[READ_COVERAGE_OVARY]
feature = coverage
glyph = wiggle_xyplot
database = ovarysam
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage ovary
citation = RNA-Seq reads combined from all developmental
stages.
category = RNA-Seq
#Correct data is shown
[Reads]
feature = match
glyph = segments
database = ovarysam
draw_target = 1
show_mismatch = 1
mismatch_color = red
bgcolor = royalblue
fgcolor = black
height = 5
label density = 50
feature_limit = 5000
bump = fast
connector = solid
key = RNA-Seq reads match ovary
category = Experimental Data
Is there a method to verify the bam files are correct?
What could be causing the coverage track to be wrong?
Best Regards
Stefan Pfeiffer
------------------------------------------------------------
------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gmod-gbrowse mailing list
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain
dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
<(216)%20392-3087>
Ontario Institute for Cancer Research

Timothy Parnell

2017-05-31 23:20:16 UTC

Permalink

Hi Stefan,

If I navigate to the link you provided below, and dump your bam file, I noticed that virtually all of the alignments are marked as “not primary alignment” (flag bit value of 256 or 272). In fact, only 36 alignments are marked as primary alignments. That can easily account for the sparse coverage in the coverage_xyplot graph.

Evidently the Bio::DB::Sam adapter coverage method employs a low level filter to skip non-primary alignments, which I never realized before. Low level as in this is not the Perl stuff but in either the XS code or the samtools C library.

In IGV, you can filter secondary alignments. It should recreate what you see in GBrowse. Considering Bio::DB::Sam has now been superseded by Bio::DB::HTS (which is not exactly a drop-in replacement), there’s not much you can do about this except to use bigWig generated by a different program.

Tim

On May 31, 2017, at 10:01 AM, Stefan Pfeiffer <***@stud.uni-goettingen.de<mailto:***@stud.uni-goettingen.de>> wrote:

Hi Scott,

thank you for the suggestion. IGV does show the correct and expected coverage.
Zooming in GBrowse does not change anything.
You can see the result at https://vm19002.virt.gwdg.de/gb2/gbrowse/tribolium/?name=id:469064

If it is helpful, we could also upload the bam file.
samtools view wrong_coverage.bam returns lines like the following. The position/length integers in the 8th/9th column remain 0 throughout the entire file, which is not the case for correct_coverage.bam. Could this be the cause?

ILLUMINA-D0898A_0001:1:94:15808:4394#0/1 0 NW_015452213.1 1447 40 38M * 0 0 TGTGGTACTCGTGTACACAGCAGAAGAATTATCAGTAT bbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
ILLUMINA-D0898A_0001:1:2:16831:1638#0/1 0 NW_015452213.1 1449 40 38M * 0 0 TGGTACTCGTGTACACAGCAGAAGAATTATCAGTATAA ]abbbbbbabaabbbbbbbbbbbbabbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0

Best Regards
Stefan

On 2017-05-10 16:51, Scott Cain wrote:

Hi Stefan,

I know that there can be problems associated with sampling the bam data when drawing coverage plots. Have you tried zooming in and out to see if the coverage plot changes to look more "right". Similarly, have you tried opening the bam file in other software like IGV to see if it looks right?

Scott

On Mon, May 8, 2017 at 11:59 AM, Stefan Pfeiffer <***@stud.uni-goettingen.de<mailto:***@stud.uni-goettingen.de>> wrote:

Hi,

I am working with two databases in my configuration file and need to display coverage tracks for both of them.
One works fine, the other does not. When using feature = match and glyph = segments however, the correct data is shown.

The track definitions for databases rnaseqall and ovarysam:

#This coverage track works
[READ_COVERAGE]
feature = coverage
glyph = wiggle_xyplot
database = rnaseqall
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage from RNA-Seq
citation = RNA-Seq reads combined from all developmental stages.
category = RNA-Seq

#This track does not work correctly. A scale with and a few non-matching peeks are shown.
[READ_COVERAGE_OVARY]
feature = coverage
glyph = wiggle_xyplot
database = ovarysam
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = royalblue
neg_color = red
key = Coverage ovary
citation = RNA-Seq reads combined from all developmental stages.
category = RNA-Seq

#Correct data is shown
[Reads]
feature = match
glyph = segments
database = ovarysam
draw_target = 1
show_mismatch = 1
mismatch_color = red
bgcolor = royalblue
fgcolor = black
height = 5
label density = 50
feature_limit = 5000
bump = fast
connector = solid
key = RNA-Seq reads match ovary
category = Experimental Data

Is there a method to verify the bam files are correct?
What could be causing the coverage track to be wrong?

Best Regards
Stefan Pfeiffer

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org<http://Slashdot.org>! http://sdm.link/slashdot
_______________________________________________
Gmod-gbrowse mailing list
Gmod-***@lists.sourceforge.net<mailto:Gmod-***@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org<http://Slashdot.org>! http://sdm.link/slashdot_______________________________________________
Gmod-gbrowse mailing list
Gmod-***@lists.sourceforge.net<mailto:Gmod-***@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Keiran Raine

2017-06-01 07:55:58 UTC

Permalink

Hi,

We created a C program for generating bigwig files following the deprecation of Bio::DB::Sam. This has the added advantage that you can select which read flags are used in the coverage calculations:

https://github.com/cancerit/cgpBigWig#bam2bw

You can also use bam2bw in a per-chromosome mode (by sepecifying '--region') and then merge the data with bwjoin for faster turn around. This is simplified using bamTobw.pl<https://github.com/cancerit/PCAP-core/wiki/Scripts-Generic#bamtobwpl> (part of PCAP-core<https://github.com/cancerit/PCAP-core>) to handle the multiple threads and merging for you but that is a much larger install.

Regards,

Keiran Raine

Principal Bioinformatician

Cancer Genome Project

Wellcome Trust Sanger Institute

***@sanger.ac.uk

Tel:+44 (0)1223 834244 Ext: 4983

Office: H104

On 01/06/2017, 00:20, "Timothy Parnell" <***@hci.utah.edu> wrote:

Hi Stefan,

If I navigate to the link you provided below, and dump your bam file, I noticed that virtually all of the alignments are marked as ânot primary alignmentâ (flag bit value of 256 or 272). In fact, only 36 alignments are marked as primary alignments. That can easily account for the sparse coverage in the coverage_xyplot graph.

Evidently the Bio::DB::Sam adapter coverage method employs a low level filter to skip non-primary alignments, which I never realized before. Low level as in this is not the Perl stuff but in either the XS code or the samtools C library.

In IGV, you can filter secondary alignments. It should recreate what you see in GBrowse. Considering Bio::DB::Sam has now been superseded by Bio::DB::HTS (which is not exactly a drop-in replacement), thereâs not much you can do about this except to use bigWig generated by a different program.

Tim

On May 31, 2017, at 10:01 AM, Stefan Pfeiffer <***@stud.uni-goettingen.de<mailto:***@stud.uni-goettingen.de>> wrote:

Hi Scott,

thank you for the suggestion. IGV does show the correct and expected coverage.

Zooming in GBrowse does not change anything.

You can see the result at https://vm19002.virt.gwdg.de/gb2/gbrowse/tribolium/?name=id:469064

If it is helpful, we could also upload the bam file.

samtools view wrong_coverage.bam returns lines like the following. The position/length integers in the 8th/9th column remain 0 throughout the entire file, which is not the case for correct_coverage.bam. Could this be the cause?

ILLUMINA-D0898A_0001:1:94:15808:4394#0/1 0 NW_015452213.1 1447 40 38M * 0 0 TGTGGTACTCGTGTACACAGCAGAAGAATTATCAGTAT bbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0

ILLUMINA-D0898A_0001:1:2:16831:1638#0/1 0 NW_015452213.1 1449 40 38M * 0 0 TGGTACTCGTGTACACAGCAGAAGAATTATCAGTATAA ]abbbbbbabaabbbbbbbbbbbbabbbbbbbbbbbbb MD:Z:38 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0

Best Regards

Stefan

On 2017-05-10 16:51, Scott Cain wrote:

Hi Stefan,

I know that there can be problems associated with sampling the bam data when drawing coverage plots. Have you tried zooming in and out to see if the coverage plot changes to look more "right". Similarly, have you tried opening the bam file in other software like IGV to see if it looks right?

Scott

On Mon, May 8, 2017 at 11:59 AM, Stefan Pfeiffer <***@stud.uni-goettingen.de<mailto:***@stud.uni-goettingen.de>> wrote:

Hi,

I am working with two databases in my configuration file and need to display coverage tracks for both of them.

One works fine, the other does not. When using feature = match and glyph = segments however, the correct data is shown.

The track definitions for databases rnaseqall and ovarysam:

#This coverage track works

[READ_COVERAGE]

feature = coverage

glyph = wiggle_xyplot

database = rnaseqall

height = 50

fgcolor = black

bicolor_pivot = 20

pos_color = royalblue

neg_color = red

key = Coverage from RNA-Seq

citation = RNA-Seq reads combined from all developmental stages.

category = RNA-Seq

#This track does not work correctly. A scale with and a few non-matching peeks are shown.

[READ_COVERAGE_OVARY]

feature = coverage

glyph = wiggle_xyplot

database = ovarysam

height = 50

fgcolor = black

bicolor_pivot = 20

pos_color = royalblue

neg_color = red

key = Coverage ovary

citation = RNA-Seq reads combined from all developmental stages.

category = RNA-Seq

#Correct data is shown

[Reads]

feature = match

glyph = segments

database = ovarysam

draw_target = 1

show_mismatch = 1

mismatch_color = red

bgcolor = royalblue

fgcolor = black

height = 5

label density = 50

feature_limit = 5000

bump = fast

connector = solid

key = RNA-Seq reads match ovary

category = Experimental Data

Is there a method to verify the bam files are correct?

What could be causing the coverage track to be wrong?

Best Regards

Stefan Pfeiffer

------------------------------------------------------------------------------

Check out the vibrant tech community on one of the world's most

engaging tech sites, Slashdot.org<http://Slashdot.org>! http://sdm.link/slashdot

_______________________________________________

Gmod-gbrowse mailing list

Gmod-***@lists.sourceforge.net<mailto:Gmod-***@lists.sourceforge.net>

https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

--

------------------------------------------------------------------------

Scott Cain, Ph. D. scott at scottcain dot net

GMOD Coordinator (http://gmod.org/) 216-392-3087

Ontario Institute for Cancer Research

------------------------------------------------------------------------------

Check out the vibrant tech community on one of the world's most

engaging tech sites, Slashdot.org<http://Slashdot.org>! http://sdm.link/slashdot_______________________________________________

Gmod-gbrowse mailing list

Gmod-***@lists.sourceforge.net<mailto:Gmod-***@lists.sourceforge.net>

https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

------------------------------------------------------------------------------

Check out the vibrant tech community on one of the world's most

engaging tech sites, Slashdot.org! http://sdm.link/slashdot

_______________________________________________

Gmod-gbrowse mailing list

Gmod-***@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.