[Gmod-gbrowse] Displaying problem of tophat alignment in Gbrowse

Discussion:

Scott Cain

2014-04-04 16:11:16 UTC

Hi Jiarui and Jack,

It's best to ask a question like this on the mailing list, since there are
lots of people who work with this every day and can usually answer faster
than I can. I've cc'ed the list here.

My initial guess is a configuration problem: for example, not getting bases
would lead me to believe it's not looking in the right place for the FASTA
file. Can you send the section of your configuration file that sets up this
BAM file as a database, along with the track stanza? While you're at it,
please verify that the path specified in the fasta section points at the
appropriate FASTA file for your reference sequence.

Scott
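
One quick way to run the check suggested above is to compare the sequence IDs in the FASTA file with the reference names used in the SAM file. The sketch below is only an illustration using the Python standard library; the function names and the toy records (`ref1`, `read1`) are hypothetical and are not taken from the files in this thread.

```python
import io


def fasta_ids(handle):
    """Return the set of sequence IDs (first word after '>') in a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def sam_reference_names(handle):
    """Return the reference names (RNAME, column 3) used by mapped reads in SAM text."""
    names = set()
    for line in handle:
        # Skip header lines (they start with '@') and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # RNAME is '*' for unmapped reads, so ignore those.
        if len(fields) >= 3 and fields[2] != "*":
            names.add(fields[2])
    return names


def missing_references(fasta_handle, sam_handle):
    """Reference names that the SAM file uses but the FASTA file does not define."""
    return sam_reference_names(sam_handle) - fasta_ids(fasta_handle)


# Toy in-memory data (hypothetical); in practice, pass open file handles instead.
fasta = io.StringIO(">ref1 toy reference\nGACATCTCTAATTT\n")
sam = io.StringIO(
    "@HD\tVN:1.0\n"
    "read1\t0\tref1\t1\t50\t14M\t*\t0\t0\tGACATCTCTAATTT\t*\n"
)
print(missing_references(fasta, sam))  # prints set() when every name is found
```

An empty result only shows that the names match; it does not prove that the sequences themselves are the same ones the reads were aligned against.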

Hi Scott,
I am Jiarui Li, a post-doc of Dr. Jack Chen. I have a problem with
displaying TopHat alignments in GBrowse.
I used TopHat to align RNA-seq reads to a reference and got a BAM file named
"accepted_hits.bam", which contains reads supporting introns. Those
reads are split and aligned to the reference, so in the BAM file they have a
CIGAR string like "30M40N70M" and an MD tag of "MD:Z:100", which should show
30 bp of perfect matches, then skip 40 bp in the reference sequence, and
finally 70 bp of perfect matches. However, in GBrowse, I saw a problem in the
70 bp part:
First, no bases are displayed;
Second, many bases within those 70 bp are marked in red,
indicating mismatches;
Third, when I click on that read, the 40 bp part is missing from the
reference sequence.
Could you help me with that, please?
Many thanks!
Cheers,
Jiarui
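
For readers following along, the coordinates implied by a spliced CIGAR string such as "30M40N70M" can be worked out as in the sketch below. It follows the SAM convention that `N` skips reference bases without consuming read bases, which is why the read is 100 bases long (matching "MD:Z:100") while spanning 140 reference positions. The function names and the start position of 1 are assumptions made for illustration; this is not how GBrowse itself computes the display.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def aligned_blocks(start, cigar):
    """Return 1-based inclusive reference intervals, split at each N (intron) gap."""
    blocks = []
    pos = start
    block_start = None
    for length, op in parse_cigar(cigar):
        if op in "M=XD":
            # These operations consume reference bases inside an aligned block.
            if block_start is None:
                block_start = pos
            pos += length
        elif op == "N":
            # A skipped region closes the current block and advances the reference.
            if block_start is not None:
                blocks.append((block_start, pos - 1))
                block_start = None
            pos += length
        # I, S, H and P do not consume reference bases.
    if block_start is not None:
        blocks.append((block_start, pos - 1))
    return blocks


def query_length(cigar):
    """Number of read bases described by the CIGAR (M, I, S, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(aligned_blocks(1, "30M40N70M"))  # [(1, 30), (71, 140)]
print(query_length("30M40N70M"))       # 100
```

With a read starting at reference position 1, the second block covers positions 71 to 140, so that is the region where the 70 matching bases would be expected to line up.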

Sheldon McKay

2014-04-04 16:30:14 UTC

Also check that you loaded the reference sequence into your database, if
you are using a database.

Sheldon

Post by Scott Cain
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
------------------------------------------------------------------------------
_______________________________________________
Gmod-gbrowse mailing list
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Jiarui Li

2014-04-04 18:48:26 UTC

Hi Scott,

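Thanks for your help. I have attached the relevant section of the config file and the test SAM file. To make my question clearer, I am also sending a slide with a screenshot of what I saw in GBrowse.

I checked the reference FASTA file I uploaded to the MySQL database. Here is the beginning of it: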

BxCHNm3_4

GACATCTCTAATTTACTTTTTTCATCATTGAGTATAATGTGTGTAGCACTTATGCTACTCTTATCCTACTAATGTAAGAT
GGGTTTTTAGTATTCTTGAGACATGTCAAATAAAAAAGCATTCGTTATCTGCGAAGAATTATCGGCTCACTGTTTGTGGG
CATACTTGGGATATTCAAGGTCTTTGGTAGTTTCGTAAGTAAACATGTTTTATAACTTTGTGAACTTTGACAAACGGCAA
GGCAAACATACGACGCGCAACACTGTGAATAAATAGGAATTCTTTGACTGTGAAAGGTTGACATTATTACCATATACAAT
ACGACAAATTTTGAGAATACAATAATCAGAAAAAATGGAAAGAAAGTGTGGTAAATTCGGTACAAAGCATAATTTCCTGT
ATTTACAATGTAATGTATATCCAAATTAAGTGTAATTCTGGTAAGAGTATTTACAATTTTACTTTATAAACAAGTCAAAT
TAGGTTTGGTTAGTAATAACTGATCCGATGTGGTCCATTTGGAATGTTTTTCTTGTCGACAGAGTTCTTCAATGTGTTCG
CAAACACTTGAATTTTCCTTGAGAGATCGTTGTCTTCAAATTCGTTGAGGTTTAAAGCAAAATCGTTGTTAATCCACTGG
ATTTTCAGACCCGATTTGGAGTAAAGAATCTCGATTTGCTTGGAGTAAAGTCCGTCTTCGTGTTGAAATCGGAGGAAGTC
TTTCTTGGATTGGATGAGCCAAAATTCAACAGTAAAGTTGATATCGAATTTGGAGTTGAAAGTGAGTTCGGAGTAGTCCT
TGAATTTTCTTGCGTATTTTAAGAGTTGAGTGTAGTCGAAGTCACGGAGGTGGGATGGGGCCTAAAAAATATAGCTGAAG
...

I think the reference FASTA file is fine, because the GC content and the sequence both look fine in GBrowse.

Cheers,
Jiarui

Scott Cain

2014-04-04 21:03:00 UTC

Hi Jiarui,

I'd like to try to replicate this--could you send me the fasta file for the
reference sequence as well?

Thanks,
Scott

Jiarui Li

2014-04-05 02:19:04 UTC

Hi Scott,

This is the fasta file. Thank you very much!

Cheers,
Jiarui
