Discussion:
[Gmod-gbrowse] "Got a sequence without letters" error
Zhou Albert
2015-08-21 13:30:31 UTC
Hi everyone,

I'm using GBrowse 2.54 with PostgreSQL backend on a server running Ubuntu 14.04 LTS. I have imported GFF and FASTA data into a database using bp_seqfeature_load, and then connected to GBrowse with Bio::DB::SeqFeature::Store adaptor.

The features (mRNA, CDS, exons, etc.) are displayed correctly. However, when I try to show the DNA sequences, the track remains empty, and I find several messages like this in the Apache error log:

MSG: Got a sequence without letters. Could not guess alphabet, referer: http://127.0.0.1/cgi-bin/gbrowse/gbrowse/testdb/?name=scaffold_1%3A2016111..2016123

I'm pretty sure the FASTA files have been loaded into the database (I can see the "sequence text" values in the "sequence" table and they are not ...). The FASTA file contains entries like:

>hxAUG26up1s1g18t1 loc=scaffold_1:173236-177401:+;type=CDS.dpx26mx19;nx=10;len=3585
ATGGAAGAACCCAAGGAAAGTCCCGAGAGTGTAATTGCATCCGTTGTGAA
TGAAAATGAGACCCCGCGAGTCTTGCCCAACTTTCAAATCAATCGTGATA
...

and the GFF file includes entries like:

scaffold_1 dpx26mx19 mRNA 173130 190600 816,1521/1899,3.612,3502,14,3585,0 + . ID=hxAUG26up1s1g18t1;(and some more attributes here)
...

It seems to me that GBrowse should be able to link them together and show the sequences correctly. Could someone tell me where the problem is?

Many thanks!

Albert
Scott Cain
2015-08-21 14:20:44 UTC
Hi Albert,

The problem is with your GFF: the ID attribute is not for assigning names
(identifiers) for use outside of the GFF file; it is only used for
identifying features inside the GFF file, to show which features are related
to which other features (e.g., via the Parent attribute). You need the Name
attribute of the GFF feature to match the first string after the ">" in the
FASTA file. I'm guessing, then, that in your FASTA file you'd want to call it
"scaffold_1" (though I don't know for sure, because I don't know what the
Name attribute of your example GFF feature is).

Scott
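The distinction drawn above can be made concrete by parsing the two inputs. The following is a minimal Python sketch, not GBrowse or BioPerl code; parse_gff3_attributes and fasta_id are hypothetical helpers. It assumes GFF3 conventions: the ninth column holds tag=value pairs separated by ";", with "," between multiple values and reserved characters percent-encoded. ID and Name are simply two separate tags, and a FASTA record's identifier is the first whitespace-delimited word after ">".

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict[str, list[str]]:
    """Split a GFF3 ninth column into {tag: [values]}.

    Pairs are separated by ';', tag and value by '=', and multiple values
    by ','. Reserved characters are percent-encoded (e.g. '%3B' for ';').
    """
    attrs: dict[str, list[str]] = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def fasta_id(header_line: str) -> str:
    """Return the identifier of a FASTA record: the first word after '>'."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    words = header_line[1:].split()
    if not words:
        raise ValueError("FASTA header without an identifier")
    return words[0]


# ID links features within the file (via Parent=...); Name is a separate tag.
attrs = parse_gff3_attributes("ID=hxNCBI_GNO_546014;Name=hxNCBI_GNO_546014")
print(attrs["ID"], attrs["Name"])
print(fasta_id(">hxNCBI_GNO_546014 loc=scaffold_1:1359700-1364615:+"))
```

Everything after the first word of the header (here the "loc=..." text) is treated as a free-text description and plays no part in matching.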
------------------------------------------------------------------------------
_______________________________________________
Gmod-gbrowse mailing list
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
Zhou Albert
2015-08-26 08:44:59 UTC
Hi Scott,

Thanks for the advice. I have been working on it over the last few days and managed to change all the Name attributes to match the FASTA headers. For instance, I now have:

scaffold_1 dpx26mx19 mRNA 1359700 1364615 106,1149/2469,2.111,2395,0,4050,0 + . ID=hxNCBI_GNO_546014;JGI=JGI_V11_220021;GNO=NCBI_GNO_546014;Name=hxNCBI_GNO_546014

in my GFF file, and
>hxNCBI_GNO_546014 loc=scaffold_1:1359700-1364615:+;type=CDS.dpx26mx19;pro=157/535,pediculus_PHUM370280-PA;nx=13;len=4050
ATGGCTTCAAAAGAAACCGATCAACTAATAGAAGATGAACTTCAGGCTTT
GCATCAATCTATTGAACAATTGAACTCAGGAAATTCAGAAGTAAGCTTTC
...

in the FASTA file. However, the problem remains (exactly the same as the original one).

Any idea what the problem is?

Thanks!
Albert
Scott Cain
2015-08-26 15:32:10 UTC
Hi Albert,

You've exchanged one problem with your GFF for another: when a feature is a
reference feature like a chromosome or scaffold, the name of the feature in
the first column has to be the same as the name of the feature in the ninth
column. And now that I'm looking more closely at what you're trying to do, I
see that it won't work: GBrowse only knows how to work with FASTA sequence
for the reference sequence itself, and you are trying to provide sequence
for only part of the reference. What you really need is the FASTA for
scaffold_1.

Scott
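The requirement above can be checked before loading. This is a minimal Python sketch under stated assumptions, not BioPerl or GBrowse code: read_fasta, gff_seqids, missing_references and feature_dna are hypothetical helpers, and GFF3 columns are assumed to be tab-separated as the specification requires. It reports every seqid in GFF column 1 that has no FASTA record with the same identifier. The feature_dna step only illustrates why a record for the whole reference is needed: a feature's DNA can be cut from it using the 1-based, inclusive coordinates, reverse-complemented on the minus strand. It is not GBrowse's own implementation.

```python
from typing import Iterable

# Complement table for plain DNA letters (A/C/G/T/N, either case);
# any other characters are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited word after '>'.
    """
    chunks: dict[str, list[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("FASTA header without an identifier")
            current = words[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def gff_seqids(lines: Iterable[str]) -> set[str]:
    """Collect column 1 (seqid) of every GFF3 feature line.

    Comment and directive lines are skipped, and parsing stops at an
    embedded '##FASTA' section.
    """
    seqids: set[str] = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.split("\t")[0])
    return seqids


def missing_references(gff_lines: Iterable[str], fasta_lines: Iterable[str]) -> set[str]:
    """Return seqids used in the GFF that have no FASTA record of the same name."""
    return gff_seqids(gff_lines) - set(read_fasta(fasta_lines))


def feature_dna(reference: str, start: int, end: int, strand: str) -> str:
    """Cut a feature's DNA out of its reference sequence.

    GFF coordinates are 1-based and inclusive; features on the '-' strand
    are reverse-complemented.
    """
    dna = reference[start - 1:end]
    if strand == "-":
        dna = dna.translate(_COMPLEMENT)[::-1]
    return dna
```

Run against the files from Albert's first two messages, a check like this would presumably have listed scaffold_1 as missing, because the FASTA identifiers were transcript names such as hxAUG26up1s1g18t1 rather than the scaffold name.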
Zhou Albert
2015-08-26 21:28:15 UTC
Hi Scott,

I have modified the FASTA files, and everything works fine now.

Many thanks!
Albert