[Gmod-gbrowse] Creating GBrowse database

Discussion:

Vaneet Lotay

2015-03-30 18:06:22 UTC

Hello,

We seem to be having trouble creating a new GBrowse database in MySQL for a brand new track, though I'm not sure exactly what is going wrong. We have a FASTA file and a GFF3 file. We created a new GBrowse database and followed the tutorial (http://cpansearch.perl.org/src/LDS/GBrowse-2.43/htdocs/tutorial/tutorial.html#mysql), changing only the user names and database names to ones relevant to us. We used the following command from the tutorial, and it loads successfully (replacing volvox with our own names, of course):

bp_seqfeature_load.pl -c -f -a DBI::mysql -d volvox volvox_all.fa volvox_all.gff3

We checked the database afterwards to verify that the sequence and features loaded properly, with matching sequence IDs to link them, and it's all there. However, when we go to our test server we get a 'Not found' error, as if it can't find those tables in the database. Doing some gene searches eventually produces another common error, "Chromosome/contig not found". The detailed message actually shows the correct gene name from our attributes column as well as the correct scaffold ID, yet it says the scaffold is not defined in the database.

We created a very simple subset of the FASTA and GFF3 files containing only one scaffold sequence and one gene/mRNA with 1 exon and 1 CDS from the original files. We created a brand new database and loaded these small files, and it still comes up with the same error. We're wondering what we could be doing wrong, as we've changed a lot of small things that might be causing this disconnect, but to no avail. I attached these small subset files as well as the configuration file, and you can visit our test server for this new database here:

http://gbrowse-test.xenbase.org/fgb2/gbrowse/xl_wt1_0/

With these new files loaded, if you visit this test server page and search for LOC, the first 3 letters of the actual gene name (LOC100489), it comes up with the 'contig not found' message and shows the detail below. As I said above, this suggests that it at least found the GFF3 features and possibly the Chr01 sequence, but it still won't display them for some reason:

Cannot display LOC100489 because the chromosome/contig named Chr01 is not defined in the database.

Please help if you know what initial steps might be going wrong.

Thanks,

Vaneet

Vaneet Lotay

Xenbase Bioinformatician

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

Alexey Morozov

2015-03-31 05:37:29 UTC

That's because "Chromosome/contig named Chr01 is not defined in the database". It means the GFF3 file needs a separate line declaring each of your chromosomes/scaffolds/whatever you have. For example, my GFF goes as follows:

##gff-version 3
scaffold00001 newbler contig 1 1348726 . . . ID=scaffold00001;Name=scaffold00001
#
#The same for all scaffolds
#This is the thing you missed. It goes just like a gene definition.
#
scaffold00001 maker gene 56082 56739 . + . ID=4447-gene;Name=4447-gene
scaffold00001 maker mRNA 56082 56739 . + . ID=4447;Parent=4447-gene;Name=4447;_AED=0.02;_eAED=0.02;_QI=0|0|0|1|1|1|2|0|184
scaffold00001 maker exon 56082 56220 . + . ID=4447:exon:0;Parent=4447
scaffold00001 maker exon 56324 56739 . + . ID=4447:exon:1;Parent=4447
scaffold00001 maker CDS 56082 56220 . + 0 ID=4447:cds;Parent=4447
scaffold00001 maker CDS 56324 56739 . + 2 ID=4447:cds;Parent=4447
#
#Same for all genes on them
#

>scaffold00001 length=1348726


--
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.
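
The fix described in this reply can be checked before loading. What follows is a minimal Python sketch, not part of GBrowse or BioPerl: it lists every sequence ID used in column 1 of a GFF3 file that has no feature line whose `ID` or `Name` equals that sequence ID. The function names, the sample coordinates, and the `example` source label are all made up for illustration, and the sketch assumes a tab-delimited file as the GFF3 format requires.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def undeclared_seqids(gff_lines):
    """Return seqids used in column 1 that no feature line declares by ID/Name."""
    used = set()
    declared = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # embedded sequence section starts here
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line; ignored in this sketch
        seqid = cols[0]
        used.add(seqid)
        attrs = parse_attributes(cols[8])
        if seqid in (attrs.get("ID"), attrs.get("Name")):
            declared.add(seqid)
    return sorted(used - declared)


# Hypothetical data modelled on the thread: a gene on Chr01, no Chr01 line.
gene_only = [
    "##gff-version 3",
    "Chr01\tmaker\tgene\t100\t900\t.\t+\t.\tID=LOC100489;Name=LOC100489",
]
# The same file with a line declaring Chr01 itself (length invented here).
with_landmark = gene_only + [
    "Chr01\texample\tcontig\t1\t5000\t.\t.\t.\tID=Chr01;Name=Chr01",
]

print(undeclared_seqids(gene_only))      # ['Chr01']
print(undeclared_seqids(with_landmark))  # []
```

To run it against a real file, pass an open file handle, for example `undeclared_seqids(open("my.gff3"))`, where `my.gff3` stands in for your own file name.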

Vaneet Lotay

2015-03-31 16:24:18 UTC

Thanks Alexey, that worked.

Vaneet

From: Alexey Morozov [mailto:***@gmail.com]
Sent: Monday, March 30, 2015 11:37 PM
To: Vaneet Lotay
Cc: gmod-***@lists.sourceforge.net
Subject: Re: [Gmod-gbrowse] Creating GBrowse database

That's because "Chromosome/contig named Chr01 is not defined in the database". It means the GFF3 file needs a separate line declaring each of your chromosomes/scaffolds/whatever you have. For example, my GFF goes as follows:

##gff-version 3
scaffold00001 newbler contig 1 1348726 . . . ID=scaffold00001;Name=scaffold00001
#
#The same for all scaffolds
#This is the thing you missed. It goes just like gene definition,
#
scaffold00001 maker gene 56082 56739 . + . ID=4447-gene;Name=4447-gene
scaffold00001 maker mRNA 56082 56739 . + . ID=4447;Parent=4447-gene;Name=4447;_AED=0.02;_eAED=0.02;_QI=0|0|0|1|1|1|2|0|184
scaffold00001 maker exon 56082 56220 . + . ID=4447:exon:0;Parent=4447
scaffold00001 maker exon 56324 56739 . + . ID=4447:exon:1;Parent=4447
scaffold00001 maker CDS 56082 56220 . + 0 ID=4447:cds;Parent=4447
scaffold00001 maker CDS 56324 56739 . + 2 ID=4447:cds;Parent=4447
#
#Same for all genes on them
#

>scaffold00001 length=1348726

CTGTTTCATCTCAAAGGTCTTCCTTAATTTTAATCCATGGTGATCCAGGCTCTGGAAAAA
GCACTCTTGTTCAGGCATTTATAGATAAGTTACCTAAATCTGTTTTGTTCGCCGTTGGGA
ATTTCGACCGGCCGAAAAATCATTCTCCCTACTCTGCCTTAGTTGCAGCATCTGATATTC
TTTGCCGTCAGATTATTCGAATGAAGAATTGGGAAGAAATTAGCAAAAACATCAGAGATG
#
#Actual sequences for all the scaffolds
#

Sequences don't really HAVE to be in the same file, but I find it useful to keep all the data in one place. Plus you'll never accidentally upload GFF and FASTA from different versions of the assembly and waste time figuring out why it suddenly stopped working.
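
Keeping everything in one file, as suggested above, can be scripted. The following is a rough Python sketch under stated assumptions, not a GBrowse or BioPerl tool: it reads sequence lengths from a FASTA file, writes one line per sequence declaring it from 1 to its length, then appends the existing features and the sequences after a `##FASTA` directive (defined in the GFF3 specification). The function names and the default `assembly` source and `contig` type are illustrative choices. It assumes the input GFF3 does not already declare its scaffolds, and it is worth confirming that your loader version accepts embedded FASTA.

```python
def fasta_lengths(fasta_lines):
    """Map each FASTA record ID (first word after '>') to its sequence length."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def landmark_lines(lengths, source="assembly", feature_type="contig"):
    """Build one tab-delimited GFF3 line per sequence, spanning 1..length."""
    return [
        "\t".join([seqid, source, feature_type, "1", str(length),
                   ".", ".", ".", f"ID={seqid};Name={seqid}"])
        for seqid, length in lengths.items()
    ]


def combined_gff3(gff_lines, fasta_lines):
    """Return GFF3 text: landmark lines, the features, then ##FASTA and sequences."""
    fasta = [line.rstrip("\n") for line in fasta_lines]
    features = [line.rstrip("\n") for line in gff_lines
                if line.strip() and not line.startswith("##gff-version")]
    out = ["##gff-version 3"]
    out += landmark_lines(fasta_lengths(fasta))
    out += features
    out += ["##FASTA"] + [line for line in fasta if line]
    return "\n".join(out) + "\n"
```

A typical use would be `combined_gff3(open("genes.gff3"), open("genome.fa"))`, writing the returned text to a new file and loading that single file; both file names are placeholders. Because the lengths come from the same FASTA that gets embedded, the declared coordinates cannot drift out of step with the sequences.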
