Hi Jeevan,
Is there a reason you're using Chado? It is a much slower option for
driving GBrowse than Bio::DB::SeqFeature::Store. Even if you want "live"
data for one track, I'd suggest setting up a 'Store database for the rest
of the tracks and just get the live data track from Chado.
Also, using fastcgi will make a difference in speed for GBrowse regardless
of what database adaptor you're using, but it makes a big difference for
Chado, because there is some start up overhead that fastcgi caches for you,
so after the first query it's noticeably faster.
That said, there are options for tuning a PostgreSQL database to get better
performance, since it isn't particularly tuned out of the box. Have you
looked at this for instance:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
In the past (ie, pre Postgres 9), I've had lots of success adjusting the
shared_buffers parameter, but there are other tuning parameters as well.
Now about your questions:
1. Are the cpu/ram enough? Probably.
2. While it's possible to delete feature data via GFF, it is slow and not
perfect. Typically, what I do when I need to do something like that is
identify the first feature_id of the "real" data and then delete everything
below it:
DELETE FROM feature WHERE feature_id < ?
That will cause a cascading delete, which will not only delete the feature
rows in the database but all data in linked tables as well. That said, I'm
not sure that the increase in database size is that out of the ordinary:
there are several indexes that get built when you load data, so it will
multiply the disk size by a fair bit (though I don't know off hand how
much). Additionally, the number and size of ontologies you have loaded
will have an effect as well.
3. The long running query will look something like this:
select child.feature_id, child.name, child.type_id, child.uniquename,
parent.name as pname, child.is_obsolete, childloc.fmin, childloc.fmax,
childloc.strand, childloc.locgroup,
childloc.phase, COALESCE(af.significance, af.identity, af.normscore,
af.rawscore) as score, childloc.srcfeature_id
from feature as parent
inner join
feature_relationship as fr0 on
(parent.feature_id = fr0.object_id)
inner join
feature as child on
(child.feature_id = fr0.subject_id)
inner join
featureloc as childloc on
(child.feature_id = childloc.feature_id)
left join
analysisfeature as af on
(child.feature_id = af.feature_id)
inner join featureloc as parentloc on (parent.feature_id =
parentloc.feature_id)
where parent.feature_id = $parent_id
and childloc.rank = 0
and fr0.type_id in ($partof)
and childloc.srcfeature_id = $refclass_feature_id
and parentloc.srcfeature_id = $refclass_feature_id
and child.type_id in ($typelist)
where you'd need to sub in these values:
$part_of: a comma separated list of cvterm_ids of part_of and derived_from
terms in the relationship ontology.
$typelist: a comma separated list of cvterm_ids of the sequence ontology
terms to be searched for.
$parent_id: the feature_id of the parent feature (typically of a gene or
match feature).
$refclass_feature_id: the feature_id of the reference feature (ie, the
chromosome).
This query has to run for each feature that might have subparts, so for a
given view in GBrowse, it could run a hundred times or more. This is the
main reason the Chado adaptor doesn't scale well.
Scott
On Tue, May 13, 2014 at 4:07 AM, Samy Jeevan Karloss Antony <
Post by Samy Jeevan Karloss AntonyHi there,
I have installed GBrowse. I am using Chado and PostGresSQL.
It became very slow after I loaded a GFF3 file with the size of ~80 MB.
Which contains Gene, mRNA, CDS, 3` UTR and 5â UTR.
Post by Samy Jeevan Karloss AntonyIts running on 6 CPUs with 12 GB RAM (test server).
I have few questions.
1. Is that CPU and RAM enough for this data size?
2. Feature table shows that its size is ~1.2GB and the whole database
size is 6GB. I feel that some of my OLD test data would not have been
deleted properly. Is there way to check it? or how to delete all the
instances of test data loaded into the database?
Post by Samy Jeevan Karloss Antony3. What is the actual background Postgres query when you search a gene in
the browser? so that I can check whether my database is slow.
------------------------------------------------------------------------------
Post by Samy Jeevan Karloss Antony"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform
available
Post by Samy Jeevan Karloss AntonySimple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
_______________________________________________
Gmod-gbrowse mailing list
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research