Sam Hokin
2015-03-04 04:34:39 UTC
Hi, folks, I just joined the list after spending a lot of time with GBrowse over the past few months. My question: is there any way to
get the GBrowse search to look at the load_id attribute?
Annotated genes that I get from the Ensembl plant database (I'm working with the maize AGPv3 genome, aka B73_RefGen_v3) do not have
a Name attribute in the GFF3 file. Here's a typical line from Zea_mays.AGPv3.24.gff3 (swapping spaces for tabs):
10 ensembl gene 28054052 28054433 . + . ID=gene:GRMZM5G846142;assembly_name=AGPv3;biotype=protein_coding;description=Uncharacterized protein [Source:UniProtKB/TrEMBL%3BAcc:K7TYI5];logic_name=genebuilder;version=1
A lot of wonderful info there in the attributes, but no Name is provided. This gets loaded into my database (using
bp_seqfeature_load.pl) with load_id=gene:GRMZM5G846142 and no Name. If one searches for "GRMZM5G846142" or "gene:GRMZM5G846142" in
the browser, one gets Not Found. (Oddly, Ensembl does provide a Name attribute for exons, but not for genes, transcripts or CDS, at
least in the current version. By the way, the color-coded GFF3 parent-child relationship display works wonderfully with those!)
Yes, I can futz with the GFF file, or run a SQL script that inserts a name record for every feature that has a load_id attribute but
no name, and all sorts of other things, but I thought there might be a way to avoid that with a setting in a conf file. Ensembl is a
pretty big source of genome data, so I'd think this would come up a lot.
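
For reference, the "futz with the GFF file" workaround could look roughly like the Python sketch below. It is only an illustration, not something GBrowse or BioPerl provides: the function names add_name_from_id and convert are made up for this example, and stripping the "gene:" style prefix from the ID to form the Name is an assumption about what one would want to search on. Attribute escaping is left untouched.

```python
def add_name_from_id(line):
    """Append Name=<ID without its 'type:' prefix> to a GFF3 line that has an ID but no Name."""
    # Leave comments, directives and blank lines alone.
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    # A GFF3 feature line has exactly nine tab-separated columns.
    if len(cols) != 9:
        return line
    attrs = [a for a in cols[8].split(";") if "=" in a]
    keys = {a.split("=", 1)[0] for a in attrs}
    if "Name" in keys or "ID" not in keys:
        return line
    feature_id = next(a.split("=", 1)[1] for a in attrs if a.startswith("ID="))
    # Assumption: "gene:GRMZM5G846142" should become Name=GRMZM5G846142.
    name = feature_id.split(":", 1)[-1]
    cols[8] = cols[8].rstrip(";") + ";Name=" + name
    return "\t".join(cols) + newline


def convert(in_stream, out_stream):
    """Copy a GFF3 stream, adding Name attributes where they are missing."""
    for line in in_stream:
        out_stream.write(add_name_from_id(line))
```

Calling convert(sys.stdin, sys.stdout) from a small script would let the Ensembl file be filtered before it is passed to bp_seqfeature_load.pl. Lines that already carry a Name (the exons, in this file) pass through unchanged.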