Summary
Bio::Tools::Glimmer - parser for Glimmer 2.X/3.X prokaryotic and
GlimmerM/GlimmerHMM eukaryotic gene predictions
Package variables
No package variables defined.
Included modules
Inherit
Bio::Tools::AnalysisResult
Synopsis
use Bio::Tools::Glimmer;
# file
my $parser = Bio::Tools::Glimmer->new(-file => $file);
# filehandle:
$parser = Bio::Tools::Glimmer->new( -fh => \*INPUT );
# provide a sequence identifier (Glimmer 2.X)
$parser = Bio::Tools::Glimmer->new(-file => $file, -seqname => $seqname);
# also provide the sequence length (needed for genes that wrap around
# the origin of a circular sequence)
$parser = Bio::Tools::Glimmer->new(-file => $file, -seqname => $seqname,
                                   -seqlength => $seqlength);
# Glimmer 3.X: read sequence lengths from the .detail file
$parser = Bio::Tools::Glimmer->new(-file => $predict_file, -detail => $detail_file);
# force format (override automatic detection)
$parser = Bio::Tools::Glimmer->new(-file => $file, -format => 'GlimmerM');
# parse the results
# note: this class is-a Bio::Tools::AnalysisResult which implements
# Bio::SeqAnalysisParserI, i.e., $parser->next_feature() is the same
# as $parser->next_prediction()
while (my $gene = $parser->next_prediction()) {
    # For eukaryotic input (GlimmerM/GlimmerHMM), $gene will be an instance
    # of Bio::Tools::Prediction::Gene, which inherits from
    # Bio::SeqFeature::Gene::Transcript, and $gene->exons() will return an
    # array of Bio::Tools::Prediction::Exon objects.
    # For prokaryotic input (Glimmer 2.X/Glimmer 3.X), $gene will be an
    # instance of Bio::SeqFeature::Generic, which has no exons() method.
    next unless $gene->isa('Bio::Tools::Prediction::Gene');
    # all exons (eukaryotic only):
    my @exon_arr = $gene->exons();
    # initial exons only
    my @init_exons = $gene->exons('Initial');
    # internal exons only
    my @intrl_exons = $gene->exons('Internal');
    # terminal exons only
    my @term_exons = $gene->exons('Terminal');
}
Description
This is a module for parsing Glimmer, GlimmerM and GlimmerHMM predictions.
It will create gene objects from the prediction report which can
be attached to a sequence using Bioperl objects, or output as GFF
suitable for loading into Bio::DB::GFF for use with GBrowse.
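To show what such GFF output carries, here is a minimal pure-Perl sketch that formats one predicted feature as a tab-separated, GFF2-style line. The last column uses the 'Group' tag that this module attaches to every gene and exon. The helper gff_line() and its field names are hypothetical. Within BioPerl you would normally let a writer such as Bio::Tools::GFF produce the line, and its exact attribute syntax may differ from this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): one GFF2-style line for a
# predicted feature. The 'Group' column mirrors the tag Bio::Tools::Glimmer
# sets; BioPerl's own GFF writer may format attributes differently.
sub gff_line {
    my (%f) = @_;
    return join("\t",
        $f{seq_id}, $f{source}, $f{type}, $f{start}, $f{end},
        $f{score} // '.',                 # '.' when there is no score
        $f{strand} < 0 ? '-' : '+',
        $f{frame} // '.',                 # '.' when the frame is unknown
        qq{Group "$f{group}"},
    ) . "\n";
}

print gff_line(
    seq_id => 'contig1', source => 'GlimmerHMM', type => 'exon',
    start  => 101,       end    => 250,          strand => 1,
    group  => 'GenePrediction1',
);
```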
Glimmer is open source and available at
http://www.cbcb.umd.edu/software/glimmer/.
GlimmerM is open source and available at
http://www.tigr.org/software/glimmerm/.
GlimmerHMM is open source and available at
http://www.cbcb.umd.edu/software/GlimmerHMM/.
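Note that Glimmer 2.X will only process the first
sequence in a FASTA file, and its prediction report does not contain any
sort of sequence identifier. Supply one with the -seqname argument;
otherwise the features are given the sequence name 'unknown'.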
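Note that Glimmer 3.X produces two output files, .predict and .detail.
This module parses the .predict file. If the .detail file is passed with
the -detail argument, it is read only to obtain sequence lengths, which
are needed to handle genes that wrap around the origin of a circular
sequence.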
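The .detail lookup amounts to remembering the most recent '>id' header and recording the number on the next 'Sequence length = N' line. Below is a minimal pure-Perl sketch of that logic. It works on a string instead of a Bio::Root::IO handle. detail_seqlengths() is a hypothetical name, and the sample text is made up to contain only the two line types the module looks for.

```perl
use strict;
use warnings;

# Hypothetical helper: collect sequence-id => length pairs from the text of
# a Glimmer 3.X .detail file, using the same two patterns as
# _parse_prokaryotic(): a '>id' header, then a 'Sequence length = N' line.
sub detail_seqlengths {
    my ($text) = @_;
    my (%seqlength, $seqname);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {
            $seqname = $1;                # header starts a new sequence
            next;
        }
        if (defined $seqname && $line =~ /^Sequence length = (\d+)$/) {
            $seqlength{$seqname} = $1;
        }
    }
    return \%seqlength;
}

# made-up excerpt containing only the lines the patterns look for
my $detail = <<'END';
>contig1 first test sequence
Sequence length = 4639675
>plasmid2
Sequence length = 93
END

my $lengths = detail_seqlengths($detail);
print "$_ => $lengths->{$_}\n" for sort keys %$lengths;
```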
Methods
Methods description
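Title   : new
Usage   : my $obj = Bio::Tools::Glimmer->new();
Function: Builds a new Bio::Tools::Glimmer object
Returns : an instance of Bio::Tools::Glimmer
Args    : -format    ('Glimmer', 'GlimmerM' or 'GlimmerHMM'; other values are ignored)
          -seqname   (sequence identifier, for Glimmer 2.X)
          -seqlength (sequence length, for Glimmer 2.X wraparound genes)
          -detail    (Glimmer 3.X .detail file; implies -format 'Glimmer')
          plus the -file/-fh arguments accepted by Bio::Tools::AnalysisResult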
Title   : analysis_method
Usage   : $glimmer->analysis_method();
Purpose : Inherited method. Overridden to ensure that the name matches
          /glimmer/i; any other name throws an exception.
Returns : String
Argument: optional method name (must match /glimmer/i)
Title   : next_feature
Usage   : while ($gene = $glimmer->next_feature()) {
              # do something
          }
Function: Returns the next gene structure prediction of the Glimmer result
          file. Call this method repeatedly until FALSE is returned.

          The returned object is actually a SeqFeatureI implementing object.
          This method is required for classes implementing the
          SeqAnalysisParserI interface, and is merely an alias for
          next_prediction() at present.
Example :
Returns : A Bio::Tools::Prediction::Gene object (GlimmerM/GlimmerHMM) or a
          Bio::SeqFeature::Generic object (Glimmer 2.X/3.X).
Args    : none
Title   : next_prediction
Usage   : while ($gene = $glimmer->next_prediction()) {
              # do something
          }
Function: Returns the next gene structure prediction of the Glimmer result
          file. Call this method repeatedly until FALSE is returned.
Example :
Returns : A Bio::Tools::Prediction::Gene object (GlimmerM/GlimmerHMM) or a
          Bio::SeqFeature::Generic object (Glimmer 2.X/3.X).
Args    : none
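Title   : _parse_predictions()
Usage   : $obj->_parse_predictions()
Function: Determines the report format (from -format or -detail, or by
          inspecting the report header) and dispatches to
          _parse_prokaryotic() or _parse_eukaryotic(). Automatically called
          by next_prediction() if not yet done.
Example :
Returns :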
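Format detection looks for only four kinds of line, and the first match wins. The sketch below restates that check as a stand-alone function. detect_glimmer_format() is a hypothetical name, and the sample header lines are invented to fit the patterns rather than copied from real reports. Unlike the module, the sketch does not consume input or push the matching line back.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the header sniffing in
# _parse_predictions(): return the format implied by the first line that
# matches one of the module's four patterns, or nothing if none does.
sub detect_glimmer_format {
    for my $line (@_) {
        return 'GlimmerM'   if $line =~ /^Glimmer\S*\s+\(Version\s*\S+\)/;
        return 'GlimmerHMM' if $line =~ /^Glimmer\S*$/;
        return 'Glimmer'    if $line =~ /^Putative Genes:$/;   # Glimmer 2.X
        return 'Glimmer'    if $line =~ /^>(\S+)/;             # Glimmer 3.X
    }
    return;
}

# made-up header lines, shaped only to fit the patterns above
for my $header ("GlimmerM (Version 3.0)\n", "GlimmerHMM\n",
                "Putative Genes:\n", ">contig1\n") {
    print detect_glimmer_format($header), "\n";
}
```

When nothing matches, the module falls back to the eukaryotic parser.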
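Title   : _parse_eukaryotic()
Usage   : $obj->_parse_eukaryotic()
Function: Parses a GlimmerM/GlimmerHMM prediction report into
          Bio::Tools::Prediction::Gene objects, one per predicted gene, each
          holding its Bio::Tools::Prediction::Exon objects. Called by
          _parse_predictions().
Example :
Returns :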
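Each GlimmerM/GlimmerHMM exon line is matched against a seven-field pattern, and a new gene begins whenever the gene number changes. Below is a minimal pure-Perl sketch of that grouping. It stores plain hashes instead of Bio::Tools::Prediction objects. group_exons() is hypothetical, and the report lines are made up to fit the pattern.

```perl
use strict;
use warnings;

# Exon-line pattern used by _parse_eukaryotic(): gene number, exon number,
# strand, exon type, start, end, length.
my $exon_re = qr/^\s+(\d+)\s+      # gene num
                 (\d+)\s+          # exon num
                 ([\+\-])\s+       # strand
                 (\S+)\s+          # exon type
                 (\d+)\s+(\d+)     # exon start, end
                 \s+(\d+)          # exon length
                /x;

# Hypothetical helper: start a new gene whenever the gene number changes,
# as the module does, and collect that gene's exons under it.
sub group_exons {
    my (@genes, $last);
    for my $line (@_) {
        my ($gene, $exon, $strand, $type, $start, $end, $len) =
            $line =~ $exon_re or next;      # skip headers and blank lines
        push @genes, [] if !defined $last || $last != $gene;
        push @{ $genes[-1] }, {
            type   => lc $type,             # the module calls add_exon($exon, lc($type))
            start  => $start,
            end    => $end,
            strand => $strand eq '-' ? -1 : 1,
        };
        $last = $gene;
    }
    return @genes;
}

# made-up report lines
my @report = (
    "Predicted genes/exons:\n",
    "\n",
    "    1    1  +  Initial      101     250    150\n",
    "    1    2  +  Terminal     400     552    153\n",
    "    2    1  -  Single       900    1100    201\n",
);
my @genes = group_exons(@report);
for my $i (0 .. $#genes) {
    my @parts = map { "$_->{type} $_->{start}..$_->{end}" } @{ $genes[$i] };
    print 'gene ', $i + 1, ': ', join(', ', @parts), "\n";
}
```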
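Title   : _parse_prokaryotic()
Usage   : $obj->_parse_prokaryotic()
Function: Parses a Glimmer 2.X report or a Glimmer 3.X .predict file into
          Bio::SeqFeature::Generic objects, one per predicted gene, handling
          genes that wrap around the end of a circular sequence. Called by
          _parse_predictions().
Example :
Returns :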
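Two patterns distinguish Glimmer 2.X prediction lines from Glimmer 3.X ones. 2.X lines start with whitespace and a gene number, and they put the strand and frame after a '['. 3.X lines start with an identifier such as orf00001 and end with a score. The sketch below extracts the same fields and applies the module's frame conversion, which turns Glimmer frames 1 to 3 into 0 to 2. parse_orf_line() is a hypothetical helper, and both sample lines are invented to match the patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: pull gene number, start, end, strand, 0-based frame
# and score out of one prediction line, using the two patterns from
# _parse_prokaryotic(). Glimmer 2.X lines carry no score.
sub parse_orf_line {
    my ($line) = @_;
    my @f;
    if (   (@f = $line =~ /^\s+(\d+)\s+(\d+)\s+(\d+)\s+\[([\+\-])(\d)\s+/)
        || (@f = $line =~ /^[^\d]+(\d+)\s+(\d+)\s+(\d+)\s+([\+\-])(\d)\s+([\d\.]+)/))
    {
        my ($num, $start, $end, $strand, $frame, $score) = @f;
        return {
            num    => $num,
            start  => $start,
            end    => $end,
            strand => $strand eq '-' ? -1 : 1,
            frame  => $frame - 1,     # Glimmer frames 1..3 become 0..2
            score  => $score,         # undef for Glimmer 2.X
        };
    }
    return;
}

my $orf = parse_orf_line("orf00002     3734     2805  -2     5.47\n");
print join(' ', map { "$_=$orf->{$_}" } sort keys %$orf), "\n";
```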
Title   : _prediction()
Usage   : $gene = $obj->_prediction()
Function: internal
Example :
Returns :
Title   : _add_prediction()
Usage   : $obj->_add_prediction($gene)
Function: internal
Example :
Returns :
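next_prediction(), _add_prediction() and _prediction() together implement a parse-once queue. The first call parses the whole report and pushes every gene onto an array. Each later call shifts one gene off the front until the array is empty. Below is a minimal pure-Perl sketch of the same pattern. It uses a hypothetical LazyPredictions class, and non-blank strings stand in for parsed genes.

```perl
use strict;
use warnings;

package LazyPredictions;    # hypothetical class, not part of BioPerl

sub new {
    my ($class, @report_lines) = @_;
    return bless { lines => [@report_lines], preds => [], parsed => 0 }, $class;
}

# Stands in for _parse_predictions(): one pass fills the whole queue
# (the push plays the role of _add_prediction()).
sub _parse {
    my ($self) = @_;
    push @{ $self->{preds} }, grep { /\S/ } @{ $self->{lines} };
    $self->{parsed} = 1;
}

# Mirrors next_prediction()/_prediction(): parse on the first call only,
# then shift items off the front of the queue; undef once it is empty.
sub next_prediction {
    my ($self) = @_;
    $self->_parse() unless $self->{parsed};
    return shift @{ $self->{preds} };
}

package main;

my $queue = LazyPredictions->new('gene1', '', 'gene2');
while (defined(my $p = $queue->next_prediction())) {
    print "$p\n";
}
```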
Title   : _predictions_parsed
Usage   : $obj->_predictions_parsed
Function: internal
Example :
Returns : TRUE or FALSE
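_predictions_parsed() stores its argument only when that argument is true. The same is true of _seqname(), _seqlength(), _format() and _detail_file(). Calling it with 0 therefore leaves the flag unchanged; the module resets the flag by writing $self->{'_preds_parsed'} directly in _initialize_state(). The sketch below contrasts that style with one that checks whether an argument was passed at all. Both subs are hypothetical illustrations, not BioPerl code.

```perl
use strict;
use warnings;

# flag_if_true() follows the "... = $val if $val" pattern used by the
# accessors here, so a false argument (0, '' or '0') is silently ignored.
sub flag_if_true {
    my ($obj, $val) = @_;
    $obj->{flag} = $val if $val;
    return $obj->{flag};
}

# flag_if_given() stores any argument that was actually passed.
sub flag_if_given {
    my $obj = shift;
    $obj->{flag} = shift if @_;
    return $obj->{flag};
}

my %by_truth = (flag => 1);
my %by_count = (flag => 1);
flag_if_true(\%by_truth, 0);     # no effect: flag stays 1
flag_if_given(\%by_count, 0);    # flag becomes 0
print "if \$val: $by_truth{flag}; if \@_: $by_count{flag}\n";
```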
Title   : _seqname
Usage   : $obj->_seqname($seqname)
Function: internal (for Glimmer 2.X)
Example :
Returns : String
Title   : _seqlength
Usage   : $obj->_seqlength($seqlength)
Function: internal (for Glimmer 2.X)
Example :
Returns : String
Title   : _format
Usage   : $obj->_format($format)
Function: internal
Example :
Returns : String
Title   : _detail_file
Usage   : $obj->_detail_file($filename)
Function: internal (for Glimmer 3.X)
Example :
Returns : String
Methods code
sub _initialize_state
{
    my ($self, @args) = @_;
    $self->SUPER::_initialize_state(@args);

    $self->{'_preds_parsed'} = 0;
    $self->{'_preds'} = [];
}
sub new
{
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);

    my ($format, $seqname, $seqlength, $detail) =
        $self->_rearrange([qw(FORMAT SEQNAME SEQLENGTH DETAIL)], @args);

    # only the three recognised format names are honoured; anything else
    # leaves the format unset, so _parse_predictions() detects it
    if (defined($format) &&
        (($format eq 'Glimmer')  ||
         ($format eq 'GlimmerM') ||
         ($format eq 'GlimmerHMM'))
       ) {
        $self->_format($format);
    }

    # a Glimmer 3.X .detail file implies prokaryotic (Glimmer) output
    if (defined($detail)) {
        $self->_format('Glimmer');
        $self->_detail_file($detail);
    }

    $self->_seqname($seqname)     if defined($seqname);
    $self->_seqlength($seqlength) if defined($seqlength);

    return $self;
}
sub analysis_method
{
    my ($self, $method) = @_;
    if ($method && ($method !~ /glimmer/i)) {
        $self->throw("method $method not supported in " . ref($self));
    }
    return $self->SUPER::analysis_method($method);
}
sub next_feature
{
    my ($self, @args) = @_;

    return $self->next_prediction(@args);
}
sub next_prediction
{
    my ($self) = @_;
    my $gene;

    # parse the whole report on the first call, then hand out one gene per call
    $self->_parse_predictions() unless $self->_predictions_parsed();
    $gene = $self->_prediction();

    return $gene;
}
sub _parse_predictions
{
    my ($self) = @_;

    # parser to use for each format
    my %method = (
        'Glimmer'    => '_parse_prokaryotic',
        'GlimmerM'   => '_parse_eukaryotic',
        'GlimmerHMM' => '_parse_eukaryotic',
        '_DEFAULT_'  => '_parse_eukaryotic',
    );

    my $format = $self->_format();

    # no format given: sniff the report header, pushing the matching line
    # back so that the format-specific parser still sees it
    if (!$format) {
        while (my $line = $self->_readline()) {
            if ( $line =~ /^Glimmer\S*\s+\(Version\s*\S+\)/ ) {
                $format = 'GlimmerM';
                $self->_pushback($line);
                last;
            }
            elsif ( $line =~ /^Glimmer\S*$/ ) {
                $format = 'GlimmerHMM';
                $self->_pushback($line);
                last;
            }
            elsif ($line =~ /^Putative Genes:$/) {
                $format = 'Glimmer';
                $self->_pushback($line);
                last;
            }
            elsif ($line =~ /^>(\S+)/) {
                $format = 'Glimmer';
                $self->_pushback($line);
                last;
            }
        }
    }

    # fall back to the eukaryotic parser if detection failed
    my $method = (defined($format) && exists($method{$format}))
        ? $method{$format} : $method{'_DEFAULT_'};

    return $self->$method();
}
sub _parse_eukaryotic
{
    my ($self) = @_;

    my ($gene, $seqname, $seqlen, $source, $lastgenenum);

    while (defined($_ = $self->_readline())) {
        if ( /^(Glimmer\S*)\s+\(Version\s*(\S+)\)/ ) {
            # GlimmerM header
            $source = "$1_$2";
            next;
        } elsif ( /^(GlimmerHMM\S*)$/ ) {
            # GlimmerHMM header
            $source = $1;
            next;
        } elsif ( /^Sequence name:\s+(.+)$/ ) {
            $seqname = $1;
            next;
        } elsif ( /^Sequence length:\s+(\S+)/ ) {
            $seqlen = $1;
            next;
        } elsif ( m/^(Predicted genes)|(Gene)|\s+\#/ || /^\s+$/ ) {
            # column headers and blank lines
            next;
        } elsif ( /^\s+(\d+)\s+          # gene num
                   (\d+)\s+              # exon num
                   ([\+\-])\s+           # strand
                   (\S+)\s+              # exon type
                   (\d+)\s+(\d+)         # exon start, end
                   \s+(\d+)              # exon length
                  /ox ) {
            my ($genenum, $exonnum, $strand, $type, $start, $end, $len) =
                ( $1, $2, $3, $4, $5, $6, $7 );

            # a new gene number starts a new gene; store the previous one
            if ( ! $lastgenenum || $lastgenenum != $genenum) {
                $self->_add_prediction($gene) if ( $gene );
                $gene = Bio::Tools::Prediction::Gene->new(
                    '-seq_id'      => $seqname,
                    '-primary_tag' => "gene",
                    '-source_tag'  => $source,
                    '-tag'         => { 'Group' => "GenePrediction$genenum" },
                );
            }
            my $exon = Bio::Tools::Prediction::Exon->new(
                '-seq_id'      => $seqname,
                '-start'       => $start,
                '-end'         => $end,
                '-strand'      => $strand eq '-' ? '-1' : '1',
                '-source_tag'  => $source,
                '-primary_tag' => 'exon',
                '-tag'         => { 'Group' => "GenePrediction$genenum" },
            );
            $gene->add_exon($exon, lc($type));
            $lastgenenum = $genenum;
        }
    }

    # store the last gene
    $self->_add_prediction($gene) if ( $gene );
    $self->_predictions_parsed(1);
}
sub _parse_prokaryotic
{
    my ($self) = @_;

    my $source = 'Glimmer';

    # Sequence lengths, supplied by the user (Glimmer 2.X) or read from the
    # .detail file (Glimmer 3.X); needed to locate wraparound genes.
    my %seqlength = ( );
    my $seqname   = $self->_seqname();
    my $seqlength = $self->_seqlength();

    if (defined($seqlength)) {
        $seqlength{$seqname} = $seqlength;
    }

    my $detail_file = $self->_detail_file();

    if (defined($detail_file)) {
        my $io = Bio::Root::IO->new(-file => $detail_file);
        my $seqname;

        while (defined($_ = $io->_readline())) {
            if ($_ =~ /^>(\S+)/) {
                $seqname = $1;
                next;
            }
            if (defined($seqname) && ($_ =~ /^Sequence length = (\d+)$/)) {
                $seqlength{$seqname} = $1;
                next;
            }
        }
        $io->close();
    }

    my $location_factory = Bio::Factory::FTLocationFactory->new();

    while (defined($_ = $self->_readline())) {
        # Glimmer 2.X
        if ($_ =~ /^Putative Genes:$/) {
            $source = 'Glimmer_2.X';
            next;
        }
        # Glimmer 3.X
        elsif ($_ =~ /^>(\S+)/) {
            $seqname   = $1;
            $seqlength = $seqlength{$seqname};
            $source    = 'Glimmer_3.X';
            next;
        }
        elsif (
            # Glimmer 2.X prediction line
            (/^\s+(\d+)\s+               # gene num
              (\d+)\s+(\d+)\s+           # start, end
              \[([\+\-])(\d{1})\s+       # strand, frame
             /ox ) ||
            # Glimmer 3.X prediction line
            (/^[^\d]+(\d+)\s+            # orf (numeric portion)
              (\d+)\s+(\d+)\s+           # start, end
              ([\+\-])(\d{1})\s+         # strand, frame
              ([\d\.]+)                  # score
             /ox)) {

            my ($genenum, $start, $end, $strand, $frame, $score) =
                ( $1, $2, $3, $4, $5, $6 );

            # '+' genes run start < end and '-' genes start > end;
            # anything else wraps around the origin of a circular sequence
            my $circular_prediction = 0;
            if ($strand eq '+') {
                if ($start > $end) {
                    $circular_prediction = 1;
                }
            }
            else {
                if ($start < $end) {
                    $circular_prediction = 1;
                }
            }

            if ($circular_prediction) {
                unless (defined($seqlength)) {
                    $self->throw("need to know the sequence length to handle wraparound genes");
                }
            }

            # Glimmer 2.X end coordinates exclude the stop codon
            if ($source eq 'Glimmer_2.X') {
                if ($strand eq '+') {
                    $end += 3;
                }
                else {
                    $end -= 3;
                }
            }

            # coordinates off either end of the sequence become fuzzy
            # ('<1' or '>length'); the loop aliases $start and $end
            my ($fst, $fend);
            foreach my $coord ($start, $end) {
                if ($coord < 1) {
                    $coord = '<1';
                    $fst++;
                } elsif (defined($seqlength) && ($coord > $seqlength)) {
                    $coord = ">$seqlength";
                    $fend++;
                }
            }

            my $location_string;

            if ($circular_prediction) {
                if ($strand eq '+') {
                    $location_string = "join($start..$seqlength,1..$end)";
                }
                else {
                    $location_string = "join($start..1,$seqlength..$end)";
                }
            }
            else {
                # write the location low..high; the strand is set separately
                if ($strand eq '-' && !$fst && !$fend && $start > $end) {
                    ($start, $end) = ($end, $start);
                }
                $location_string = "$start..$end";
            }

            my $location_object =
                $location_factory->from_string($location_string);

            # Glimmer reports frames 1..3; SeqFeature frames are 0..2
            $frame--;

            my $gene = Bio::SeqFeature::Generic->new(
                '-seq_id'       => $seqname,
                '-location'     => $location_object,
                '-strand'       => $strand eq '-' ? '-1' : '1',
                '-frame'        => $frame,
                '-source_tag'   => $source,
                '-display_name' => "orf$genenum",
                '-primary_tag'  => 'gene',
                '-tag'          => { 'Group' => "GenePrediction_$genenum" },
                '-score'        => $score || undef,
            );

            $self->_add_prediction($gene);
        }
    }

    $self->_predictions_parsed(1);
}
sub _prediction
{
    my ($self) = @_;

    return unless (exists($self->{'_preds'}) && @{$self->{'_preds'}});
    return shift(@{$self->{'_preds'}});
}
sub _add_prediction
{
    my ($self, $gene) = @_;

    if (! exists($self->{'_preds'})) {
        $self->{'_preds'} = [];
    }
    push(@{$self->{'_preds'}}, $gene);
}
sub _predictions_parsed
{
    my ($self, $val) = @_;

    $self->{'_preds_parsed'} = $val if $val;
    if (! exists($self->{'_preds_parsed'})) {
        $self->{'_preds_parsed'} = 0;
    }
    return $self->{'_preds_parsed'};
}
sub _seqname
{
    my ($self, $val) = @_;

    $self->{'_seqname'} = $val if $val;
    if (! exists($self->{'_seqname'})) {
        $self->{'_seqname'} = 'unknown';
    }
    return $self->{'_seqname'};
}
sub _seqlength
{
    my ($self, $val) = @_;

    $self->{'_seqlength'} = $val if $val;
    return $self->{'_seqlength'};
}
sub _format
{
    my ($self, $val) = @_;

    $self->{'_format'} = $val if $val;
    return $self->{'_format'};
}
sub _detail_file
{
    my ($self, $val) = @_;

    $self->{'_detail_file'} = $val if $val;
    return $self->{'_detail_file'};
}

1;
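The least obvious part of _parse_prokaryotic() is how it turns Glimmer coordinates into a location string. The sketch below restates that logic in plain Perl and returns the string instead of a Bio::Location object. glimmer_location() and its glimmer2 flag are hypothetical, the example coordinates are made up, and the sketch has not been checked against FTLocationFactory itself.

```perl
use strict;
use warnings;

# Hypothetical restatement of the coordinate handling in
# _parse_prokaryotic(); returns the location string the module passes to
# Bio::Factory::FTLocationFactory->from_string().
sub glimmer_location {
    my (%arg) = @_;
    my ($start, $end, $strand, $len) = @arg{qw(start end strand seqlength)};

    # '+' genes run start < end and '-' genes start > end; anything else
    # wraps around the origin of a circular sequence
    my $circular = $strand eq '+' ? $start > $end : $start < $end;
    die "need to know the sequence length to handle wraparound genes\n"
        if $circular && !defined $len;

    # Glimmer 2.X end coordinates exclude the stop codon
    $end += $strand eq '+' ? 3 : -3 if $arg{glimmer2};

    # coordinates off either end of the sequence become fuzzy
    my $fuzzy = 0;
    for my $coord ($start, $end) {        # aliases $start and $end
        if ($coord < 1) {
            $coord = '<1';
            $fuzzy++;
        }
        elsif (defined $len && $coord > $len) {
            $coord = ">$len";
            $fuzzy++;
        }
    }

    if ($circular) {
        return $strand eq '+' ? "join($start..$len,1..$end)"
                              : "join($start..1,$len..$end)";
    }
    ($start, $end) = ($end, $start) if $strand eq '-' && !$fuzzy && $start > $end;
    return "$start..$end";
}

print glimmer_location(start => 4639000, end => 200, strand => '+',
                       seqlength => 4639675), "\n";
```

As in the module, the strand is not encoded in the string; it is passed to the feature separately.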
General documentation
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Please direct usage questions or support issues to the mailing list:
bioperl-l@bioperl.org
rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:
https://redmine.open-bio.org/projects/bioperl/
AUTHOR - Jason Stajich
Email jason-at-bioperl-dot-org
CONTRIBUTORS
Torsten Seemann
Mark Johnson
The rest of the documentation details each of the object methods.
Internal methods are usually preceded with an underscore (_).