Bio::Tools Geneid
SummaryIncluded librariesPackage variablesSynopsisDescriptionGeneral documentationMethods
Bio::Tools::Geneid - Results of one geneid run
Package variables
No package variables defined.
Included modules
Bio::Root::IO Bio::Root::Root
  use Bio::Tools::Geneid;
my $gid = Bio::Tools::Geneid(-file => "geneid.out");
while (my $gene = $gid->next_prediction) { my @transcripts = $gene->transcripts; foreach my $t (@transcripts) { my @exons = $t->exons; foreach my $e (@exons) { printf("Exon %d..%d\n", $e->start, $e->end); } } }
This is the parser for the output of geneid by Enrique Blanco and
Roderic Guigó (IMIM-UPF). See It
relies on native geneid output format internally and will work with
geneid versions 1.0 and 1.1. Currently this module supports only the
default mode of operation which is to predict exons and assemble an
optimal gene prediction.
It takes either a file handle or a file name and returns a
Bio::SeqFeature::Gene::GeneStructure object.
Methods description
newcode    nextTop
 Title   : new
Usage : $obj->new(-file = "<geneid.out");
$obj->new(-fh => \*GI);
Function: Constructor for geneid wrapper. Takes either a file
: or filehandle
Returns : Bio::Tools::Geneid
 Title   : next_prediction
Usage : while($gene = $geneid->next_prediction)
# do something
Function: Returns the gene structure prediction of the geneid result
file. Call this method repeatedly until FALSE is returned.
Returns : A Bio::SeqFeature::Gene::GeneStructure object
Args : None
 Title   : _add_exon
Usage : $obj->_add_exon($gene, $transcript, ... exon data ...)
Function: Adds a new exon to both gene and transcript from the data
: supplied as args
Example :
Returns : Nothing
 Title   : _set_strand
Usage : $obj->_set_strand($gene)
Function: Sets the overall gene strand to the same strand as all
: the exons if they are all on the same strand, or to strand 0
: if the exons are on different strands.
Example :
Returns : Nothing
 Title   : _target_id
Usage : $obj->_target_id
Function: get/set for genomic sequence id
Example :
Returns : A target ID
Methods code
sub new {
    my($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    return $self;
sub next_prediction {
    my ($self) = @_;

    my ($gene, $transcript, $current_gene_id);
    my $transcript_score = 0;

    my ($gene_id, $exon_type, $exon_start, $exon_end, $exon_score,
        $exon_strand, $start_phase, $end_phase, $start_sig_score,
        $end_sig_score, $coding_pot_score, $homol_score);

    while (defined($_ = $self->_readline))


        # We have a choice of geneid, gff or XML formats. The native
# geneid format has more information than gff. However, we
# then need to perform the hack of extracting the sequence ID
# from the header of the embedded Fasta file which comes after
# the exon data, as it is not stored elsewhere. Ack.
# the new version of geneID includes the sequence ID in a slightly
# different format and a new "or" statement was added below
# also removed "unless defined $self->_target_id;" inorder to continue
# generating new sequence IDs.
if (/^>(\S+)\|GeneId/ or /^# Sequence (\S+)/) { my $target_id = $1; $self->_target_id($target_id); next; } next unless (/(Single|First|Internal|Terminal)/); my @fields = split(/\s+/, $_); # Grab gene_id from eol first as there are issues with
# inconsistent whitespace in the AA coords field
$gene_id = pop @fields; ($exon_type, $exon_start, $exon_end, $exon_score, $exon_strand, $start_phase, $end_phase, $start_sig_score, $end_sig_score, $coding_pot_score, $homol_score) = @fields[0..10]; if (! defined $current_gene_id) { # Starting the requested prediction
$current_gene_id = $gene_id; $transcript_score = $exon_score; $gene = Bio::SeqFeature::Gene::GeneStructure->new(-source => $SOURCE_TAG); $transcript = Bio::SeqFeature::Gene::Transcript->new(-source => $SOURCE_TAG); $self->_add_exon($gene, $transcript, $exon_type, $exon_start, $exon_end, $exon_score, $exon_strand, $start_phase, $end_phase, $start_sig_score, $end_sig_score, $coding_pot_score, $homol_score); } elsif ($gene_id eq $current_gene_id) { # Still in requested prediction
$transcript_score += $exon_score; $self->_add_exon($gene, $transcript, $exon_type, $exon_start, $exon_end, $exon_score, $exon_strand, $start_phase, $end_phase, $start_sig_score, $end_sig_score, $coding_pot_score, $homol_score); } else { # Found following prediction
$self->_pushback($_); last; } } if (defined $gene) { $transcript->seq_id($self->_target_id); $transcript->score($transcript_score); $gene->add_transcript($transcript); $gene->seq_id($self->_target_id); foreach my $exon ($gene->exons) { $exon->seq_id($self->_target_id); } $self->_set_strand($gene); } return $gene;
sub _add_exon {
    my ($self, $gene, $transcript, $exon_type, $exon_start, $exon_end,
        $exon_score, $exon_strand, $start_phase, $end_phase, $start_sig_score,
        $end_sig_score, $coding_pot_score, $homol_score) = @_;

    $exon_type =~ s/First/Initial/;

    my $strand = $exon_strand eq '+' ? 1 : -1;

    my $exon = Bio::SeqFeature::Gene::Exon->new(-source => $SOURCE_TAG,
                                                -start  => $exon_start,
                                                -end    => $exon_end,
                                                -strand => $strand,
                                                -score  => $exon_score);
    $exon->add_tag_value("Type", $exon_type);
    $exon->add_tag_value('phase', $start_phase);
    $exon->add_tag_value('end_phase', $end_phase);
    $exon->add_tag_value('start_signal_score', $start_sig_score);
    $exon->add_tag_value('end_signal_score', $end_sig_score);
    $exon->add_tag_value('coding_potential_score', $coding_pot_score);
    $exon->add_tag_value('homology_score', $homol_score);

    $transcript->strand($strand) unless $transcript->strand != 0;
    $transcript->add_exon($exon, $exon_type);
sub _set_strand {
    my ($self, $gene) = @_;

    my $fwd = 0;
    my $rev = 0;

    my @exons = $gene->exons;
    foreach my $exon (@exons)
        my $strand = $exon->strand;

        if ($strand == 1)
        elsif ($strand == -1)

    if ($#exons == $fwd)
    elsif ($#exons == $rev)

    return $gene;
sub _target_id {
    my ($self,$val) = @_;
    if ($val)
        $self->{'_target_id'} = $val;

    return $self->{'_target_id'};

General documentation
Mailing ListsTop
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.                  - General discussion - About the mailing lists
Support Top
Please direct usage questions or support issues to the mailing list:
rather than to the module maintainer directly. Many experienced and
reponsive experts will be able look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.
Reporting BugsTop
Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution. Bug reports can be submitted via the
AUTHOR - Keith JamesTop
The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _