Bio::AlignIO stockholm
SummaryIncluded librariesPackage variablesSynopsisDescriptionGeneral documentationMethods
Summary
Bio::AlignIO::stockholm - stockholm sequence input/output stream
Package variables
Privates (from "my" definitions)
$factory(2) = Bio::Annotation::AnnotationFactory->new( -type => "Bio::Annotation::SimpleValue")
$factory(1) = Bio::Annotation::AnnotationFactory->new( -type => "Bio::Annotation::$atype")
$atype = ($tag eq 'reference') ? 'Reference' : ($tag eq 'aln_dblink') ? 'DBLink' : ($tag eq 'build_command') ? 'SimpleValue' : 'BadValue'
$ann = $factory->create_object(%clean_data)
%clean_data = map { $data->{$_} =~ s{\s+$}{}g
$ameta = Bio::Seq::Meta->new()
$factory(3) = Bio::Annotation::AnnotationFactory->new( -type => "Bio::Annotation::$atype")
$coll = Bio::Annotation::Collection->new()
($atype, $aparam, $tagname) = split q(/), $tag
Included modules
Bio::Annotation::AnnotationFactory
Bio::Seq::Meta
Data::Dumper
Text::Wrap qw ( wrap )
Inherit
Bio::AlignIO
Synopsis
  # Do not use this module directly.  Use it via the Bio::AlignIO class.
use Bio::AlignIO; use strict; my $in = Bio::AlignIO->new(-format => 'stockholm', -file => 't/data/testaln.stockholm'); while( my $aln = $in->next_aln ) { }
Description
This object can transform Bio::Align::AlignI objects to and from
stockholm flat file databases. This has been completely refactored
from the original stockholm parser to handle annotation data and now
includes a write_aln() method for (almost) complete stockholm
format output.
Stockholm alignment records normally contain additional sequence-based
and alignment-based annotation
  GF Lines (alignment feature/annotation):
#=GF <featurename> <Generic per-file annotation, free text>
Placed above the alignment

GC Lines (Alignment consensus)
#=GC <featurename> <Generic per-column annotation, exactly 1
character per column>
Placed below the alignment
GS Lines (Sequence annotations) #=GS GR Lines (Sequence meta data) #=GR
Currently, sequence annotations (those designated with GS tags) are
parsed only for accession numbers and descriptions. It is intended that
full parsing will be added at some point in the near future along with
a builder option for optionally parsing alignment annotation and meta data.
The following methods/tags are currently used for storing and writing
the alignment annotation data.
    Tag        SimpleAlign
Method
----------------------------------------------------------------------
AC accession
ID id
DE description
----------------------------------------------------------------------

Tag Bio::Annotation TagName Parameters
Class
----------------------------------------------------------------------
AU SimpleValue record_authors value
SE SimpleValue seed_source value
GA SimpleValue gathering_threshold value
NC SimpleValue noise_cutoff value
TC SimpleValue trusted_cutoff value
TP SimpleValue entry_type value
SQ SimpleValue num_sequences value
PI SimpleValue previous_ids value
DC Comment database_comment comment
CC Comment alignment_comment comment
DR DBLink aln_dblink database
primary_id
comment
AM SimpleValue build_method value
NE SimpleValue pfam_family_accession value
NL SimpleValue sequence_start_stop value
SS SimpleValue sec_structure_source value
BM SimpleValue build_model value
RN Reference reference *
RC Reference reference comment
RM Reference reference pubmed
RT Reference reference title
RA Reference reference authors
RL Reference reference location
----------------------------------------------------------------------
* RN is generated based on the number of Bio::Annotation::Reference objects
Methods
_initialize
No description
Code
next_alnDescriptionCode
write_alnDescriptionCode
Methods description
next_alncode    nextTop
 Title   : next_aln
Usage : $aln = $stream->next_aln()
Function: returns the next alignment in the stream.
Returns : Bio::Align::AlignI object
Args : NONE
write_alncodeprevnextTop
 Title   : write_aln
Usage : $stream->write_aln(@aln)
Function: writes the $aln object into the stream in stockholm format
Returns : 1 for success and 0 for error
Args : Bio::Align::AlignI object
Methods code
_initializedescriptionprevnextTop
sub _initialize {
    my ( $self, @args ) = @_;
    $self->SUPER::_initialize(@args);
    # add arguments to handle build object, interleaved format
}
next_alndescriptionprevnextTop
sub next_aln {
    my $self = shift;
    my $line;

    my ($start, $end, $id, $name, $seqname, $seq, $count, $tag, $data);
    my $seen_rc;
    my ($refct, $bct, $lnkct) = (0,0,0);
    my @c2name;
    my (%align, %accession, %desc, %seq_meta, %aln_meta, %annotation);

    # in stockholm format, every non-blank line that does not start
# with '#=' is an alignment segment; the '#=' lines are mark up lines.
# Of particular interest are the '#=GF <name/st-ed> AC <accession>'
# lines, which give accession numbers for each segment
my $aln = Bio::SimpleAlign->new(-source => 'stockholm'); while( defined($line = $self->_readline) ) { next unless $line =~ /\w+/; if ($line =~ /^#\s*STOCKHOLM\s+/) { last; } else { $self->throw("Not Stockholm format: Expecting\" # STOCKHOLM 1.0\"; Found\" $_\""); } } READLINE: while( defined($line = $self->_readline) ) { #skip empty lines
next if $line =~ /^\s+$/; # Double slash (//) signals end of file.
last if $line =~ m{^//}; # GF/GS lines, by convention, should be at the top of the alignment
if ($line =~ m{^\#=GF\s+(\S+?)\s+([^\n]*)$}xms) { # alignment annotation
($tag, $data) = ($1, $2); if (exists $READMAP{$tag}) { # reference data (multi line)
if (index($tag, 'R') == 0) { # comments come before numbering, tricky
$refct++ if ( ($tag eq 'RN' && !$seen_rc) || $tag eq 'RC'); $seen_rc = 1 if $tag eq 'RC'; # Don't need
next READLINE if $tag eq 'RN'; # # of ref parameter
$annotation{ 'reference' }->[$refct-1]->{ $READMAP{$tag} } .= $data.' '; # Build commands (single line)
} elsif ($tag eq 'BM') { # # build cmd parameter
$annotation{ 'build_command' }->[$bct]->{ $READMAP{$tag} } = $data; $bct++; # DBLinks (single line)
} elsif ($tag eq 'DR') { my ($dbase, $uid, $extra) = split /\s*;\s*/ , $data, 3; my $ref; $ref->{'-database'} = $dbase; $ref->{'-primary_id'} = ($dbase eq 'URL') ? $uid : uc $uid; $ref->{'-comment'} = $extra if $extra; # # dblink parameter list
$annotation{ 'aln_dblink' }->[$lnkct] = $ref; $lnkct++; # Everything else (single and multi line)
} else { # # param/-value/tagname
$annotation{ $READMAP{$tag} } .= $data.' '; } } else { # unknown or custom data treated with simplevalue objects
#$self->debug("Unknown tag: $tag:\t$data\n");
$annotation{ 'custom' }->{ $tag } .= $data.' '; } } elsif( $line =~ m{^\#=GS\s+(\S+)\s+(\w{2})\s+(\S+)}xms ) { # sequence annotation and data
($id, $tag, $data) = ($1, $2, $3); if ($tag eq 'AC') { $accession{$id} .= $data; } elsif ($tag eq 'DE') { $desc{$id} .= $data; } # Bio::Seq::Meta is not AnnotationI, so can't add seq-based
# Annotations yet; uncomment to see what is passed by
#else {
# $self->debug("Missed data: $entry");
#}
} elsif( $line =~ m{^\#=GR\s+(\S+)\s+(\S+)\s+([^\n]+)} ) { # meta strings per sequence
($name, $tag, $data) = ($1, $2, $3); $seq_meta{$name}->{$tag} .= $data; } elsif( $line =~ m{^\#=GC\s+(\S+)\s+([^\n]+)}xms ) { # meta strings per alignment
($tag, $data) = ($1, $2); $aln_meta{$tag} .= $data; } elsif( $line =~ m{^([^\#]\S+)\s+([A-Za-z.\-\*]+)\s*}xms ) { ($name,$seq) = ($1,$2); if( ! exists $align{$name} ) { push @c2name, $name; } $align{$name} .= $seq; } else { # debugging to catch missed data; uncomment to turn on
#$self->debug("Missed Data: $line");
}
}
write_alndescriptionprevnextTop
sub write_aln {
    # enable array of SimpleAlign objects as well (see clustalw write_aln())
my ($self, @aln) = @_; for my $aln (@aln) { $self->throw('Need Bio::Align::AlignI object') if (!$aln || !($aln->isa('Bio::Align::AlignI'))); my @anns; my $coll = $aln->annotation; my ($aln_ann, $seq_ann, $aln_meta, $seq_meta) = ('#=GF ', '#=GS ', '#=GC ', '#=GR' ); $self->_print("# $STKVERSION\n\n") or return 0; # annotations first
for my $param (@WRITEORDER) { # no point in going through this if there is no annotation!
last if !$coll; # alignment annotations
my $ct = 1; $self->throw("Bad parameter: $param") if !exists $WRITEMAP{$param}; # get the data, act on it based on the tag
my ($tag, $key) = split q(/), $WRITEMAP{$param}; if ($key eq 'Method') { push @anns, $aln->$param; } else { @anns = $coll->get_Annotations($param); } my $rn = 1; ANNOTATIONS: while (my $ann = shift @anns) { # using Text::Wrap::wrap() for word wrap
my ($text, $alntag, $data); if ($tag eq 'RX') { REFS: for my $rkey (qw(ref_comment ref_number ref_pubmed ref_title ref_authors ref_location)) { my ($newtag, $method) = split q(/), $WRITEMAP{$rkey}; $alntag = sprintf('%-10s',$aln_ann.$newtag); if ($rkey eq 'ref_number') { $data = "[$rn]"; } else { $data = $ann->$method; } next REFS unless $data; $text = wrap($alntag, $alntag, $data); $self->_print("$text\n") or return 0; } $rn++; next ANNOTATIONS; } elsif ($tag eq 'XX') { # custom
my $newtag = $ann->tagname; $alntag = sprintf('%-10s',$aln_ann.$newtag); $data = $ann; } elsif ($tag eq 'SQ') { # use the actual number, not the stored Annotation data
$alntag = sprintf('%-10s',$aln_ann.$tag); $data = $aln->no_sequences; } else { $alntag = sprintf('%-10s',$aln_ann.$tag); $data = $ann; } $text = wrap($alntag, $alntag, $data); $self->_print("$text\n") or return 0; } } $self->_print("\n"); # now the sequences...
# modified (significantly) from AlignIO::pfam
my ($namestr,$seq,$add); # pad extra for meta lines
my $maxlen = $aln->maxdisplayname_length() + 5; my $metalen = $aln->max_metaname_length() || 0; for $seq ( $aln->each_seq() ) { $namestr = $aln->displayname($seq->get_nse()); $self->_print(sprintf("%-*s %s\n",$maxlen+$metalen, $namestr, $seq->seq())) or return 0; if ($seq->isa('Bio::Seq::MetaI')) { for my $mname ($seq->meta_names) { $self->_print(sprintf("%-*s%*s %s\n",$maxlen, $seq_meta.' '.$namestr, $metalen, $mname, $seq->named_meta($mname))) or return 0; } } } # alignment consensus
my $ameta = $aln->consensus_meta; if ($ameta) { for my $mname ($ameta->meta_names) { $self->_print(sprintf("%-*s%*s %s\n",$maxlen, $aln_meta, $metalen, $mname, $ameta->named_meta($mname))) or return 0; } } $self->_print("//\n") or return 0; } $self->flush() if $self->_flush_on_write && defined $self->_fh; return 1;
}
General documentation
FEEDBACKTop
Reporting BugsTop
Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution. Bug reports can be submitted via the
web:
  http://bugzilla.open-bio.org/
AUTHORS - Chris Fields, Peter SchattnerTop
Email: cjfields-at-uiuc-dot-edu, schattner@alum.mit.edu
CONTRIBUTORSTop
Andreas Kahari, ak-at-ebi.ac.uk
Jason Stajich, jason-at-bioperl.org
APPENDIXTop
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _