Bio::SeqIO
genbank
Summary
Bio::SeqIO::GenBank - GenBank sequence input/output stream
Package variables
No package variables defined.
Included modules
Inherit
Synopsis
It is usually best not to use this object directly, but rather to go
through the SeqIO handler system:
$stream = Bio::SeqIO->new(-file => $filename, -format => 'GenBank');
while ( my $seq = $stream->next_seq() ) {
# do something with $seq
}
Description
This object can transform Bio::Seq objects to and from GenBank flat
file databases.
There is a lot of flexibility here about how to dump things, which I still
need to document fully.
This section is supposed to document which sections and properties of
a GenBank databank record end up where in the Bioperl object model. It
is far from complete and presently focuses only on those mappings
which may be non-obvious. $seq in the text refers to the
Bio::Seq::RichSeqI implementing object returned by the parser for each
record.
GI number - $seq->primary_id
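Judging from the next_seq listing further down, the GI number is taken from the "GI:" token of the VERSION line, and the digits after the dot in the accession become the sequence version. The following is a minimal standalone sketch of that split in core Perl. The helper name parse_version_line and the sample accession and GI values are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::genbank): split a GenBank
# VERSION line into the sequence version and the GI number.
sub parse_version_line {
    my ($line) = @_;
    my %slot;
    return \%slot unless $line =~ /^VERSION\s+(.+)$/;
    my ($acc, $gi) = split ' ', $1;
    # "ACC.N": the digits after the dot are the sequence version
    $slot{seq_version} = $1 if defined $acc && $acc =~ /^\w+\.(\d+)/;
    # "GI:N": the number is what the parser stores as primary_id
    $slot{primary_id} = substr($gi, 3)
        if defined $gi && index($gi, 'GI:') == 0;
    return \%slot;
}

# Sample line with invented identifiers
my $slot = parse_version_line('VERSION     AB000001.2  GI:1234567');
print "seq_version=$slot->{seq_version} primary_id=$slot->{primary_id}\n";
```

A VERSION line without a GI token simply leaves primary_id unset in this sketch.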
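_show_dna()
(output only) Controls whether the sequence itself is printed. It is set
to a true value by default; set it to 0 to suppress the sequence.
_post_sort()
(output only) Provides a sorting function which is applied to the
FTHelpers before printing. It is called with the two FTHelper objects to
compare as its arguments.
_id_generation_func()
(output only) This is a function which is called as
$self->_print($func->($seq), "\n");
to generate the first line of the record (the LOCUS line in GenBank
format). If it is not set, a sensible LOCUS line is generated from the
sequence object.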
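The calling conventions for these hooks can be inferred from write_seq in the code listing: the ID function receives the sequence object and returns the complete first line of the record without a trailing newline, and the sort function receives two feature helpers as arguments and returns a comparator value. Below is a hedged sketch of both callbacks in core Perl. The packages My::StubSeq and My::StubFT are hypothetical stand-ins for a sequence object and for Bio::SeqIO::FTHelper, and the LOCUS layout is illustrative rather than the exact GenBank column layout.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object; only id() and length() are used.
package My::StubSeq;
sub new    { my ($class, %arg) = @_; return bless {%arg}, $class }
sub id     { $_[0]{id} }
sub length { $_[0]{length} }

# Hypothetical stand-in for Bio::SeqIO::FTHelper; only key() and loc() are used.
package My::StubFT;
sub new { my ($class, %arg) = @_; return bless {%arg}, $class }
sub key { $_[0]{key} }
sub loc { $_[0]{loc} }

package main;

# ID callback: takes the sequence object, returns the whole first line
# of the record without a trailing newline.
my $id_func = sub {
    my ($seq) = @_;
    return sprintf('%-12s%-16s %d bp', 'LOCUS', $seq->id(), $seq->length());
};

# Sort callback: takes two feature helpers as arguments and returns
# a negative, zero or positive value like any Perl comparator.
my $post_sort = sub {
    my ($x, $y) = @_;
    return ($x->key() cmp $y->key()) || ($x->loc() cmp $y->loc());
};

my $seq = My::StubSeq->new(id => 'TEST01', length => 120);
print $id_func->($seq), "\n";

my @sorted = sort { $post_sort->($a, $b) } (
    My::StubFT->new(key => 'gene', loc => '1..90'),
    My::StubFT->new(key => 'CDS',  loc => '10..80'),
);
print join(',', map { $_->key() } @sorted), "\n";
```

With a real stream, the accessors documented under Methods would be used to install them, for example $stream->_id_generation_func($id_func) and $stream->_post_sort($post_sort).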
If you want to output annotations in GenBank format, they need to be
stored in a Bio::Annotation::Collection object, which is accessible
through the Bio::SeqI interface method annotation().
The following are the names of the keys which are polled from a
Bio::Annotation::Collection object.
reference - Should contain Bio::Annotation::Reference objects
comment - Should contain Bio::Annotation::Comment objects
Methods
Methods description
Title : next_seq
Usage : $seq = $stream->next_seq()
Function: returns the next sequence in the stream
Returns : Bio::Seq::RichSeq object by default, or undef at the end of the stream
Args : |
Title : write_seq
Usage : $stream->write_seq($seq)
Function: writes the $seq object (must be seq) to the stream
Returns : 1 for success and 0 for error
Args : array of 1 to n Bio::SeqI objects |
Title : _print_GenBank_FTHelper
Usage :
Function:
Example :
Returns :
Args : |
Title : _read_GenBank_References
Usage :
Function: Reads references from GenBank format. Internal function really
Example :
Returns :
Args : |
Title : _read_GenBank_Species
Usage :
Function: Reads the GenBank Organism species and classification
lines.
Example :
Returns : A Bio::Species object
Args : a reference to the current line buffer |
Title : _read_FTHelper_GenBank
Usage : _read_FTHelper_GenBank($buffer)
Function: reads the next FT key line
Example :
Returns : Bio::SeqIO::FTHelper object
Args : reference to a scalar holding the current line buffer |
Title : _write_line_GenBank
Usage :
Function: internal function
Example :
Returns :
Args : |
Title : _write_line_GenBank_regex
Usage :
Function: internal function for writing lines of specified
length, with different first and the next line
left hand headers and split at specific points in the
text
Example :
Returns : nothing
Args : file handle, first header, second header, text-line, regex for line breaks, total line length |
Title : _post_sort
Usage : $obj->_post_sort($newval)
Function:
Returns : value of _post_sort
Args : newvalue (optional) |
Title : _show_dna
Usage : $obj->_show_dna($newval)
Function:
Returns : value of _show_dna
Args : newvalue (optional) |
Title : _id_generation_func
Usage : $obj->_id_generation_func($newval)
Function:
Returns : value of _id_generation_func
Args : newvalue (optional) |
Title : _ac_generation_func
Usage : $obj->_ac_generation_func($newval)
Function:
Returns : value of _ac_generation_func
Args : newvalue (optional) |
Title : _sv_generation_func
Usage : $obj->_sv_generation_func($newval)
Function:
Returns : value of _sv_generation_func
Args : newvalue (optional) |
Title : _kw_generation_func
Usage : $obj->_kw_generation_func($newval)
Function:
Returns : value of _kw_generation_func
Args : newvalue (optional) |
Methods code
sub _initialize
{ my($self,@args) = @_;
$self->SUPER::_initialize(@args);
$self->{'_func_ftunit_hash'} = {};
$self->_show_dna(1);
if( ! defined $self->sequence_factory ) {
$self->sequence_factory(Bio::Seq::SeqFactory->new
(-verbose => $self->verbose(),
-type => 'Bio::Seq::RichSeq'));
}
} |
sub next_seq
{ my ($self,@args) = @_;
my $builder = $self->sequence_builder();
my $seq;
my %params;
RECORDSTART: while (1) {
my $buffer;
my (@acc, @features);
my ($display_id, $annotation);
my $species;
@features = ();
$annotation = undef;
@acc = ();
$species = undef;
%params = (-verbose => $self->verbose);
local($/) = "\n";
while(defined($buffer = $self->_readline())) {
last if index($buffer,'LOCUS ') == 0;
}
return undef if( !defined $buffer );
$buffer =~ /^LOCUS\s+(\S.*)$/ ||
$self->throw("GenBank stream with bad LOCUS line. Not GenBank in my book. Got '$buffer'");
my @tokens = split(' ', $1);
$display_id = shift(@tokens);
$params{'-display_id'} = $display_id;
$params{'-length'} = shift(@tokens);
$params{'-alphabet'} = (lc(shift @tokens) eq 'bp') ? 'dna' : 'protein';
if (($params{'-alphabet'} eq 'dna') || (@tokens > 2)) {
$params{'-molecule'} = shift(@tokens);
my $circ = shift(@tokens);
if ($circ eq 'circular') {
$params{'-is_circular'} = 1;
$params{'-division'} = shift(@tokens);
} else {
$params{'-division'} =
(CORE::length($circ) == 3 ) ? $circ : shift(@tokens);
}
} else {
# the alphabet is either 'dna' or 'protein' at this point, never 'aa'
$params{'-molecule'} = 'PRT' if($params{'-alphabet'} eq 'protein');
$params{'-division'} = shift(@tokens);
}
my $date = join(' ', @tokens);
if($date =~ s/.*(\d\d-\w\w\w-\d\d\d\d).*/$1/) {
$params{'-date'} = [$date];
}
$builder->add_slot_value(%params);
%params = ();
if(! $builder->want_object()) {
$builder->make_object();
next RECORDSTART;
}
if($builder->want_slot('annotation')) {
$annotation = new Bio::Annotation::Collection;
}
$buffer = $self->_readline();
until( !defined ($buffer) ) {
$_ = $buffer;
if (/^DEFINITION\s+(\S.*\S)/) {
my @desc = ($1);
while ( defined($_ = $self->_readline) ) {
/^\s+(.*)/ && do { push (@desc, $1); next;};
last;
}
$builder->add_slot_value(-desc => join(' ', @desc));
}
if( /^ACCESSION\s+(\S.*\S)/ ) {
push(@acc, split(' ',$1));
}
elsif( /^PID\s+(\S+)/ ) {
$params{'-pid'} = $1;
}
elsif( /^VERSION\s+(.+)$/ ) {
my ($acc,$gi) = split(' ',$1);
if($acc =~ /^\w+\.(\d+)/) {
$params{'-version'} = $1;
$params{'-seq_version'} = $1;
}
if($gi && (index($gi,"GI:") == 0)) {
$params{'-primary_id'} = substr($gi,3);
}
}
elsif( /^KEYWORDS\s+(.*)/ ) {
my $keywords = $1;
$keywords =~ s/\;//g;
$keywords =~ s/\.$//;
$params{'-keywords'} = $keywords;
}
elsif (/^SOURCE/) {
if($builder->want_slot('species')) {
$species = $self->_read_GenBank_Species(\$buffer);
$builder->add_slot_value(-species => $species);
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
}
elsif (/^REFERENCE/) {
if($annotation) {
my @refs = $self->_read_GenBank_References(\$buffer);
foreach my $ref ( @refs ) {
$annotation->add_Annotation('reference',$ref);
}
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
}
elsif (/^COMMENT\s+(.*)/) {
if($annotation) {
my $comment = $1;
while (defined($_ = $self->_readline)) {
last if (/^\S/);
$comment .= $_;
}
$comment =~ s/\n/ /g;
$comment =~ s/ +/ /g;
$annotation->add_Annotation(
'comment',
Bio::Annotation::Comment->new(-text => $comment));
$buffer = $_;
} else {
while(defined($buffer = $self->_readline())) {
last if substr($buffer,0,1) ne ' ';
}
}
next;
}
# anchor both alternatives at the start of the line
last if( /^(FEATURES|ORIGIN)/ );
$buffer = $self->_readline;
}
return undef if(! defined($buffer));
$builder->add_slot_value(-accession_number => shift(@acc),
-secondary_accessions => \@acc,
%params);
$builder->add_slot_value(-annotation => $annotation) if $annotation;
%params = ();
if(! $builder->want_object()) {
$builder->make_object();
next RECORDSTART;
}
if($builder->want_slot('features') && defined($_) && /^FEATURES/) {
$buffer = $self->_readline;
while( defined($buffer) ) {
last if(($buffer =~ /^BASE/) || ($buffer =~ /^ORIGIN/) ||
($buffer =~ /^CONTIG/) );
my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
if( !defined $ftunit ) {
# %params has been emptied by now, so report the saved display id
$self->warn("Unexpected error in feature table for ".$display_id." Skipping feature, attempting to recover");
unless( ($buffer =~ /^\s{5,5}\S+/) or ($buffer =~ /^\S+/)) {
$buffer = $self->_readline();
}
next;
}
my $feat =
$ftunit->_generic_seqfeature($self->location_factory(),
$display_id);
if($species && ($feat->primary_tag eq 'source') &&
$feat->has_tag('db_xref') && (! $species->ncbi_taxid())) {
foreach my $tagval ($feat->get_tag_values('db_xref')) {
if(index($tagval,"taxon:") == 0) {
$species->ncbi_taxid(substr($tagval,6));
}
}
}
push(@features, $feat);
}
$builder->add_slot_value(-features => \@features);
$_ = $buffer;
}
if( defined ($_) ) {
if( /^CONTIG/ && $builder->want_slot('features')) {
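# a CONTIG line is parsed like a feature, which needs five leading spaces
my $contig_line = (' ' x 5) . $_;
my $ftunit = $self->_read_FTHelper_GenBank(\$contig_line);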
if( ! defined $ftunit ) {
$self->warn("unable to parse the CONTIG feature\n");
} else {
push(@features,
$ftunit->_generic_seqfeature($self->location_factory(),
$display_id));
}
} elsif(! /^ORIGIN/) {
while (defined( $_ = $self->_readline) ) {
last if /^ORIGIN/;
}
}
}
if(! $builder->want_object()) {
$builder->make_object();
next RECORDSTART;
}
if($builder->want_slot('seq')) {
my $seqc = '';
while( defined($_ = $self->_readline) ) {
/^\/\// && last;
$_ = uc($_);
s/[^A-Za-z]//g;
$seqc .= $_;
}
$self->debug("sequence length is ". length($seqc) ."\n");
$builder->add_slot_value(-seq => $seqc);
} else {
while( defined($_ = $self->_readline) ) {
last if substr($_,0,2) eq '//';
}
}
$seq = $builder->make_object();
next RECORDSTART unless $seq;
last RECORDSTART;
}
return $seq;} |
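The LOCUS handling at the top of next_seq can be read as the standalone sketch below: the tokens after the keyword are consumed left to right, and the topology token ("linear" or "circular") is treated as optional. This is a simplified restatement in core Perl, not the module's code. The helper name parse_locus_line and the hash keys are hypothetical, and the sample line follows the layout of a typical GenBank record.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the token handling in next_seq.
sub parse_locus_line {
    my ($line) = @_;
    my ($rest) = $line =~ /^LOCUS\s+(\S.*)$/ or return;
    my @tok = split ' ', $rest;
    my %p = (display_id => shift(@tok), length => shift(@tok));
    # the unit token tells nucleotide ("bp") from protein ("aa") records
    $p{alphabet} = lc(shift @tok) eq 'bp' ? 'dna' : 'protein';
    if ($p{alphabet} eq 'dna' || @tok > 2) {
        $p{molecule} = shift @tok;
        my $topology = shift @tok;
        if ($topology eq 'circular') {
            $p{is_circular} = 1;
            $p{division}    = shift @tok;
        } else {
            # the topology may be omitted: a 3-letter token is the division
            $p{division} = length($topology) == 3 ? $topology : shift @tok;
        }
    } else {
        $p{division} = shift @tok;
    }
    # whatever is left should contain a date such as 21-JUN-1999
    ($p{date}) = "@tok" =~ /(\d\d-\w\w\w-\d\d\d\d)/;
    return \%p;
}

my $rec = parse_locus_line(
    'LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999');
print join(' ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
```

Returning a plain hash keeps the sketch independent of the sequence builder used by the real parser.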
sub write_seq
{ my ($self,@seqs) = @_;
foreach my $seq ( @seqs ) {
$self->throw("Attempting to write with no seq!") unless defined $seq;
if( ! ref $seq || ! $seq->isa('Bio::SeqI') ) {
$self->warn(" $seq is not a SeqI compliant module. Attempting to dump, but may fail!");
}
my $i;
my $str = $seq->seq;
my ($div, $mol);
my $len = $seq->length();
if ( $seq->can('division') ) {
$div=$seq->division;
}
if( !defined $div || ! $div ) { $div = 'UNK'; }
if( !$seq->can('molecule') || ! defined ($mol = $seq->molecule()) ) {
$mol = $seq->alphabet || 'DNA';
}
my $circular = 'linear ';
$circular = 'circular' if $seq->is_circular;
local($^W) = 0;
my $temp_line;
if( $self->_id_generation_func ) {
$temp_line = &{$self->_id_generation_func}($seq);
} else {
my $date = '';
if( $seq->can('get_dates') ) {
($date) = $seq->get_dates();
}
$temp_line = sprintf ("%-12s%-15s%13s %s%4s%-8s%-8s %3s %-s",
'LOCUS', $seq->id(),$len,
(lc($mol) eq 'protein') ? ('aa','', '') :
('bp', '',$mol),$circular,
$div,$date);
}
$self->_print("$temp_line\n");
$self->_write_line_GenBank_regex(sprintf("%-12s","DEFINITION"), ' 'x12,
$seq->desc(),"\\s\+\|\$",80);
if( $self->_ac_generation_func ) {
$temp_line = &{$self->_ac_generation_func}($seq);
$self->_print("ACCESSION $temp_line\n");
} else {
my @acc = ();
push(@acc, $seq->accession_number());
if( $seq->isa('Bio::Seq::RichSeqI') ) {
push(@acc, $seq->get_secondary_accessions());
}
$self->_print("ACCESSION ", join(" ", @acc), "\n");
}
if($seq->isa('Bio::Seq::RichSeqI') && $seq->pid()) {
$self->_print("PID ", $seq->pid(), "\n");
}
if( defined $self->_sv_generation_func() ) {
$temp_line = &{$self->_sv_generation_func}($seq);
if( $temp_line ) {
$self->_print("VERSION $temp_line\n");
}
} else {
if($seq->isa('Bio::Seq::RichSeqI') && defined($seq->seq_version)) {
my $id = $seq->primary_id();
$self->_print("VERSION ",
$seq->accession_number(), ".", $seq->seq_version,
($id && ($id =~ /^\d+$/) ? " GI:".$id : ""),
"\n");
}
}
if( defined $self->_kw_generation_func() ) {
$temp_line = &{$self->_kw_generation_func}($seq);
$self->_print("KEYWORDS $temp_line\n");
} else {
if( $seq->can('keywords') ) {
$self->_print("KEYWORDS ",$seq->keywords,"\n");
}
}
if (my $spec = $seq->species) {
my ($species, $genus, @class) = $spec->classification();
my $OS;
if( $spec->common_name ) {
$OS = $spec->common_name;
} else {
$OS = "$genus $species";
}
if (my $ssp = $spec->sub_species) {
$OS .= " $ssp";
}
$self->_print("SOURCE $OS.\n");
$self->_print(" ORGANISM ",
($spec->organelle() ? $spec->organelle()." " : ""),
"$genus $species", "\n");
my $OC = join('; ', (reverse(@class), $genus)) .'.';
$self->_write_line_GenBank_regex(' 'x12,' 'x12,
$OC,"\\s\+\|\$",80);
}
my $count = 1;
foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
$temp_line = sprintf ("REFERENCE $count (%s %d to %d)",
($seq->alphabet() eq "protein" ?
"residues" : "bases"),
$ref->start,$ref->end);
$self->_print("$temp_line\n");
$self->_write_line_GenBank_regex(sprintf("%-12s","  AUTHORS"),' 'x12,
$ref->authors,"\\s\+\|\$",80);
$self->_write_line_GenBank_regex(sprintf("%-12s","  TITLE")," "x12,
$ref->title,"\\s\+\|\$",80);
$self->_write_line_GenBank_regex(sprintf("%-12s","  JOURNAL")," "x12,
$ref->location,"\\s\+\|\$",80);
if ($ref->comment) {
$self->_write_line_GenBank_regex(sprintf("%-12s","  REMARK")," "x12,
$ref->comment,"\\s\+\|\$",80);
}
if( $ref->medline) {
$self->_write_line_GenBank_regex(sprintf("%-12s","  MEDLINE")," "x12,
$ref->medline, "\\s\+\|\$",80);
if( $ref->pubmed ) {
$self->_write_line_GenBank_regex(sprintf("%-12s","   PUBMED")," "x12,
$ref->pubmed, "\\s\+\|\$",
80);
}
}
$count++;
}
foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {
$self->_write_line_GenBank_regex(sprintf("%-12s","COMMENT")," "x12,
$comment->text,"\\s\+\|\$",80);
}
$self->_print("FEATURES Location/Qualifiers\n");
my $contig;
if( defined $self->_post_sort ) {
my $post_sort_func = $self->_post_sort();
my @fth;
foreach my $sf ( $seq->top_SeqFeatures ) {
push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));
}
@fth = sort { &$post_sort_func($a,$b) } @fth;
foreach my $fth ( @fth ) {
$self->_print_GenBank_FTHelper($fth);
}
} else {
foreach my $sf ( $seq->top_SeqFeatures ) {
my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);
foreach my $fth ( @fth ) {
if( ! $fth->isa('Bio::SeqIO::FTHelper') ) {
$sf->throw("Cannot process FTHelper... $fth");
}
$self->_print_GenBank_FTHelper($fth);
}
}
}
if( $seq->length == 0 ) { $self->_show_dna(0) }
if( $self->_show_dna() == 0 ) {
$self->_print("\n//\n");
# carry on with the next sequence instead of leaving write_seq early
next;
}
$str =~ tr/A-Z/a-z/;
unless( $mol eq 'protein' ) {
my $alen = $str =~ tr/a/a/;
my $clen = $str =~ tr/c/c/;
my $glen = $str =~ tr/g/g/;
my $tlen = $str =~ tr/t/t/;
my $olen = $len - ($alen + $tlen + $clen + $glen);
if( $olen < 0 ) {
$self->warn("Weird. More atgc than bases. Problem!");
}
my $base_count = sprintf("BASE COUNT %8s a %6s c %6s g %6s t%s\n",
$alen,$clen,$glen,$tlen,
( $olen > 0 ) ? sprintf("%6s others",$olen) : '');
$self->_print($base_count);
}
$self->_print(sprintf("ORIGIN%6s\n",''));
my $di;
my @seqline;
for ($i = 0; $i < length($str); $i += 10) {
$di=$i+11;
if ($i==0) {
$self->_print(sprintf("%9d ",1));
}
push @seqline, substr($str,$i,10);
if(($i+10)%60 == 0) {
$self->_print(join(' ', @seqline), "\n");
$self->_print(sprintf("%9d ",$di));
@seqline = ();
}
}
$self->_print(join(' ', @seqline), ' ') if( @seqline );
$self->_print("\n//\n");
}
# flush and return only after every sequence has been written
$self->flush if $self->_flush_on_write && defined $self->_fh;
return 1;
} |
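The sequence block at the end of write_seq follows the usual GenBank layout: groups of ten residues, six groups per line, each line led by the 1-based coordinate of its first residue. Here is a minimal sketch of that layout in core Perl. It is a simplified restatement rather than a copy of the loop above, and the helper name format_origin is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out a sequence string as a GenBank-style
# ORIGIN block (groups of 10, six groups per line, 1-based coordinates).
sub format_origin {
    my ($seq) = @_;
    my $out = "ORIGIN\n";
    for (my $i = 0; $i < length($seq); $i += 60) {
        # cut the current 60-residue slice into groups of up to 10
        my @groups = substr($seq, $i, 60) =~ /(.{1,10})/g;
        $out .= sprintf("%9d %s\n", $i + 1, join(' ', @groups));
    }
    return $out . "//\n";
}

print format_origin(lc('ACGT' x 16 . 'A'));
```

Building the whole block as a string first makes the layout easy to check before it is handed to a print routine.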
sub _print_GenBank_FTHelper
{ my ($self,$fth,$always_quote) = @_;
if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
# $fth may not be an object here, so warn through $self
$self->warn("$fth is not a FTHelper class. Attempting to print, but there could be tears!");
}
if( defined $fth->key &&
$fth->key eq 'CONTIG' ) {
$self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
' 'x12,$fth->loc,"\,\|\$",80);
} else {
$self->_write_line_GenBank_regex(sprintf("%5s%-16s",'',$fth->key),
" "x21,
$fth->loc,"\,\|\$",80);
}
if( !defined $always_quote) { $always_quote = 0; }
foreach my $tag ( keys %{$fth->field} ) {
foreach my $value ( @{$fth->field->{$tag}} ) {
$value =~ s/\"/\"\"/g;
if ($value eq "_no_value") {
$self->_write_line_GenBank_regex(" "x21,
" "x21,
"/$tag","\.\|\$",80);
}
elsif( $always_quote == 1 || $value !~ /^\d+$/ ) {
my ($pat) = ($value =~ /\s/ ? '\s|$' : '.|$');
$self->_write_line_GenBank_regex(" "x21,
" "x21,
"/$tag=\"$value\"",$pat,80);
} else {
$self->_write_line_GenBank_regex(" "x21,
" "x21,
"/$tag=$value","\.\|\$",80);
}
}
}} |
sub _read_GenBank_References
{ my ($self,$buffer) = @_;
my (@refs);
my $ref;
if( $$buffer !~ /^REFERENCE/ ) {
$self->warn("Not parsing line '$$buffer' which may be important");
}
$_ = $$buffer;
my (@title,@loc,@authors,@com,@medline,@pubmed);
while( defined($_) || defined($_ = $self->_readline) ) {
if (/^ {2}AUTHORS\s+(.*)/) {
push (@authors, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push (@authors, $1);next;};
last;
}
$ref->authors(join(' ', @authors));
}
if (/^ {2}TITLE\s+(.*)/) {
push (@title, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push (@title, $1);
next;
};
last;
}
$ref->title(join(' ', @title));
}
if (/^ {2}JOURNAL\s+(.*)/) {
push(@loc, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push(@loc, $1);
next;
};
last;
}
$ref->location(join(' ', @loc));
}
if (/^ {2}REMARK\s+(.*)/) {
push (@com, $1);
while ( defined($_ = $self->_readline) ) {
/^\s{3,}(.*)/ && do { push(@com, $1);
next;
};
last;
}
$ref->comment(join(' ', @com));
}
if( /^ {2}MEDLINE\s+(.*)/ ) {
push(@medline,$1);
while ( defined($_ = $self->_readline) ) {
/^\s{4,}(.*)/ && do { push(@medline, $1);
next;
};
last;
}
$ref->medline(join(' ', @medline));
}
if( /^ {3}PUBMED\s+(.*)/ ) {
push(@pubmed,$1);
while ( defined($_ = $self->_readline) ) {
/^\s{5,}(.*)/ && do { push(@pubmed, $1);
next;
};
last;
}
$ref->pubmed(join(' ', @pubmed));
}
/^REFERENCE/ && do {
$self->_add_ref_to_array(\@refs,$ref) if $ref;
@authors = ();
@title = ();
@loc = ();
@com = ();
@pubmed = ();
@medline = ();
$ref = Bio::Annotation::Reference->new();
if (/^REFERENCE\s+\d+\s+\([a-z]+ (\d+) to (\d+)/){
$ref->start($1);
$ref->end($2);
}
};
# anchor both alternatives at the start of the line
/^(FEATURES|COMMENT)/ && last;
$_ = undef;
}
$self->_add_ref_to_array(\@refs,$ref) if $ref;
$$buffer = $_;
return @refs; } |
sub _add_ref_to_array
{ my ($self, $refs, $ref) = @_;
my $au = $ref->authors();
my $title = $ref->title();
$au =~ s/;\s*$//g if $au;
$title =~ s/;\s*$//g if $title;
$ref->authors($au);
$ref->title($title);
push(@{$refs}, $ref);} |
sub _read_GenBank_Species
{ my( $self,$buffer) = @_;
my @organell_names = ("chloroplast", "mitochondr");
$_ = $$buffer;
my( $sub_species, $species, $genus, $common, $organelle, @class );
while (defined($_) || defined($_ = $self->_readline())) {
s/<[^>]+>//g;
if (/^SOURCE\s+(.*)/) {
$common = $1;
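$common =~ s/\.$//;
} elsif (/^\s+ORGANISM/) {
my @spflds = split(' ', $_);
shift(@spflds);
# an organelle name, if present, precedes the genus; match the field
# against the name stems so that "Mitochondrion" is recognised too
if(grep { $spflds[0] =~ /^$_/i } @organell_names) {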
$organelle = shift(@spflds);
}
$genus = shift(@spflds);
if(@spflds) {
$species = shift(@spflds);
} else {
$species = "sp.";
}
$sub_species = shift(@spflds) if(@spflds);
} elsif (/^\s+(.+)/) {
push(@class, map { s/^\s+//; s/\s+$//; $_; } split /[;\.]+/, $1);
} else {
last;
}
$_ = undef;
}
$$buffer = $_;
return unless $genus and $genus !~ /^(Unknown|None)$/i;
if ($class[$#class] eq $genus) {
push( @class, $species );
} else {
push( @class, $genus, $species );
}
@class = reverse @class;
my $make = Bio::Species->new();
$make->classification(\@class, "FORCE" );
$make->common_name( $common ) if $common;
$make->sub_species( $sub_species ) if $sub_species;
$make->organelle($organelle) if $organelle;
return $make;} |
sub _read_FTHelper_GenBank
{ my ($self,$buffer) = @_;
my ($key, $loc );
my @qual = ();
if ($$buffer =~ /^ {5}(\S+)\s+(.+?)\s*$/o) {
$key = $1;
$loc = $2;
while ( defined($_ = $self->_readline) ) {
if (/^(\s+)(.+?)\s*$/o) {
if (length($1) > 6) {
if (@qual || (index($2,'/') == 0)) {
push(@qual, $2);
}
else {
$loc .= $2;
}
} else {
last;
}
} else {
last;
}
}
} else {
$self->debug("no feature key!\n");
$$buffer = $self->_readline();
return;
}
$$buffer = $_;
my $out = new Bio::SeqIO::FTHelper();
$out->verbose($self->verbose());
$out->key($key);
$out->loc($loc);
QUAL: for (my $i = 0; $i < @qual; $i++) {
$_ = $qual[$i];
my( $qualifier, $value ) = (m{^/([^=]+)(?:=(.+))?})
or $self->warn("cannot see new qualifier in feature $key: ".
$qual[$i]);
$qualifier = '' unless( defined $qualifier);
if (defined $value) {
if (substr($value, 0, 1) eq '"') {
while ($value !~ /\"$/ or $value =~ tr/"/"/ % 2) {
if($i >= $#qual) {
$self->warn("Unbalanced quote in:\n" .
join('', map("$_\n", @qual)) .
"No further qualifiers will " .
"be added for this feature");
last QUAL;
}
$i++;
my $next = $qual[$i];
if(($value.$next) =~ /[^A-Za-z"-]/) {
$value .= " ";
}
$value .= $next;
}
$value =~ s/^"|"$//g;
$value =~ s/""/\"/g;
}
} else {
$value = '_no_value';
}
$out->field->{$qualifier} ||= [];
push(@{$out->field->{$qualifier}},$value);
}
return $out;} |
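The qualifier loop above distinguishes three cases: a bare qualifier such as /pseudo (stored as '_no_value'), an unquoted value such as /codon_start=1, and a quoted value in which a doubled quote stands for a literal quote. The sketch below isolates that logic in core Perl for a qualifier whose continuation lines have already been joined. The helper name parse_qualifier is hypothetical, and the multi-line joining done by the real parser is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: split one complete feature qualifier into
# its name and value, following the conventions used in the listing.
sub parse_qualifier {
    my ($text) = @_;
    my ($name, $value) = $text =~ m{^/([^=]+)(?:=(.+))?}s or return;
    if (!defined $value) {
        $value = '_no_value';            # bare qualifier such as /pseudo
    } elsif ($value =~ s/^"(.*)"$/$1/s) {
        $value =~ s/""/"/g;              # a doubled quote is a literal quote
    }
    return ($name, $value);
}

for my $q ('/gene="abc"', '/codon_start=1', '/pseudo',
           '/note="a ""quoted"" word"') {
    my ($name, $value) = parse_qualifier($q);
    print "$name => $value\n";
}
```

On output, _print_GenBank_FTHelper performs the reverse step by doubling any quote inside a value before printing it.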
sub _write_line_GenBank
{ my ($self,$pre1,$pre2,$line,$length) = @_;
$length || $self->throw("Miscalled write_line_GenBank without length. Programming error!");
my $subl = $length - length $pre2;
my $linel = length $line;
my $i;
my $sub = substr($line,0,$length - length $pre1);
$self->_print("$pre1$sub\n");
for($i= ($length - length $pre1);$i < $linel;) {
$sub = substr($line,$i,($subl));
$self->_print("$pre2$sub\n");
$i += $subl;
}} |
sub _write_line_GenBank_regex
{ my ($self,$pre1,$pre2,$line,$regex,$length) = @_;
$length || $self->throw( "Miscalled _write_line_GenBank_regex without length. Programming error!");
if( length $pre1 != length $pre2 ) {
$self->throw( "Programming error - cannot call _write_line_GenBank_regex with different length pre1 and pre2 tags!");
}
my $subl = $length - (length $pre1) - 2;
my @lines = ();
CHUNK: while($line) {
foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
if($line =~ m/^(.{1,$subl})($pat)(.*)/) {
$line = $3;
my $l = $1.$2;
$l =~ s/\s+$//;
push(@lines, $l);
next CHUNK;
}
}
$self->warn("trouble dissecting\" $line\" into chunks ".
"of $subl chars or less - this tag won't print right");
$line = substr($line,0,$subl) . " " . substr($line,$subl);
}
my $s = shift @lines;
$self->_print("$pre1$s\n");
foreach my $s ( @lines ) {
$self->_print("$pre2$s\n");
} } |
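The idea behind _write_line_GenBank_regex is to break a long value at acceptable split points and to print the first chunk after one header and the remaining chunks after a second header of the same length. Below is a reduced sketch in core Perl that breaks only at whitespace and falls back to a hard split. The helper name wrap_genbank is hypothetical, and the sketch does not reproduce the extra punctuation patterns or the two-column safety margin of the routine above.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap $text at whitespace so that no output line is
# longer than $width, with $pre1 on the first line and $pre2 on the rest.
sub wrap_genbank {
    my ($pre1, $pre2, $text, $width) = @_;
    die "prefixes must have equal length\n"
        unless length($pre1) == length($pre2);
    my $room = $width - length($pre1);
    my @lines;
    while (length $text) {
        if ($text =~ s/^(.{1,$room})(?:\s+|\z)//) {
            (my $chunk = $1) =~ s/\s+$//;
            push @lines, $chunk;
        } else {
            # no whitespace within reach: fall back to a hard split
            push @lines, substr($text, 0, $room, '');
        }
    }
    my $out = $pre1 . (@lines ? shift @lines : '') . "\n";
    $out .= "$pre2$_\n" for @lines;
    return $out;
}

print wrap_genbank(sprintf('%-12s', 'COMMENT'), ' ' x 12,
                   'a comment that is too long for a single short line', 40);
```

The equal-length check mirrors the one in the original routine, which keeps continuation lines aligned under the first line.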
sub _post_sort
{ my ($obj,$value) = @_;
if( defined $value) {
$obj->{'_post_sort'} = $value;
}
return $obj->{'_post_sort'};} |
sub _show_dna
{ my ($obj,$value) = @_;
if( defined $value) {
$obj->{'_show_dna'} = $value;
}
return $obj->{'_show_dna'};} |
sub _id_generation_func
{ my ($obj,$value) = @_;
if( defined $value ) {
$obj->{'_id_generation_func'} = $value;
}
return $obj->{'_id_generation_func'};} |
sub _ac_generation_func
{ my ($obj,$value) = @_;
if( defined $value ) {
$obj->{'_ac_generation_func'} = $value;
}
return $obj->{'_ac_generation_func'};} |
sub _sv_generation_func
{ my ($obj,$value) = @_;
if( defined $value ) {
$obj->{'_sv_generation_func'} = $value;
}
return $obj->{'_sv_generation_func'};} |
sub _kw_generation_func
{ my ($obj,$value) = @_;
if( defined $value ) {
$obj->{'_kw_generation_func'} = $value;
}
return $obj->{'_kw_generation_func'};} |
General documentation
User feedback is an integral part of the evolution of this
and other Bioperl modules. Send your comments and suggestions preferably
to one of the Bioperl mailing lists.
Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://www.bioperl.org/MailList.shtml - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution.
Bug reports can be submitted via email or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _.