Bio::LiveSeq::IO SRS
SummaryIncluded librariesPackage variablesSynopsisDescriptionGeneral documentationMethods
Summary
Bio::LiveSeq::IO::SRS - Loader for LiveSeq from EMBL entries with SRS
Package variables
Globals (from "use vars" definitions)
$VERSION = 2.4
Included modules
Bio::LiveSeq::IO::Loader 2 .2
Bio::Tools::CodonTable
Carp qw ( cluck croak carp )
lib $ENV { SRSEXE }
srsperl
Inherit
Bio::LiveSeq::IO::Loader
Synopsis
  my $db="EMBL";
  my $acc_id="M20132";
  my $query="embl-acc:$acc_id";

  my $loader=Bio::LiveSeq::IO::SRS->load(-db=>"EMBL", -query=>"$query");

  my @translationobjects=$loader->entry2liveseq();

  my $gene="AR";
  my $gene=$loader->gene2liveseq("gene");

  NOTE: The only -db now supported is EMBL. Hence it defaults to EMBL.
Description
This package uses the SRS (Sequence Retrieval System) to fetch a sequence
database entry, analyse it and create LiveSeq objects out of it.
An embl-acc ID has to be passed to this package which will return references
to all translation objects created from the EMBL entry.
References to Transcription, DNA and Exon objects can all be retrieved departing
from these.
Alternatively, a specific "gene" name can be specified, together with the
embl-acc ID. This will create a LiveSeq::Gene object with all relevant gene
features attached/created.
Methods
loadDescriptionCode
embl2hashDescriptionCode
location2range
No description
Code
cdslocation2transcript
No description
Code
joinedlocation2range
No description
Code
get_swisshashDescriptionCode
swissprot2hashDescriptionCode
novelaasequence2geneDescriptionCode
Methods description
loadcode    nextTop
  Title   : load
  Usage   : my $acc_id="M20132";
            my $query="embl-acc:$acc_id";
            $loader=Bio::LiveSeq::IO::SRS->load(-db=>"EMBL", -query=>"$query");

  Function: loads an entry with SRS from a database into a hash
  Returns : reference to a new object of class IO::SRS holding an entry
  Errorcode 0
  Args    : an SRS query resulting in one fetched EMBL (by default) entry
embl2hashcodeprevnextTop
  Title   : embl2hash
  Function: retrieves with SRS an EMBL entry, parses it and creates
            a hash that contains all the information.
  Returns : a reference to a hash
  Errorcode: 0
  Args    : an SRS query resulting in one fetched EMBL entry
              i.e. a string in a form like "embl-acc:accession_number"
	    two array references to skip features and qualifiers (for
	    performance)
  Example: @valid_features=qw(CDS exon prim_transcript mRNA);
           @valid_qualifiers=qw(gene codon_start db_xref product rpt_family);
           $hashref=&embl2hash("$query",\@valid_features,\@valid_qualifiers);
get_swisshashcodeprevnextTop
  Title   : get_swisshash
  Usage   : $loader->get_swisshash();
  Example : $swisshash=$loader->swissprot2hash("SWISS-PROT:P10275")
  Function: retrieves with SRS a SwissProt entry, parses it and creates
            a hash that contains all the information.
  Returns : a reference to a hash
  Errorcode: 0
  Args    : the db_xref qualifier's value from an EMBL CDS Feature
            i.e. a string in the form "SWISS-PROT:accession_number"
  Note    : this can be modified (adding a second argument) to retrieve
            and parse SWTREMBL, SWALL... entries
swissprot2hashcodeprevnextTop
  Title   : swissprot2hash
  Usage   : $loader->swissprot2hash();
  Example : $swisshash=$loader->swissprot2hash("swissprot-acc:P10275")
  Function: retrieves with SRS a SwissProt entry, parses it and creates
            a hash that contains all the information.
  Returns : a reference to a hash
  Errorcode: 0
  Args    : an SRS query resulting in one entry from SwissProt database
            i.e. a string in the form "swissprot-acc:accession_number"
  Note    : this can be modified (adding a second argument) to retrieve
            and parse SWTREMBL, SWALL... entries
novelaasequence2genecodeprevnextTop
  Title   : novelaasequence2gene
  Usage   : $gene=Bio::LiveSeq::IO::SRS->novelaasequence2gene(-aasequence => "MGLAAPTRS*");
          : $gene=Bio::LiveSeq::IO::SRS->novelaasequence2gene(-aasequence => "MGLAAPTRS*",
                                             -genome => "Homo sapiens");
          : $gene=Bio::LiveSeq::IO::SRS->novelaasequence2gene(-aasequence => "MGLAAPTRS*",
                                             -genome => "Mitochondrion Homo sapiens",
                                             -gene_name => "tyr-kinase");

  Function: creates LiveSeq objects from a novel amino acid sequence,
            using codon usage database to choose codons according to
            relative frequencies.
            If a genome latin name is not specified, the default is to use
            'Homo sapiens' (taxonomy ID 9606).
  Returns : reference to a Gene object containing references to LiveSeq objects
  Errorcode 0
  Args    : string containing an amino acid sequence
            string (optional) with a species/genome latin name
            string specifying a gene name
  Note    : SRS access to TAXON and CODONUSAGE databases is required
Methods code
loaddescriptionprevnextTop
sub load {
  my ($thing, %args) = @_;
  my $class = ref($thing) || $thing;
  my ($obj,%loader);

  my ($db,$query)=($args{-db},$args{-query});

  if (defined($db)) {
    unless ($db eq "EMBL") {
      carp "Note: only EMBL now supported!";
      return(0);
    }
  } else {
    $db="EMBL";
  }

  my $hashref;
  if ($db eq "EMBL") {
    my $test_transl=0; # change to 0 to avoid comparison of translation
# these can be changed for future needs
my @embl_valid_feature_names=qw(CDS exon prim_transcript intron repeat_unit repeat_region mRNA); my @embl_valid_qual_names=qw(gene codon_start db_xref product note number rpt_family transl_table); # dunno yet how to implement test_transl again....
# probably on a one-on-one basis with each translation?
if ($test_transl) { push (@embl_valid_qual_names,"translation"); # needed for test_transl
} # not to have the whole program die because of SRS death
eval { $hashref=&embl2hash("$query",\@embl_valid_feature_names,\@embl_valid_qual_names); }; my $errormsg; if ($@) { foreach $errormsg (split(/\n/,$@)) { if (index($errormsg,"in cleanup") == -1) { carp "SRS EMBL Loader: $@"; } } } } unless ($hashref) { return (0); } %loader = (db => $db, query => $query, hash => $hashref); $obj =\% loader; $obj = bless $obj, $class; return $obj;
}
embl2hashdescriptionprevnextTop
sub embl2hash {
  my ($i,$j);
  my $query=$_[0];
  my %valid_features; my %valid_names;
  if ($_[1]) {
    %valid_features = map {$_, 1} @{$_[1]}; # to skip features
} if ($_[2]) { %valid_names = map {$_, 1} @{$_[2]}; # to skip qualifiers
} # SRS API used to fetch all relevant fields
my $sess = new Session; my $set = $sess->query("[$query]", ""); my $numEntries=$set->size(); if ($numEntries < 1) { carp "No sequence entry found for the query $query"; return (0); } elsif ($numEntries > 1) { carp "Too many entries found for the input given"; return (0); } else { my $entry = $set->getEntry(0); my ($entry_ID,$entry_AccNumber,$entry_Molecule,$entry_Description,$entry_Organism,$entry_SeqLength,$entry_Sequence,$entry_Division); # Fetch what we can fetch without the loader
$entry_ID = $entry->fieldToString("id","","",""); $entry_AccNumber = $entry->fieldToString("acc","","",""); $entry_Molecule = $entry->fieldToString("mol","","",""); $entry_Division = $entry->fieldToString("div","","",""); $entry_Description = $entry->fieldToString("des","","",""); $entry_Description =~ s/\n/ /g; $entry_Organism = $entry->fieldToString("org","","",""); $entry_SeqLength = $entry->fieldToString("sl","","",""); # Now use the loader
my $loadedentry = $entry->load("EMBL"); # Fetch the rest via the loader
$entry_Sequence = $loadedentry->attrStr("sequence"); $entry_Sequence =~ s/\n//g; # from plain format to raw string
# put into the hash
my %entryhash; $entryhash{ID}=$entry_ID; $entryhash{AccNumber}=$entry_AccNumber; $entryhash{Molecule}=$entry_Molecule; $entryhash{Division}=$entry_Division; $entryhash{Description}=$entry_Description; $entryhash{Organism}=$entry_Organism; $entryhash{Sequence}=$entry_Sequence; $entryhash{SeqLength}=$entry_SeqLength; # create features array
my $features = $loadedentry->attrObjList("features"); my $featuresnumber= $features->size(); $entryhash{FeaturesNumber}=$featuresnumber; my ($feature,$feature_name,$feature_location); my ($feature_qual_names,$feature_qual_values); my ($feature_qual_name,$feature_qual_value); my ($feature_qual_number,$feature_qual_number2); my @features; for $i (0..$featuresnumber-1) { my %feature; $feature = $features->get($i); $feature_name = $feature->attrStr("FtKey"); #unless ($feature_name eq "CDS") {
unless ($valid_features{$feature_name}) { #print "not valid feature: $feature_name\n";
next; } #print "now processing feature: $feature_name\n";
$feature_location = $feature->attrStr("FtLocation"); $feature_location =~ s/\n//g; $feature_qual_names= $feature->attrStrList("FtQualNames"); $feature_qual_values= $feature->attrStrList("FtQualValues"); $feature_qual_number= $feature_qual_names->size(); $feature_qual_number2= $feature_qual_values->size(); # paranoia check
if ($feature_qual_number > $feature_qual_number2) { carp ("Warning with Feature at position $i ($feature_name): There are MORE qualifier names than qualifier values."); # this can happen, e.g. "/partial", let's move on, just issue a warning
#return (0);
} elsif ($feature_qual_number < $feature_qual_number2) { carp ("Error with Feature at position $i ($feature_name): There are LESS qualifier names than qualifier values. Stopped"); return (0); } #} else {print "NUMBER OF QUALIFIERS: $feature_qual_number \n";} # DEBUG
# Put things inside hash
$feature{position}=$i; $feature{name}=$feature_name; # new range field in hash
my @range; if ($feature_name eq "CDS") { @range=cdslocation2transcript($feature_location); $feature{locationtype}="joined"; } else { if (index($feature_location,"join") == -1) { @range=location2range($feature_location); } else { # complex location in feature different than CDS
@range=joinedlocation2range($feature_location); $feature{locationtype}="joined"; } } $feature{range}=\@range; $feature{location}="deprecated"; my %feature_qualifiers; for $j (0..$feature_qual_number-1) { $feature_qual_name=$feature_qual_names->get($j); $feature_qual_name =~ s/^\///; # eliminates heading "/"
# skip all not interesting (for now) qualifiers
unless ($valid_names{$feature_qual_name}) { #print "not valid name: $feature_qual_name\n";
next; } #print "now processing: $feature_qual_name\n";
$feature_qual_value=$feature_qual_values->get($j); $feature_qual_value =~ s/^"//; # eliminates heading "
$feature_qual_value =~ s/"$//; # eliminates trailing "
$feature_qual_value =~ s/\n//g; # eliminates all new lines
$feature_qualifiers{$feature_qual_name}=$feature_qual_value; } $feature{qualifiers}=\%feature_qualifiers; push (@features,\%feature); # array of features
} $entryhash{Features}=\@features; # put this also into the hash
my @cds; # array just of CDSs
for $i (0..$#features) { if ($features[$i]->{'name'} eq "CDS") { push(@cds,$features[$i]); } } $entryhash{CDS}=\@cds; # put this also into the hash
return (\%entryhash); }
}
location2rangedescriptionprevnextTop
sub location2range {
  my ($location)=@_;
  my ($start,$end,$strand);
  if (index($location,"complement") == -1) { # forward strand
$strand=1; } else { # reverse strand
$location =~ s/complement\(//g; $location =~ s/\)//g; $strand=-1; } $location =~ s/\<//g; $location =~ s/\>//g; my @range=split(/\.\./,$location); if (scalar(@range) == 1) { # special case of range with just one position (e.g. polyA_site EMBL features
$start=$end=$range[0]; } else { if ($strand == 1) { ($start,$end)=@range; } else { # reverse strand
($end,$start)=@range; } } return ($start,$end,$strand);
}
cdslocation2transcriptdescriptionprevnextTop
sub cdslocation2transcript {
  my ($location)=@_;
  my @exonlocs;
  my $exonloc;
  my @exon;
  my @transcript=();
  $location =~ s/^join\(//;
  $location =~ s/\)$//;
  @exonlocs = split (/\,/,$location);
  for $exonloc (@exonlocs) {
    my @exon=location2range($exonloc);
    push (@transcript,\@exon);
  }
  return (@transcript);
}
joinedlocation2rangedescriptionprevnextTop
sub joinedlocation2range {
  &cdslocation2transcript;
}
get_swisshashdescriptionprevnextTop
sub get_swisshash {
  my ($self,$query)=@_;
  if (index($query,"SWISS-PROT") == -1) {
    return (0);
  }
  $query =~ s/SWISS-PROT/swissprot-acc/;
  my $hashref;
  eval {
    $hashref=&swissprot2hash("$query");
  };
  my $errormsg;
  if ($@) {
    foreach $errormsg (split(/\n/,$@)) {
      if (index($errormsg,"in cleanup") == -1) {
	carp "SRS Swissprot Loader: $@";
      }
    }
  }
  unless ($hashref) {
    return (0);
  } else {
    return ($hashref);
  }
}
swissprot2hashdescriptionprevnextTop
sub swissprot2hash {
  my ($i,$j);
  my $query=$_[0];
  # SRS API used to fetch all relevant fields
my $sess = new Session; my $set = $sess->query("[$query]", ""); my $numEntries = $set->size(); if ($numEntries < 1) { carp "No sequence entry found for the query $query"; return (0); } elsif ($numEntries > 1) { carp "Too many entries found for the input given"; return (0); } else { my $entry = $set->getEntry(0); my ($entry_ID,$entry_AccNumber,$entry_Molecule,$entry_Description,$entry_Organism,$entry_SeqLength,$entry_Sequence,$entry_Gene); # Fetch what we can fetch without the loader
$entry_ID = $entry->fieldToString("id","","",""); $entry_AccNumber = $entry->fieldToString("acc","","",""); $entry_Gene = $entry->fieldToString("gen","","",""); $entry_Gene =~ s/\n/ /g; $entry_Description = $entry->fieldToString("des","","",""); $entry_Description =~ s/\n/ /g; $entry_Organism = $entry->fieldToString("org","","",""); chop $entry_Organism; $entry_SeqLength = $entry->fieldToString("sl","","",""); # Now use the loader
my $loadedentry = $entry->load("Swissprot"); # Fetch the rest via the loader
$entry_Sequence = $loadedentry->attrStr("Sequence"); $entry_Sequence =~ s/\n//g; # from plain format to raw string
# put into the hash
my %entryhash; $entryhash{ID}=$entry_ID; $entryhash{AccNumber}=$entry_AccNumber; $entryhash{Molecule}=$entry_Molecule; $entryhash{Gene}=$entry_Gene; $entryhash{Description}=$entry_Description; $entryhash{Organism}=$entry_Organism; $entryhash{Sequence}=$entry_Sequence; $entryhash{SeqLength}=$entry_SeqLength; # create features array
my $features = $loadedentry->attrObjList("Features"); my $featuresnumber= $features->size(); $entryhash{FeaturesNumber}=$featuresnumber; my ($feature,$feature_name,$feature_description,$feature_location); my @features; for $i (0..$featuresnumber-1) { my %feature; $feature = $features->get($i); $feature_name = $feature->attrStr("FtKey"); $feature_location = $feature->attrStr("FtLocation"); $feature_location =~ s/ +/ /g; $feature_description = $feature->attrStr("FtDescription"); chop $feature_description; $feature_description =~ s/\nFT / /g; # Put things inside hash
$feature{position}=$i; $feature{name}=$feature_name; $feature{location}=$feature_location; $feature{description}=$feature_description; push (@features,\%feature); # array of features
} $entryhash{Features}=\@features; # put this also into the hash
return (\%entryhash); }
}
novelaasequence2genedescriptionprevnextTop
sub novelaasequence2gene {
  my ($self, %args) = @_;
  my ($gene_name,$species_name,$aasequence)=($args{-gene_name},$args{-genome},$args{-aasequence});
  unless ($aasequence) {
    carp "aasequence not given";
    return (0);
  }
  unless ($gene_name) {
    $gene_name="Novel Unknown";
  }
  unless ($species_name) {
    $species_name="Homo sapiens";
  }
  
  my $sess = new Session;
  my ($e,$numEntries,$set);

  # codonusage lookup
my $codonusage_usage; my @species_codon_usage; $set = $sess->query("[codonusage:'$species_name']", ""); $numEntries = $set->size(); if ($numEntries > 0) { $e = $set->getEntry(0); $species_name = $e->fieldToString("id","","",""); $codonusage_usage = $e->fieldToString("usage","","",""); @species_codon_usage=split(/\s/,$codonusage_usage); # spaces or tabs
if ($numEntries > 1) { carp "Search in Codon Usage DB resulted in $numEntries results. Taking the first one: $species_name"; } } else { carp "Genome not found in codon usage DB."; return (0); } # taxonomy lookup
my $mito_flag = 0; my $species_origin; if ($species_name =~ /^Mitochondrion /) { $mito_flag = 1; $species_origin = $species_name; $species_origin =~ s/^Mitochondrion //; $set = $sess->query("[taxonomy-species:'$species_origin']", ""); } elsif ($species_name =~ /^Chloroplast |^Kinetoplast |^Chromoplast /) { $species_origin = $species_name; $species_origin =~ s/^Chromoplast //; $species_origin =~ s/^Kinetoplast //; $species_origin =~ s/^Chloroplast //; $set = $sess->query("[taxonomy-species:'$species_origin']", ""); } else { $set = $sess->query("[taxonomy-species:'$species_name']", ""); } $numEntries = $set->size(); my ($taxonomy_id,$taxonomy_gc,$taxonomy_mgc,$taxonomy_name); if ($numEntries > 0) { $e = $set->getEntry(0); $taxonomy_id = $e->fieldToString("id","","",""); $taxonomy_name = $e->fieldToString("species","","",""); $taxonomy_gc = $e->fieldToString("gc","","",""); $taxonomy_mgc = $e->fieldToString("mgc","","",""); if ($numEntries > 1) { carp "Note! More than one entry found in taxonomy DB for the genome query given. Using the first one: $taxonomy_name ($taxonomy_id)"; } } else { carp "Genome not found in taxonomy DB."; return (0); } # Lookup appropriate translation table
my $ttabid; if ($mito_flag) { $ttabid = $taxonomy_mgc; } else { $ttabid = $taxonomy_gc; } my $gene=Bio::LiveSeq::IO::Loader::_common_novelaasequence2gene(\@species_codon_usage,$ttabid,$aasequence,$gene_name); return ($gene);
}
General documentation
AUTHOR - Joseph A.L. InsanaTop
Email: Insana@ebi.ac.uk, jinsana@gmx.net
Address:
     EMBL Outstation, European Bioinformatics Institute
     Wellcome Trust Genome Campus, Hinxton
     Cambs. CB10 1SD, United Kingdom
APPENDIXTop
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _