Bio::DB::EUtilities
Summary
Bio::DB::EUtilities - interface for handling web queries and data
retrieval from Entrez Utilities at NCBI.
Package variables
No package variables defined.
Included modules
URI
Inherit
Bio::DB::GenericWebDBI
Synopsis
  use Bio::DB::EUtilities;

  my $esearch = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                         -db         => 'pubmed',
                                         -term       => 'hutP',
                                         -usehistory => 'y');

  $esearch->get_response; # parse the response, fetch a cookie

  my $elink = Bio::DB::EUtilities->new(-eutil  => 'elink',
                                       -db     => 'protein',
                                       -dbfrom => 'pubmed',
                                       -cookie => $esearch->next_cookie,
                                       -cmd    => 'neighbor_history');

  $elink->get_response; # parse the response, fetch the next cookie

  my $efetch = Bio::DB::EUtilities->new(-cookie  => $elink->next_cookie,
                                        -retmax  => 10,
                                        -rettype => 'fasta');

  print $efetch->get_response->content;
Description
WARNING: Please do NOT spam the Entrez web server with multiple requests.
NCBI offers Batch Entrez for this purpose, now accessible here via epost!
This is a test interface to the Entrez Utilities at NCBI. The main purpose of this
is to enable access to all of the NCBI databases available through Entrez and
allow for more complex queries. It is likely that the API for this module as
well as the documentation will change dramatically over time. So, novice users
and neophytes beware!
The experimental base class is Bio::DB::GenericWebDBI,
which as the name implies enables access to any web database which will accept
parameters. This was originally born from an idea to replace
WebDBSeqI/NCBIHelper with a more general web database accession tool so one
could access sequence information, taxonomy, SNP, PubMed, and so on.
However, this may ultimately prove to be better used as a replacement for
LWP::UserAgent when accessing NCBI-related web tools
(Entrez Utilities, or EUtilities). Using the base class GenericWebDBI,
one could also build web interfaces to other databases to access anything
via CGI parameters.
Currently, you can access any database available through the NCBI interface:
  http://eutils.ncbi.nlm.nih.gov/
At this point, Bio::DB::EUtilities uses the EUtilities plugin modules somewhat
like Bio::SeqIO. So, one would call the particular EUtility (epost, efetch,
and so forth) upon instantiating the object using a set of parameters:
  my $esearch = Bio::DB::EUtilities->new(-eutil      => 'esearch',
-db => 'pubmed',
-term => 'dihydroorotase',
-usehistory => 'y');
The default EUtility (when eutil is left out) is 'efetch'. For specifics on
each EUtility, see their respective POD (**these are incomplete**) or
the NCBI Entrez Utilities page:
  http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
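The plugin-style dispatch described above, including the 'efetch' default, can be sketched roughly as follows. This is a minimal, self-contained illustration and not the module's own code: the packages My::EUtil, My::EUtil::efetch, and My::EUtil::esearch are hypothetical stand-ins defined inline, whereas the real module loads a Bio::DB::EUtilities plugin class at run time.

```perl
use strict;
use warnings;

# Hypothetical plugin classes, defined inline for illustration only.
package My::EUtil::efetch  { our @ISA = ('My::EUtil'); }
package My::EUtil::esearch { our @ISA = ('My::EUtil'); }

package My::EUtil;

sub new {
    my ($class, %args) = @_;

    # Called on a plugin class: build the object directly.
    if ($class =~ /^My::EUtil::(\w+)$/) {
        my $self = { eutil => $1, %args };
        return bless $self, $class;
    }

    # Called on the factory class: pick the plugin, defaulting to efetch.
    my $eutil  = $args{-eutil} || 'efetch';
    my $plugin = "My::EUtil::$eutil";
    die "Unknown eutil '$eutil'\n" unless $plugin->can('new');
    return $plugin->new(%args);
}

package main;

my $search  = My::EUtil->new(-eutil => 'esearch', -db => 'pubmed');
my $default = My::EUtil->new(-db => 'protein');   # no -eutil given

print ref($search),  "\n";   # My::EUtil::esearch
print ref($default), "\n";   # My::EUtil::efetch
```

The caller only ever names the factory class; which plugin handles the request is decided by the -eutil parameter.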
At this time, retrieving the response is accomplished by using the method
get_response (which also parses for cookies and other information, see below).
This method returns an HTTP::Response object. The raw data is accessed by using
the object method content, like so:
  my $efetch = Bio::DB::EUtilities->new(-cookie       => $elink->next_cookie,
-retmax => 10,
-rettype => 'fasta');
print $efetch->get_response->content;
Based on this, if one wanted to retrieve sequences or other raw data
but was not interested in directly using Bio* objects (such as if
genome sequences were to be retrieved) one could do so by using the
proper EUtility object(s) and query(ies) and get the raw response back
from NCBI through 'efetch'.
A great deal of the documentation here will likely end up in the form
of a HOWTO at some future point, focusing on getting data into Bioperl
objects. Some EUtilities (epost, esearch, or elink) retain information on
the NCBI server under certain settings. This information can be retrieved by
using a cookie. Here, the idea of the 'cookie' is similar to the
'cookie' set on your computer when browsing the Web. XML data returned
by these EUtilities, when applicable, is parsed for the cookie information
(the 'WebEnv' and 'query_key' tags, to be specific). This information, along
with other identifying data (such as the calling eutility, description
of query, etc.), is stored as a
Bio::DB::EUtilities::Cookie object
in an internal queue. These can be retrieved one at a time by using
the next_cookie method or all at once in an array using get_all_cookies.
Each cookie can then be 'fed', one at a time, to another EUtility object,
thus enabling chained queries as demonstrated in the synopsis.
For more information, see the POD documentation for
Bio::DB::EUtilities::Cookie.
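The queue behaviour described above (next_cookie, get_all_cookies, rewind_cookies) can be illustrated with a small stand-alone sketch. The package My::CookieJar and the placeholder WebEnv strings are hypothetical; a real cookie is a Bio::DB::EUtilities::Cookie object, represented here by a plain array reference of [WebEnv, query_key].

```perl
use strict;
use warnings;

# Hypothetical stand-in for the internal cookie queue; not the module's code.
package My::CookieJar;

sub new {
    my $class = shift;
    my $self  = { cookies => [], index => 0 };
    return bless $self, $class;
}

# Append a cookie to the end of the queue.
sub add_cookie {
    my ($self, $cookie) = @_;
    push @{ $self->{cookies} }, $cookie;
    return;
}

# Return the cookie at the current index and advance; undef when exhausted.
sub next_cookie {
    my $self = shift;
    return $self->{cookies}[ $self->{index}++ ];
}

# Return every cookie without disturbing the queue or the index.
sub get_all_cookies {
    my $self = shift;
    return @{ $self->{cookies} };
}

# Start iterating from the first cookie again.
sub rewind_cookies {
    my $self = shift;
    $self->{index} = 0;
    return;
}

package main;

my $jar = My::CookieJar->new;
$jar->add_cookie([ 'WEBENV_PLACEHOLDER', 1 ]);
$jar->add_cookie([ 'WEBENV_PLACEHOLDER', 2 ]);

my $first  = $jar->next_cookie;       # query_key 1
my $second = $jar->next_cookie;       # query_key 2
my $none   = $jar->next_cookie;       # undef: nothing left
$jar->rewind_cookies;
my $again  = $jar->next_cookie;       # query_key 1 again
my @all    = $jar->get_all_cookies;   # both cookies, queue untouched
```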
Methods
BEGIN
new
_initialize
add_cookie
next_cookie
reset_cookies
get_all_cookies
get_cookie_count
rewind_cookies
keep_cookies
parse_response
get_response
get_ids
delay_policy
get_entrezdbs
_add_db_ids
_eutil
_submit_request
_get_params
_load_eutil_module
_next_cookie_index
Methods description
add_cookie
 Title   : add_cookie
Usage : $db->add_cookie($cookie)
Function: adds an NCBI query cookie to the internal cookie queue
Returns : none
Args : a Bio::DB::EUtilities::Cookie object
next_cookie
 Title   : next_cookie
Usage : $cookie = $db->next_cookie
Function: return a cookie from the internal cookie queue
Returns : a Bio::DB::EUtilities::Cookie object
Args : none
reset_cookies
 Title   : reset_cookies
Usage : $db->reset_cookies
Function: resets (empties) the internal cookie queue
Returns : none
Args : none
get_all_cookies
 Title   : get_all_cookies
Usage : @cookies = $db->get_all_cookies
Function: retrieves all cookies from the internal cookie queue; this leaves
the cookies in the queue intact
Returns : array of cookies (in list context) or the first cookie (in scalar context)
Args : none
get_cookie_count
 Title   : get_cookie_count
Usage : $ct = $db->get_cookie_count
Function: returns the number of cookies in the internal queue
Returns : integer
Args : none
rewind_cookies
 Title   : rewind_cookies
Usage : $elink->rewind_cookies;
Function: resets cookie index to 0 (starts over)
Returns : None
Args : None
keep_cookies
 Title   : keep_cookies
Usage : $db->keep_cookies(1)
Function: Flag to retain the internal cookie queue;
this is normally emptied upon using get_response
Returns : none
Args : Boolean - value that evaluates to TRUE or FALSE
parse_response
 Title   : parse_response
Usage : $db->parse_response($content)
Function: parse out response for cookies and other goodies
Returns : empty
Args : none
Throws : Not implemented (implemented in plugin classes)
get_response
 Title   : get_response
Usage : $response = $db->get_response
Function: main method to submit the request and retrieve a response
Returns : HTTP::Response object
Args : None
get_ids
 Title   : get_ids
Usage : $ids = $elink->get_ids($db); # array ref of specific db ids
@ids = $esearch->get_ids(); # array
$ids = $esearch->get_ids(); # array ref
Function: returns an array or array ref of unique IDs.
Returns : array or array ref of ids
Args : Optional : database string if elink used (required arg if searching
multiple databases for related IDs)
Currently implemented only for elink object with single linksets
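The "array or array ref" return convention relies on Perl's wantarray. A minimal sketch of the idiom, using a hypothetical helper ids_in_context rather than the module's own method:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating a context-sensitive return value.
sub ids_in_context {
    my @ids = @_;
    # List context gets the list itself; scalar context gets a reference.
    return wantarray ? @ids : \@ids;
}

my @list = ids_in_context(101, 102, 103);   # list context: three IDs
my $ref  = ids_in_context(101, 102, 103);   # scalar context: array ref

printf "%d ids in list, %d ids via ref\n", scalar @list, scalar @{$ref};
```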
delay_policy
  Title   : delay_policy
Usage : $secs = $self->delay_policy
Function: return number of seconds to delay between calls to remote db
Returns : number of seconds to delay
Args : none
NOTE: NCBI requests a delay of 3 seconds between requests. This method implements that policy.
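One way such a delay could be enforced between requests is sketched below. This is an assumption about how a caller might apply the policy, not the module's internal _sleep implementation; the helpers seconds_to_wait and throttled_request are hypothetical names.

```perl
use strict;
use warnings;

# Epoch seconds of the previous request (0 means no request made yet).
my $last_request = 0;

# How long to sleep so that at least $delay seconds separate two requests.
sub seconds_to_wait {
    my ($delay, $now) = @_;
    my $elapsed = $now - $last_request;
    return $elapsed >= $delay ? 0 : $delay - $elapsed;
}

# Run $code only after the delay has passed, then record the request time.
sub throttled_request {
    my ($delay, $code) = @_;
    my $wait = seconds_to_wait($delay, time);
    sleep $wait if $wait > 0;
    $last_request = time;
    return $code->();
}

# Usage: the first call runs at once, the second waits about 3 seconds.
# throttled_request(3, sub { print "request 1\n" });
# throttled_request(3, sub { print "request 2\n" });
```

Sleeping only for the remaining time avoids adding a full delay when the caller has already spent that long processing the previous response.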
get_entrezdbs
  Title   : get_entrezdbs
Usage : @dbs = $self->get_entrezdbs;
Function: return list of all Entrez databases; convenience method
Returns : array or array ref (based on wantarray) of databases
Args : none
_eutil
 Title   : _eutil
Usage : $db->_eutil;
Function: sets eutil
Returns : eutil
Args : eutil
Methods code
BEGIN
BEGIN {
    our @METHODS = qw(rettype usehistory term field tool reldate mindate
        maxdate datetype retstart retmax sort seq_start seq_stop strand
        complexity report dbfrom cmd holding version linkname);
    for my $method (@METHODS) {
        eval <<END
sub $method {
my \$self = shift;
return \$self->{'_$method'} = shift if \@_;
return \$self->{'_$method'};
}
END
    }
}
new
sub new {
    my($class,@args) = @_;
    if( $class =~ /Bio::DB::EUtilities::(\S+)/ ) {
        my ($self) = $class->SUPER::new(@args);
        $self->_initialize(@args);
        return $self;
    } else { 
        my %param = @args;
        @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
        my $eutil = $param{'-eutil'} || 'efetch';
        return unless ($class->_load_eutil_module($eutil));
        return "Bio::DB::EUtilities::$eutil"->new(@args);
    }
}
_initialize
sub _initialize {
    my ($self, @args) = @_;
    my ( $tool, $ids, $retmode, $verbose, $cookie, $keep_cookies) =
      $self->_rearrange([qw(TOOL ID RETMODE VERBOSE COOKIE KEEP_COOKIES)],  @args);
    # hard code the base address
    $self->url_base_address($HOSTBASE);
    $tool ||= $DEFAULT_TOOL;
    $self->tool($tool);
    $ids          && $self->id($ids);
    $verbose      && $self->verbose($verbose);
    $retmode      && $self->retmode($retmode);
    $keep_cookies && $self->keep_cookies($keep_cookies);
    if ($cookie && ref($cookie) =~ m{cookie}i) {
        $self->db($cookie->database) if !($self->db);
        $self->add_cookie($cookie);
    }
    $self->{'_cookieindex'} = 0;
    $self->{'_cookiecount'} = 0;
    $self->{'_authentication'} = [];
}
add_cookie
sub add_cookie {
    my $self = shift;
    if (@_) {
        my $cookie = shift;
        $self->throw("Expecting a Bio::DB::EUtilities::Cookie, got $cookie.")
          unless $cookie->isa("Bio::DB::EUtilities::Cookie");
        push @{$self->{'_cookie'}}, $cookie;
    }
    $self->{'_cookiecount'}++;
}
next_cookie
sub next_cookie {
    my $self = shift;
    my $index = $self->_next_cookie_index;
    if ($self->{'_cookie'}) {
        return $self->{'_cookie'}->[$index];
    } else {
        $self->warn("No cookies left in the jar!");
    }
}
reset_cookies
sub reset_cookies {
    my $self = shift;
    $self->{'_cookie'} = [];
    $self->{'_cookieindex'} = 0;
    $self->{'_cookiecount'} = 0;
}
get_all_cookies
sub get_all_cookies {
    my $self = shift;
    return @{ $self->{'_cookie'} } if $self->{'_cookie'} && wantarray;
    return $self->{'_cookie'}->[0] if $self->{'_cookie'}
}
get_cookie_count
sub get_cookie_count {
    my $self = shift;
    return $self->{'_cookiecount'};
}
rewind_cookies
sub rewind_cookies {
    my $self = shift;
    $self->{'_cookieindex'} = 0;
}
keep_cookies
sub keep_cookies {
    my $self = shift;
    return $self->{'_keep_cookies'} = shift if @_;
    return $self->{'_keep_cookies'};
}
parse_response
sub parse_response {
  my $self = shift;
  $self->throw_not_implemented;
}
get_response
sub get_response {
    my $self = shift;
    $self->_sleep; # institute delay policy
    my $request = $self->_submit_request;
    if ($self->authentication) {
        $request->proxy_authorization_basic($self->authentication)
    }
    if (!$request->is_success) {
        $self->throw(ref($self)." Request Error:".$request->as_string);
    }
    $self->reset_cookies if !($self->keep_cookies);
    $self->parse_response($request); # grab cookies and what not
    return $request;
}
get_ids
sub get_ids {
    my $self = shift;
    my $user_db = shift if @_;
    if ($self->can('get_all_linksets')) {
        my $querydb = $self->db;
        if (!$user_db && ($querydb eq 'all' || $querydb =~ m{,}) ) {
            $self->throw(q(Multiple databases searched; must use a specific ).
                         q(database as an argument.) );
        }
        
        my $count = $self->get_linkset_count;
        if ($count == 0) {
            $self->throw( q(No linksets!) );
        }
        elsif ($count == 1) {
            my ($linkset) = $self->get_all_linksets;
            my ($db) = $user_db ? $user_db : $linkset->get_all_linkdbs;
            $self->_add_db_ids( scalar( $linkset->get_LinkIds_by_db($db) ) );
        }
        else {
            $self->throw( q(Multiple linkset objects present; can't use get_ids.).
                 qq(\nUse get_all_linksets/get_databases/get_LinkIds_by_db ).
                 qq(\n$count total linksets ));
        }
    }
    if ($self->{'_db_ids'}) {
        return @{$self->{'_db_ids'}} if wantarray;
        return $self->{'_db_ids'};
    }
}
delay_policy
sub delay_policy {
  my $self = shift;
  return 3;
}
get_entrezdbs
sub get_entrezdbs {
    my $self = shift;
    my $info = Bio::DB::EUtilities->new(-eutil => 'einfo');
    $info->get_response;
    # copy list, not ref of list (so einfo obj doesn't stick around)
    my @databases = $info->einfo_dbs;
    return @databases;
}
_add_db_ids
sub _add_db_ids {
    my ($self, $ids) = @_;
    $self->throw ("IDs must be an ARRAY reference") unless ref($ids) =~ m{ARRAY}i;
    my @ids = @{ $ids }; # deep copy
    $self->{'_db_ids'} = \@ids;
}
_eutil
sub _eutil {
    my $self = shift;
    return $self->{'_eutil'} = shift if @_;
    return $self->{'_eutil'};
}
_submit_request
sub _submit_request {
    my $self = shift;
    my %params = $self->_get_params;
    my $eutil = $self->_eutil;
    if ($self->id) {
        # this is in case multiple id groups are present
        if ($self->can('multi_id') && $self->multi_id) {
            # multiple id groups if groups are together in an array reference
            # ids and arrays are flattened into individual groups
            for my $id_group (@{ $self->id }) {
                if (ref($id_group) eq 'ARRAY') {
                    push @{ $params{'id'} }, (join q(,), @{ $id_group });
                }
                elsif (!ref($id_group)) {
                    push @{ $params{'id'} }, $id_group;
                }
                else {
                    $self->throw("Unknown ID type: $id_group");
                }
            }
        }
        else {
            my @ids = @{ $self->id };
            $params{'id'} = join ',', @ids;
        }
    }
    my $url = URI->new($HOSTBASE . $CGILOCATION{$eutil}[1]);
    $url->query_form(%params);
    $self->debug("The web address:\n".$url->as_string."\n");
    if ($CGILOCATION{$eutil}[0] eq 'post') { # epost request
        return $self->post($url);
    } else { # all other requests
        return $self->get($url);
    }
}
_get_params
sub _get_params {
    my $self = shift;
    my $cookie = $self->get_all_cookies ? $self->get_all_cookies : 0;
    my @final;  # final parameter list; this changes dep. on presence of cookie
    my $eutil = $self->_eutil;
    my %params;
    @final = ($cookie && $cookie->isa("Bio::DB::EUtilities::Cookie")) ?
             @COOKIE_PARAMS : @PARAMS;
    # build parameter hash based on final parameter list
    for my $method (@final) {
        if ($self->$method) {
            $params{$method} = $self->$method;
        }
    }
    if ($cookie) {
        my ($webenv, $qkey) = @{$cookie->cookie};
        $self->debug("WebEnv:$webenv\tQKey:$qkey\n");
        ($params{'WebEnv'}, $params{'query_key'}) = ($webenv, $qkey);
        $params{'dbfrom'} = $cookie->database if $eutil eq 'elink';
    }
    my $db = $self->db;
    # elink cannot set the db from a cookie (it is actually dbfrom)
    $params{'db'} = $db                            ? $db               :
                    ($cookie && $eutil ne 'elink') ? $cookie->database :
                    'nucleotide';
    # einfo db exception (db is optional)
    if (!$db && ($eutil eq 'einfo' || $eutil eq 'egquery')) {
        delete $params{'db'};
    }
    unless (exists $params{'retmode'}) { # set by user
        my $format = $CGILOCATION{ $eutil }[2]; # set by eutil
        if ($format eq 'dbspec') { # database-specific
            $format = $DATABASE{$params{'db'}} ?
                      $DATABASE{$params{'db'}} : 'xml'; # have xml as a fallback
        }
        $params{'retmode'} = $format;
    }
    $self->debug("Param: $_\tValue: $params{$_}\n") for keys %params;
    return %params;
}
_load_eutil_module
sub _load_eutil_module {
  my ($self,$eutil) = @_;
  my $module = "Bio::DB::EUtilities::" . $eutil;
  my $ok;
  
  eval {
      $ok = $self->_load_module($module);
  };
  if ( $@ ) {
      print STDERR <<END
$self: $eutil cannot be found
Exception $@
For more information about the EUtilities system please see the EUtilities docs.
This includes ways of checking for formats at compile time, not run time
END
;
  }
  return $ok;
}
_next_cookie_index
sub _next_cookie_index {
    my $self = shift;
    return $self->{'_cookieindex'}++;
}
General documentation
TODO
Resetting internal parameters is planned so one could feasibly reuse
the objects once instantiated, such as if one were to use this as a
replacement for LWP::UserAgent when retrieving responses i.e. when
using many of the Bio::DB* NCBI-related modules.
File and filehandle support to be added.
Switch over XML parsing in most EUtilities to XML::SAX (they currently
use XML::Simple).
Any feedback is welcome.
FEEDBACK
Mailing Lists
User feedback is an integral part of the
evolution of this and other Bioperl modules. Send
your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation
is much appreciated.
  bioperl-l@lists.open-bio.org               - General discussion
http://www.bioperl.org/wiki/Mailing_lists - About the mailing lists
Reporting Bugs
Report bugs to the Bioperl bug tracking system to
help us keep track of the bugs and their resolution.
Bug reports can be submitted via the web.
  http://bugzilla.open-bio.org/
AUTHOR
Email cjfields at uiuc dot edu
APPENDIX
The rest of the documentation details each of the
object methods. Internal methods are usually
preceded with a _
Private methods