Bio::DB::HIV
Summary
Bio::DB::HIV - Database object interface to the Los Alamos HIV Sequence Database
Package variables
No package variables defined.
Included modules
Inherit
Synopsis
$db = new Bio::DB::HIV;
$seq = $db->get_Seq_by_id('94284'); # LANL sequence id
$seq = $db->get_Seq_by_acc('EF432710'); # GenBank accession
$q = new Bio::DB::Query::HIVQuery( " (C D)[subtype] SI[phenotype] (symptomatic AIDS)[patient_health] " );
$seqio = $db->get_Stream_by_query($q);
$seq = $seqio->next_seq();
($seq->annotation->get_Annotations('Virus'))[0]->{subtype} # returns 'D'
($seq->annotation->get_Annotations('Patient'))[0]->{patient_health} # returns 'AIDS'
($seq->annotation->get_Annotations('accession'))[0]->{value} # returns 'K03454'
Description
Bio::DB::HIV, along with Bio::DB::Query::HIVQuery, provides an
interface for obtaining annotated HIV and SIV sequences from the Los
Alamos National Laboratory (LANL) HIV Sequence Database
(http://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html).
Unannotated sequences can be retrieved directly from the database
object, using either LANL ids or GenBank accessions. Annotations are
obtained via a query object, and are attached to the correct Bio::Seq
objects when the query is handled by Bio::DB::HIV::get_Seq_by_query
or Bio::DB::HIV::get_Stream_by_query.
Methods
Methods description
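Title : new
Usage : my $obj = new Bio::DB::HIV();
Function: Builds a new Bio::DB::HIV object
Returns : an instance of Bio::DB::HIV
Args : optional named parameters -LANL_BASE, -LANL_MAP_DB,
       -LANL_MAKE_SEARCH_IF, -LANL_SEARCH, which override the
       default LANL URL components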
Title : get_request
Usage : my $request = $self->get_request(%qualifiers)
Function: returns an HTTP::Request object
Returns : an HTTP::Request object
Args : %qualifiers = a hash of qualifiers with keys in
       (-uids, -format, -mode, -query)
Note : Several layers of requests are performed to get to the sequence;
       see Bio::DB::Query::HIVQuery.
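The first of those request layers establishes a session by scraping a hidden form field from the "Database Map" page. A minimal offline sketch of that step follows. The pattern is the one used by get_request in the Methods code section; the HTML fragment and the helper name extract_session_id are hypothetical, invented for illustration, and are not a captured LANL response or part of the module.

```perl
use strict;
use warnings;

# Pattern copied from get_request: captures the hex session id.
my $session_id_re = qr{<input.*name="id".*value="([0-9a-f]+)"}m;

# Hypothetical response body (assumption, for illustration only).
my $html = <<'HTML';
<form action="/components/sequence/HIV/asearch/make_search_if.comp">
<input type="hidden" name="id" value="3f9a0c">
</form>
HTML

# Hypothetical helper: returns the session id, or undef if none is found.
sub extract_session_id {
    my ($content) = @_;
    my ($id) = ($content =~ /$session_id_re/);
    return $id;
}

my $id = extract_session_id($html);
print defined $id ? "session id: $id\n" : "Session not established\n";
```

In the module itself, a missing id makes get_request die with "Session not established", which is then rethrown as a Bio::WebError::Exception.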
Title : postprocess_data
Usage : $self->postprocess_data('type' => 'string', 'location' => \$datastr);
Function: process downloaded data before loading into a Bio::SeqIO
Returns : void
Args : hash with two keys
       'type' can be 'string' or 'file'
       'location' either file location or string reference containing data
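The processing in question turns the tab-delimited LANL download into FASTA text. The following is a rough standalone sketch of that conversion, not the module's code: table_to_fasta is a hypothetical helper, the sample data and column names are invented, and unlike postprocess_data it skips empty lines and does not chop the last character of each record.

```perl
use strict;
use warnings;

# Convert a LANL-style tab-delimited table into FASTA text.
# Assumes line 1 is a preamble to skip and line 2 holds the column names.
sub table_to_fasta {
    my ($text) = @_;
    my @lines = grep { length } split /\r?\n|\r/, $text;
    shift @lines;                           # discard the preamble line
    my @cols = split /\t/, shift @lines;    # column headers
    # Prefer an accession column, else fall back to the LANL id column.
    my ($idkey) = grep { /Accession/ } @cols;
    ($idkey) = grep { /SE.id/ } @cols unless $idkey;
    die "Trouble with column headers\n" unless $idkey;
    my $fasta = '';
    for my $line (@lines) {
        my %rec;
        @rec{@cols} = split /\t/, $line;
        $fasta .= ">$rec{$idkey}\n$rec{Sequence}\n";
    }
    return $fasta;
}

# Hypothetical sample data (assumption, for illustration only).
my $sample = join "\n",
    "Number of records: 2",
    join("\t", 'SE id(SA)', 'Sequence'),
    join("\t", '94284', 'ACGTACGT'),
    join("\t", '94285', 'TTGACCAA');

print table_to_fasta($sample);
```

Writing the result as plain FASTA lets the existing Bio::SeqIO fasta parser read it without a dedicated LANL format.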
Title : get_seq_stream
Usage : my $seqio = $self->get_seq_stream(%qualifiers)
Function: builds a url and queries a web db
Returns : a Bio::SeqIO stream capable of producing sequence
Args : %qualifiers = a hash of qualifiers that the implementing class
       will process to make a url suitable for web querying
Note : Some tightening up of the baseclass version

Title : get_Stream_by_acc
Usage : $stream = $db->get_Stream_by_acc([$acc1, $acc2]);
Function: Gets a series of Seq objects by GenBank accession numbers
Returns : a Bio::SeqIO stream object
Args : an arrayref of accession numbers for the desired sequence entries
Note : For LANL DB, alternative to LANL seqids
Title : get_Stream_by_query
Usage : $stream = $db->get_Stream_by_query($query);
Function: Gets a series of Seq objects by way of a query string or object
Returns : a Bio::SeqIO stream object
Args : $query : Currently, only a Bio::DB::Query::HIVQuery object.
       It's a good idea to create the query object first and interrogate
       it for the entry count before you fetch a potentially large stream.
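The advice about checking the entry count first can be sketched as a small guard. This is an illustration under stated assumptions: stream_if_small and the limit $max are hypothetical, the query object is assumed to expose a count method (as the Bio::DB::QueryI interface describes), and the Mock* classes are stand-ins so the sketch runs without network access or BioPerl.

```perl
use strict;
use warnings;

# Stand-in classes (assumptions) mimicking only the methods used below.
{
    package MockQuery;
    sub new   { my ($class, $n) = @_; return bless { n => $n }, $class }
    sub count { return $_[0]{n} }
}
{
    package MockDB;
    sub new                 { return bless {}, shift }
    sub get_Stream_by_query { return 'stream' }
}

# Hypothetical guard: fetch the stream only when the query is small enough.
sub stream_if_small {
    my ($db, $query, $max) = @_;
    my $n = $query->count;
    die "Query matches $n entries (limit $max); refine it first\n" if $n > $max;
    return $db->get_Stream_by_query($query);
}

my $db = MockDB->new;
my $ok = stream_if_small($db, MockQuery->new(12), 1000);
print "small query: got $ok\n";

my $big = eval { stream_if_small($db, MockQuery->new(50_000), 1000) };
print "large query refused: $@" unless defined $big;
```

With real objects, $db would be a Bio::DB::HIV instance and $query a Bio::DB::Query::HIVQuery that has already been run.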
Title : lanl_base
Usage : $obj->lanl_base($newval)
Function: get/set the base url of the LANL HIV database
Example :
Returns : value of lanl_base (a scalar)
Args : on set, new value (a scalar or undef, optional)

Title : map_db
Usage : $obj->map_db($newval)
Function: get/set the cgi filename for map_db ("Database Map")
Example :
Returns : value of map_db (a scalar)
Args : on set, new value (a scalar or undef, optional)

Title : make_search_if
Usage : $obj->make_search_if($newval)
Function: get/set the cgi filename for make_search_if ("Make Search Interface")
Example :
Returns : value of make_search_if (a scalar)
Args : on set, new value (a scalar or undef, optional)

Title : search_
Usage : $obj->search_($newval)
Function: get/set the cgi filename for the search query page ("Search Database")
Example :
Returns : value of search_ (a scalar)
Args : on set, new value (a scalar or undef, optional)
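These four accessors share one get/set idiom. A minimal sketch of it follows, using a hypothetical class Demo::Accessor and a placeholder URL (both assumptions, not part of BioPerl). It shows that passing an explicit undef is treated as a set and clears the stored value.

```perl
use strict;
use warnings;

package Demo::Accessor;    # hypothetical class, for illustration only

sub new { return bless {}, shift }

# Same idiom as lanl_base, map_db, make_search_if and search_:
# any argument (even undef) is stored; no argument means "get".
sub lanl_base {
    my $self = shift;
    return $self->{'lanl_base'} = shift if @_;
    return $self->{'lanl_base'};
}

package main;

my $obj = Demo::Accessor->new;
$obj->lanl_base("http://example.org/asearch");    # set (placeholder URL)
print $obj->lanl_base, "\n";                      # get
$obj->lanl_base(undef);                           # explicit undef clears it
print defined $obj->lanl_base ? "still set\n" : "cleared\n";
```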
Title : _map_db_uri
Usage :
Function: return the full map_db uri ("Database Map")
Example :
Returns : scalar string
Args : none

Title : _make_search_if_uri
Usage :
Function: return the full make_search_if uri ("Make Search Interface")
Example :
Returns : scalar string
Args : none

Title : _search_uri
Usage :
Function: return the full search cgi uri ("Search Database")
Example :
Returns : scalar string
Args : none
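Each of these URIs is the base address joined to a CGI filename with a slash. A standalone sketch of that composition follows, using the default values from the BEGIN block in the Methods code section. The helper lanl_uri and the %lanl hash are hypothetical; the module itself joins url_base_address (which defaults to lanl_base) with the relevant accessor.

```perl
use strict;
use warnings;

# Defaults copied from the module's BEGIN block.
my %lanl = (
    base           => "http://www.hiv.lanl.gov/components/sequence/HIV/asearch",
    map_db         => "map_db.comp",
    make_search_if => "make_search_if.comp",
    search         => "search.comp",
);

# Hypothetical helper: join the base address and one CGI filename.
sub lanl_uri {
    my ($which) = @_;
    die "unknown component '$which'\n"
        if $which eq 'base' or not exists $lanl{$which};
    return $lanl{base} . "/" . $lanl{$which};
}

print lanl_uri($_), "\n" for qw(map_db make_search_if search);
```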
Title : _session_id
Usage : $obj->_session_id($newval)
Function: Contains HIV db session id (initialized in _do_lanl_request)
Example :
Returns : value of _session_id (a scalar)
Args : on set, new value (a scalar or undef, optional)

Title : _response
Usage : $obj->_response($newval)
Function: hold the response to search post
Example :
Returns : value of _response (a scalar)
Args : on set, new value (a scalar or undef, optional)
Methods code
BEGIN {
    $LANL_BASE = "http://www.hiv.lanl.gov/components/sequence/HIV/asearch";
    $LANL_MAP_DB = "map_db.comp";
    $LANL_MAKE_SEARCH_IF = "make_search_if.comp";
    $LANL_SEARCH = "search.comp";
    @Bio::ResponseProblem::Exception::ISA = qw( Bio::Root::Exception );
    @Bio::HIVSorry::Exception::ISA = qw ( Bio::Root::Exception );
    @Bio::WebError::Exception::ISA = qw( Bio::Root::Exception );
}
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($lanl_base, $lanl_map_db, $lanl_make_search_if, $lanl_search) =
        $self->_rearrange([qw(
            LANL_BASE
            LANL_MAP_DB
            LANL_MAKE_SEARCH_IF
            LANL_SEARCH
        )], @args);
    # Apply any constructor overrides first.
    $lanl_base && $self->lanl_base($lanl_base);
    $lanl_map_db && $self->map_db($lanl_map_db);
    $lanl_make_search_if && $self->make_search_if($lanl_make_search_if);
    $lanl_search && $self->search_($lanl_search);
    # Fall back to the package defaults for anything still unset.
    $self->lanl_base || $self->lanl_base($LANL_BASE);
    $self->map_db || $self->map_db($LANL_MAP_DB);
    $self->make_search_if || $self->make_search_if($LANL_MAKE_SEARCH_IF);
    $self->search_ || $self->search_($LANL_SEARCH);
    $self->url_base_address || $self->url_base_address($self->lanl_base);
    $self->request_format("fasta");
    return $self;
}
sub get_request {
    my $self = shift;
    my %quals = @_;
    my ($resp);
    my (@ids, $mode, @interface, @query_parms, $query);
    # Patterns used to scrape the LANL HTML responses.
    my $tags_re = qr{(?:\s*<[^>]+>\s*)};
    my $session_id_re = qr{<input.*name="id".*value="([0-9a-f]+)"}m;
    my $search_form_re = qr{<form[^>]*action=".*/search.comp"};
    my $seqs_found_re = qr{Displaying$tags_re*(?:\s*[0-9-]*\s*)*$tags_re*of$tags_re*\s*([0-9]+)$tags_re*sequences found};
    my $no_seqs_found_re = qr{Sorry.*no sequences found};
    my $too_many_re = qr{too many records: $tags_re*([0-9]+)};
    my $tbl_no_join_re = qr{tables without join}i;
    # Sort out the qualifiers; unrecognized keys are ignored.
    foreach (keys %quals) {
        m/mode/ && do {
            $mode = $quals{$_};
            next;
        };
        m/uids/ && do {
            $self->throw(-class=>"Bio::Root::BadParameter",
                         -text=>"Arrayref required for qualifier \"$_\"",
                         -value=>$quals{$_})
                unless ref($quals{$_}) eq 'ARRAY';
            @ids = @{$quals{$_}};
            next;
        };
        m/query/ && do {
            $self->throw(-class=>"Bio::Root::BadParameter",
                         -text=>"Bio::DB::Query::HIVQuery required for qualifier \"$_\"",
                         -value=>$quals{$_})
                unless $quals{$_}->isa("Bio::DB::Query::HIVQuery");
            $query = $quals{$_};
            next;
        };
        do {
            1;
        };
    }
    # Build the interface and query parameters for the requested mode.
    for my $m ($mode) {
        ($m =~ m/single/) && do {
            @interface = (
                'sequenceentry' => 'se_sequence',
                'sequenceentry' => 'se_id',
                'action' => 'Search Interface'
            );
            @query_parms = map { ('sequenceentry.se_id' => $_ ) } @ids;
            push @query_parms, (
                'sequenceentry.se_sequence' => 'Any',
                'order' => 'sequenceentry.se_id',
                'sort_dir' => 'ASC',
                'action' => 'Search'
            );
        };
        ($m =~ m/acc/) && do {
            @interface = (
                'sequenceentry' => 'se_sequence',
                'sequenceentry' => 'se_id',
                'sequenceaccessions' => 'sa_genbankaccession',
                'sequenceaccessions' => 'sa_se_id',
                'action' => 'Search Interface'
            );
            @query_parms = map {('sequenceaccessions.sa_genbankaccession' => $_)} @ids;
            push @query_parms, (
                'sequenceentry.se_sequence' => 'Any',
                'order' => 'sequenceaccessions.sa_genbankaccession',
                'sort_dir' => 'ASC',
                'action' => 'Search'
            );
        };
        ($m =~ m/gi/) && do {
            $self->_sorry("-mode=>gi");
        };
        ($m =~ m/version/) && do {
            $self->_sorry("-mode=>version");
        };
        ($m =~ m/query/) && do {
            # The query must already have been run at level 2.
            $self->throw(-class=>"Bio::Root::BadParameter",
                         -text=>"Query ".($query->{'_RUN_LEVEL'} ? "has been run only at run level ".$query->{'_RUN_LEVEL'} : "has not been run").", run at level 2 with _do_query(2)",
                         -value=>$query->{'_RUN_LEVEL'})
                unless $query->{'_RUN_LEVEL'} == 2;
            @interface = (
                'sequenceentry' => 'se_sequence',
                'sequenceentry' => 'se_id',
                'action' => 'Search Interface'
            );
            @query_parms = ("sequenceentry.se_id" => sprintf("'%s'", join("\t", $query->ids)));
            push @query_parms, (
                'sequenceentry.se_sequence' => 'Any',
                'order' => 'sequenceentry.se_id',
                'sort_dir' => 'ASC',
                'action' => 'Search'
            );
        };
        do {
            1;
        };
    }
    # Talk to the LANL site: establish a session, request the search
    # interface, post the search, then check the response.
    eval {
        if (!$self->_session_id) {
            $resp = $self->ua->get($self->_map_db_uri);
            $resp->is_success || die "Connect failed";
            if (!$self->_session_id) {
                ($self->{'_session_id'}) = ($resp->content =~ /$session_id_re/);
                $self->_session_id || die "Session not established";
            }
        }
        $resp = $self->ua->post($self->_make_search_if_uri, [@interface, id=>$self->_session_id]);
        $resp->is_success || die "Interface request failed (1)";
        $self->_response($resp);
        $resp->content =~ /$search_form_re/ || die "Interface request failed (2)";
        $resp = $self->ua()->post($self->_search_uri, [(@query_parms, 'id' => $self->_session_id)] );
        unless ($resp->is_success) {
            die "Search post failed";
        }
        $self->_response($resp);
        for ($resp->content) {
            /$no_seqs_found_re/ && do {
                die "No sequences found";
                last;
            };
            /$too_many_re/ && do {
                die "Too many records ($1): must be <10000";
                last;
            };
            /$tbl_no_join_re/ && do {
                die "Some required tables went unjoined to query";
                last;
            };
            /$seqs_found_re/ && do {
                last;
            };
            do {
                die "Unparsed failure";
                last;
            };
        }
    };
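    $self->throw(-class=>'Bio::WebError::Exception',
                 -text=>$@,
                 -value=>$resp->content) if $@;
    # All checks passed: return the request that downloads the results.
    # POST is the request constructor exported by HTTP::Request::Common.
    return POST $self->_search_uri,
        ['action Download.x' => 1,
         'action Download.y' => 1,
         'id' => $self->_session_id
        ];
}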
sub postprocess_data {
    my ($self, %args) = @_;
    my ($type, $loc) = ($args{type}, $args{location});
    my (@data, @cols, %rec, $idkey, @flines);
    $self->throw(-class=>'Bio::Root::BadParameter',
                 -text=>"Argument hash requires values for keys \"type\" and \"location\"",
                 -value=>\%args) unless ($type && $loc);
    # Read the raw tab-delimited download into @data, one record per element.
    for ($type) {
        m/string/ && do {
            @data = split(/\n|\r/, ${$loc});
            last;
        };
        m/file/ && do {
            local $/;    # slurp mode
            open(my $in, "<", $loc) or
                $self->throw(
                    -class=>'Bio::Root::FileOpenException',
                    -text=>"Error opening tempfile \"$loc\" for reading",
                    -value=>$loc
                );
            @data = split(/\n|\r/, <$in>);
            close($in);
            last;
        };
        do {
            1;
        };
    }
    $self->throw(-class=>'Bio::Root::BadParameter',
                 -text=>'No data found in response',
                 -value=>\%args) unless (@data);
    # Discard the first line; the second line holds the column headers.
    shift @data;
    @cols = split(/\t/, shift @data);
    ($idkey) = grep /SE.id/, @cols unless ($idkey) = grep /Accession/, @cols;
    $self->throw(-class=>"Bio::ResponseProblem::Exception",
                 -text=>"Trouble with column headers in LANL response",
                 -value=>\@cols) unless $idkey;
    # Rewrite each record as a FASTA entry keyed on the id column.
    foreach (@data) {
        chop;
        @rec{@cols} = split /\t/;
        push @flines, ">$rec{$idkey}\n".$rec{'Sequence'}."\n";
    }
    # Write the FASTA text back to where the data came from.
    for ($type) {
        m/string/ && do {
            ${$loc} = join("", @flines);
            last;
        };
        m/file/ && do {
            open(my $out, ">", $loc) or
                $self->throw(-class=>'Bio::Root::FileOpenException',
                             -text=>"Error opening tempfile \"$loc\" for writing",
                             -value=>$loc);
            print $out join("", @flines);
            close($out);
            last;
        };
        do {
            1;
        };
    }
    return;
}
sub get_seq_stream {
    my ($self, %qualifiers) = @_;
    my ($rformat, $ioformat) = $self->request_format();
    my ($key) = grep /format$/, keys %qualifiers;
    $qualifiers{'-format'} = ($key ? $qualifiers{$key} : $rformat);
    ($rformat, $ioformat) = $self->request_format($qualifiers{'format'});
    my $request = $self->get_request(%qualifiers);
    $request->proxy_authorization_basic($self->authentication)
        if ( $self->authentication);
    $self->debug("request is ". $request->as_string(). "\n");
    # On Windows, replace the pipeline retrieval type with io_string.
    $self->retrieval_type('io_string')
        if $self->retrieval_type =~ /pipeline/ && $^O =~ /^MSWin/;
    if ($self->retrieval_type =~ /pipeline/) {
        my ($result,$stream) = $self->_open_pipe();
        if (defined $result) {
            $DB::fork_TTY = File::Spec->devnull;
            if (!$result) {
                $self->_stream_request($request,$stream);
                POSIX::_exit(0);
            }
            else {
                return Bio::SeqIO->new('-verbose' => $self->verbose,
                                       '-format' => $ioformat,
                                       '-fh' => $stream);
            }
        }
        else {
            $self->retrieval_type('io_string');
        }
    }
    if ($self->retrieval_type =~ /temp/i) {
        my $dir = $self->io->tempdir( CLEANUP => 1);
        my ( $fh, $tmpfile) = $self->io()->tempfile( DIR => $dir );
        close $fh;
        my $resp = $self->_request($request, $tmpfile);
        if( ! -e $tmpfile || -z $tmpfile || ! $resp->is_success() ) {
            $self->throw("WebDBSeqI Error - check query sequences!\n");
        }
        $self->postprocess_data('type' => 'file','location' => $tmpfile);
        ($rformat,$ioformat) = $self->request_format();
        if( $self->verbose > 0 ) {
            # Echo the processed file to the debug log; skip quietly if
            # it cannot be opened.
            if (open(my $ERR, "<", $tmpfile)) {
                while(<$ERR>) { $self->debug($_); }
                close($ERR);
            }
        }
        return Bio::SeqIO->new('-verbose' => $self->verbose,
                               '-format' => $ioformat,
                               '-file' => $tmpfile);
    }
    if ($self->retrieval_type =~ /io_string/i ) {
        my $resp = $self->_request($request);
        my $content = $resp->content_ref;
        $self->debug( "content is $$content\n");
        if (!$resp->is_success() || length($$content) == 0) {
            $self->throw("WebDBSeqI Error - check query sequences!\n");
        }
        ($rformat,$ioformat) = $self->request_format();
        $self->postprocess_data('type'=> 'string',
                                'location' => $content);
        $self->debug( "str is $$content\n");
        return Bio::SeqIO->new('-verbose' => $self->verbose,
                               '-format' => $ioformat,
                               '-fh' => IO::String->new($$content));
    }
    $self->throw("retrieval type " .
                 $self->retrieval_type .
                 " unsupported\n");
}
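sub get_Stream_by_acc {
    my ($self, $ids) = @_;
    # Accept either a single accession or an arrayref of accessions;
    # wrapping an arrayref again would pass the reference itself as an id.
    $ids = [$ids] unless ref($ids) eq 'ARRAY';
    return $self->get_seq_stream('-uids' => $ids, '-mode' => 'acc');
}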
sub get_Stream_by_query {
    my ($self, $query) = @_;
    my $stream = $self->get_seq_stream('-query' => $query, '-mode' => 'query');
    # Wrap the raw stream so annotations from the query are attached
    # to each sequence.
    return Bio::DB::HIV::HIVAnnotProcessor->new( -hiv_query => $query,
                                                 -source_stream => $stream );
}
sub _request {
    my ($self, $request, $tmpfile) = @_;
    my ($resp);
    # Save the response body to $tmpfile when one is given.
    if( defined $tmpfile && $tmpfile ne '' ) {
        $resp = $self->ua->request($request, $tmpfile);
    } else {
        $resp = $self->ua->request($request);
    }
    if( $resp->is_error ) {
        $self->throw("WebDBSeqI Request Error:\n".$resp->as_string);
    }
    return $resp;
}
sub lanl_base {
    my $self = shift;
    return $self->{'lanl_base'} = shift if @_;
    return $self->{'lanl_base'};
}

sub map_db {
    my $self = shift;
    return $self->{'map_db'} = shift if @_;
    return $self->{'map_db'};
}

sub make_search_if {
    my $self = shift;
    return $self->{'make_search_if'} = shift if @_;
    return $self->{'make_search_if'};
}

sub search_ {
    my $self = shift;
    return $self->{'search_'} = shift if @_;
    return $self->{'search_'};
}

sub _map_db_uri {
    my $self = shift;
    return $self->url_base_address."/".$self->map_db;
}

sub _make_search_if_uri {
    my $self = shift;
    return $self->url_base_address."/".$self->make_search_if;
}

sub _search_uri {
    my $self = shift;
    return $self->url_base_address."/".$self->search_;
}

sub _session_id {
    my $self = shift;
    return $self->{'_session_id'} = shift if @_;
    return $self->{'_session_id'};
}

sub _response {
    my $self = shift;
    return $self->{'_response'} = shift if @_;
    return $self->{'_response'};
}
sub _sorry {
    my $self = shift;
    my $parm = shift;
    $self->throw(-class=>"Bio::HIVSorry::Exception",
                 -text=>"Sorry, option/parameter \"$parm\" not (yet) supported. See manpage to complain.",
                 -value=>$parm);
    return;
}

1;
General documentation
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
the web:
http://bugzilla.open-bio.org/
AUTHOR - Mark A. Jensen
The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _.
Title : _sorry
Usage : $hiv->_sorry
Function: Throws an exception for unsupported option or parameter
Example :
Returns :
Args : scalar string