Bio::DB::Biblio
pdf
Summary
Bio::DB::Biblio::pdf - Fetch PDF for a PubMed ID
Package variables
Privates (from "my" definitions)
%visit = ()
Included modules
Data::Dumper
WWW::Mechanize
base qw ( Bio::Biblio Bio::DB::BiblioI )
Synopsis
Do not use this object directly; access it through the
Bio::Biblio module instead:
use Bio::Biblio;
my $biblio = new Bio::Biblio (-access => 'pdf');
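A slightly fuller sketch of the call sequence, assuming the find() and get_next() methods documented below and network access to PubMed and the publisher's site. The helper names looks_like_pdf and save_pdf, the output file name, and the PubMed ID are hypothetical and exist only for illustration; they are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: PDF files start with the "%PDF-" marker, so this
# catches the common case where a site returned an HTML page instead.
sub looks_like_pdf {
    my ($data) = @_;
    return ( defined($data) && $data =~ /\A%PDF-/ ) ? 1 : 0;
}

# Hypothetical helper: write the raw bytes to disk in binary mode.
sub save_pdf {
    my ($data, $path) = @_;
    open( my $fh, '>', $path ) or die "cannot write '$path': $!";
    binmode $fh;
    print {$fh} $data;
    close($fh) or die "cannot close '$path': $!";
    return $path;
}

# Intended use (needs BioPerl installed and network access, so it is
# left commented out here; 1234 is a placeholder PubMed ID):
#
#   use Bio::Biblio;
#   my $biblio = Bio::Biblio->new(-access => 'pdf');
#   $biblio->find(1234);
#   my $pdf = $biblio->get_next();
#   save_pdf($pdf, 'article.pdf') if looks_like_pdf($pdf);
```

Checking the leading bytes before saving is a design choice of this sketch, not something the module does itself.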
Description
This object contains the real implementation of a Bibliographic Query
Service as defined in
Bio::DB::BiblioI.
This module does not implement Bio::DB::BiblioI exactly as the interface
documents it. The main difference is find(): the interface's version is
not compatible with PubMed's query language, so here it takes a single
PubMed ID.
Methods
Methods description
Title   : _initialize
Usage   : my $obj = new Bio::Biblio (-access => 'pdf' ...);
          (_initialize is internally called from this constructor)
Returns : 1 on success
Args    : none

This is effectively the new() method, except that the actual object creation and blessing are done in the parent class Bio::Root::Root, in method _create_object. This method is always called as an object method, never as a class method, and the object that calls it may already be partly initialized by Bio::Biblio::new. If you need to do anything special with the class invocation, change Bio::Biblio::new, not this method.

Title   : get_next
Usage   : $pdf = $biblio->get_next();
Function: return the PDF data fetched by the last find()
Returns : the PDF content as a scalar, or undef if none was retrieved
Args    : none

Title   : find
Usage   : $biblio = $biblio->find(1234);
Function: perform a PubMed query by PubMed ID
Returns : a reference to the object on which the method was called
Args    : a PubMed ID

Title   : exists
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : undef
Args    : none

Title   : destroy
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : undef
Args    : none

Title   : get_vocabulary_names
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : empty arrayref
Args    : none

Title   : contains
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : undef
Args    : none

Title   : get_entry_description
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : undef
Args    : none

Title   : get_all_values
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : undef
Args    : none

Title   : get_all_entries
Usage   : do not use
Function: no-op. this is here only for interface compatibility
Returns : undef
Args    : none

Title   : depth
Usage   : $obj->depth($newval)
Function: track link recursion depth
Returns : value of depth (a scalar)
Args    : on set, new value (a scalar or undef, optional)

Title   : max_depth
Usage   : $obj->max_depth($newval)
Function: how far should link recursion go?
Returns : value of max_depth (a scalar)
Args    : on set, new value (a scalar or undef, optional)

Title   : ua
Usage   : $obj->ua($newval)
Function: holds the WWW::Mechanize instance (an LWP::UserAgent subclass)
Returns : value of ua (a scalar)
Args    : on set, new value (a scalar or undef, optional)

Title   : pdf
Usage   : $obj->pdf($newval)
Function: holds pdf data
Returns : value of pdf (a scalar)
Args    : on set, new value (a scalar or undef, optional)
Methods code
sub _initialize {
    my ($self, @args) = @_;

    # Add a lowercased copy of every argument key.
    my %param = @args;
    @param{ map { lc $_ } keys %param } = values %param;

    # Defaults: recursion limit of 3, starting from depth 0.
    $self->max_depth(3);
    $self->depth(0);
    $self->ua( WWW::Mechanize->new() );
    $self->ua->agent_alias('Linux Mozilla');

    # Store each argument on the object, turning a leading '-' into '_'.
    foreach my $key (keys %param) {
        (my $new_key = $key) =~ s/^-/_/;
        $self->{$new_key} = $param{$key};
    }

    return 1;
}
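
The key handling in _initialize can be tried in isolation. The following is a minimal sketch, assuming the same two steps as the code above (add a lowercased copy of each key, then replace a leading '-' with '_'); the function name normalize_args is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical standalone copy of the argument handling in _initialize.
sub normalize_args {
    my %param = @_;

    # Add a lowercased copy of every key; the original keys stay in place.
    @param{ map { lc $_ } keys %param } = values %param;

    # Replace a leading '-' with '_' to form the attribute name.
    my %attr;
    for my $key (keys %param) {
        (my $new_key = $key) =~ s/^-/_/;
        $attr{$new_key} = $param{$key};
    }
    return \%attr;
}

my $attr = normalize_args(-Access => 'pdf');
print join(', ', map { "$_=$attr->{$_}" } sort keys %$attr), "\n";
# prints "_Access=pdf, _access=pdf"
```

Because the stored names start with an underscore, an argument such as -max_depth ends up under the key _max_depth, which is not the key the max_depth() accessor reads. Under that reading of the code, the accessor is the way to change the limit.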

sub get_next {
    my $self = shift;
    return $self->pdf();
}

sub find {
    my ($self, $id) = @_;
    $self->{pdf} = undef;    # discard any previously fetched PDF
    $self->_process_pubmed_html($id);
    return $self;            # documented return value: the invocant
}

sub get_vocabulary_names { return []; }

sub get_entry_description { return; }

sub get_all_values { return; }

sub get_all_entries { return; }

sub depth {
    my ($self, $val) = @_;
    $self->{'depth'} = $val if defined($val);
    return $self->{'depth'};
}

sub max_depth {
    my ($self, $val) = @_;
    $self->{'max_depth'} = $val if defined($val);
    return $self->{'max_depth'};
}

sub ua {
    my ($self, $val) = @_;
    $self->{'ua'} = $val if defined($val);
    return $self->{'ua'};
}

sub _process_pubmed_html {
    my ($self, $id) = @_;

    $self->ua->get( ABSTRACT_BASE . $id );
    my $page = $self->ua->content();

    # Capture in list context so a failed match cannot leave a stale $1.
    my ($link) = $page =~ m|<!---- Pager -- \(page header\) -- end ------>.+?<SPAN><a href="(.+?)" onClick="window.open|s;
    return unless defined $link;

    $self->ua->follow_link( url => $link );

    my $pdf_url = $self->guess_pdf_url( $self->ua->uri );
    $self->throw( "didn't recognize pattern in '" . $self->ua->uri . "', please patch module" )
        unless $pdf_url;

    $self->ua->get($pdf_url);
    $self->pdf( $self->ua->content() );
}

sub guess_pdf_url {
    my ($self, $url) = @_;

    if ( $url =~ m!^(.+?)/cgi/content/full/(\d+)/(\d+)/(\d+)/?$! ) {
        return qq($1/cgi/reprint/$2/$3/$4.pdf);
    }
    elsif ( $url =~ m!^(.+?cgi-taf/DynaPage.taf.+?)/journal/(.+?)/abs/(.+?\.html)! ) {
        return qq($1/journal/$2/full/$3\&filetype=pdf);
    }
    elsif ( $url =~ m!^(.+?science\?_ob=)ArticleURL(.+?)$! ) {
        my $link = $self->ua->find_link( text_regex => qr/PDF \(.+?\)/s );
        return unless $link;
        return $link->url_abs();
    }
    elsif ( $url =~ m!^(.+?genomebiology.com)/(\d+)/(\d+)/(\d+)/(.+?)/?$! ) {
        my $file = lc(sprintf("gb-%d-%d-%d-%s.pdf", $2, $3, $4, $5));
        return qq($1/content/pdf/$file);
    }
    elsif ( $url =~ m!^(.+?/cgi-bin)/abstract/(\d+?)/ABSTRACT$! ) {
        $self->ua->get( qq($1/fulltext/$2/PDFSTART) );
        my $link = $self->ua->find_link( url_regex => qr/fulltext/ );
        return unless $link;
        return $link->url_abs();
    }
    elsif ( $url =~ m!^(.+?oupjournals.org/cgi)/reprint/(.+?)$! ) {
        return qq($1/reprint/$2.pdf);
    }
    elsif ( $url =~ m!^(.+?oupjournals.org/cgi)/content/full/(.+?)$! ) {
        return qq($1/reprint/$2.pdf);
    }
    elsif ( $url =~ m!^http://[^.]+?\.plos! ) {
        my $link = $self->ua->find_link( text_regex => qr/^Screen/s );
        return unless $link;
        return $link->url_abs();
    }
    elsif ( $url =~ m!^(.+?biomedcentral.+?)/(\d+\-\d+)/(\d+)/(\d+)/?$! ) {
        my $file = lc(sprintf("%s-%d-%d.pdf", $2, $3, $4));
        return qq($1/content/pdf/$file);
    }

    warn $url;
    return;
}
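
The branches of guess_pdf_url that only rewrite the URL, without a further page fetch, can be exercised offline. The following is a minimal sketch that copies three of those patterns into a rule table; the function name rewrite_to_pdf_url, the table layout, and the example URLs are assumptions made for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Each rule pairs a pattern for an article URL with a code reference that
# builds the PDF URL from the captures. The patterns are copied from three
# of the network-free branches of guess_pdf_url.
my @RULES = (
    [ qr{^(.+?)/cgi/content/full/(\d+)/(\d+)/(\d+)/?$},
      sub { return "$_[0]/cgi/reprint/$_[1]/$_[2]/$_[3].pdf" } ],
    [ qr{^(.+?genomebiology.com)/(\d+)/(\d+)/(\d+)/(.+?)/?$},
      sub {
          my $file = lc( sprintf( 'gb-%d-%d-%d-%s.pdf', @_[ 1 .. 4 ] ) );
          return "$_[0]/content/pdf/$file";
      } ],
    [ qr{^(.+?biomedcentral.+?)/(\d+\-\d+)/(\d+)/(\d+)/?$},
      sub {
          my $file = lc( sprintf( '%s-%d-%d.pdf', @_[ 1 .. 3 ] ) );
          return "$_[0]/content/pdf/$file";
      } ],
);

# Return the rewritten PDF URL, or nothing when no rule matches.
sub rewrite_to_pdf_url {
    my ($url) = @_;
    for my $rule (@RULES) {
        my ( $re, $build ) = @$rule;
        if ( my @captures = $url =~ $re ) {
            return $build->(@captures);
        }
    }
    return;
}

# Example URL chosen to fit the second pattern; it is illustrative only.
print rewrite_to_pdf_url('http://genomebiology.com/2004/5/6/R40'), "\n";
# prints "http://genomebiology.com/content/pdf/gb-2004-5-6-r40.pdf"
```

A table like this keeps each site's pattern next to its rewrite, which may make it easier to add a rule when the module reports an unrecognized URL.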

sub pdf {
    my ($self, $val) = @_;
    $self->{'pdf'} = $val if defined($val);
    return $self->{'pdf'};
}
General documentation
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:
http://bugzilla.open-bio.org/
Allen Day <allenday@ucla.edu>
Copyright (c) 2004 Allen Day, University of California, Los Angeles.
This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
This software is provided "as is" without warranty of any kind.
* Of course, you'll need access to the sites hosting the PDFs to download
  them.
* If you're having problems retrieving a PDF from a site you have access
  to, you might try adjusting the max_depth() attribute. It defaults to 3
  and controls how many links deep page fetches will recurse when trying
  to find your PDF.
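
With the real module, raising the limit would be a call such as $biblio->max_depth(5) before find(). The accessor's behaviour can be seen without BioPerl in the following minimal sketch; the class name Demo::Fetcher is a hypothetical stand-in that copies only the accessor pattern shown in the method code.

```perl
use strict;
use warnings;

# Hypothetical stand-in class that copies the accessor pattern used by
# max_depth(): the stored value changes only when a defined value is passed.
package Demo::Fetcher;

sub new {
    my ($class) = @_;
    return bless { max_depth => 3 }, $class;    # 3 is the documented default
}

sub max_depth {
    my ($self, $val) = @_;
    $self->{max_depth} = $val if defined $val;
    return $self->{max_depth};
}

package main;

my $fetcher = Demo::Fetcher->new();
$fetcher->max_depth(5);        # raise the limit from the default of 3
$fetcher->max_depth(undef);    # ignored: undef never overwrites the value
print $fetcher->max_depth(), "\n";    # prints "5"
```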
The main documentation details are to be found in
Bio::DB::BiblioI.
Here are the remaining object methods: interface methods first,
followed by internal methods.
Internal methods unrelated to Bio::DB::BiblioI