@params = ('database' => 'swissprot','outfile' => 'blast1.out',
'_READMETHOD' => 'Blast');
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);
Blast a sequence against a database:
$str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta' );
$input = $str->next_seq();
$input2 = $str->next_seq();
$blast_report = $factory->blastall($input);
Run an iterated Blast (psiblast) of a sequence against a database:
$factory->j(3); # 'j' is blast parameter for # of iterations
$factory->outfile('psiblast1.out');
$blast_report = $factory->blastpgp($input);
Use blast to align 2 sequences against each other:
$factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
$factory->bl2seq($input, $input2);
Various additional options and input formats are available. See the DESCRIPTION section for details.
@params = ('program' => 'blastn', 'database' => 'ecoli.nt');
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);
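Any parameters not explicitly set will remain as the defaults of the BLAST executable.
$expectvalue = 0.01;
$factory->e($expectvalue);
Note that for improved script readability one can modify the name of a BLAST parameter, provided its first letter (and case) is preserved: AUTOLOAD truncates every parameter name that does not start with an underscore to its first letter.
> blastpgp - .
Once the factory has been created and the appropriate parameters set, one of the blastall, blastpgp, or bl2seq methods can be called. The input may be the name of a FASTA file, for example: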
$inputfilename = 't/testquery.fa';
$blast_report = $factory->blastall($inputfilename);
In addition, sequence input may be in the form of either a Bio::Seq object or a reference to an array of Bio::Seq objects, e.g.:
$input = Bio::Seq->new(-id=>"test query",-seq=>"ACTACCCTTTAAATCAGTGGGGG");
$blast_report = $factory->blastall($input);
For blastall and non-psiblast blastpgp runs, report object is either a Blast object or a BPlite object, depending on the value of _READMETHOD (default 'BPlite').
-signif => $self->e() || 1e-5, # where $self->e(), if set, is the BLAST cutoff value
-parse => 1,
-stats => 1,
-check_all_hits => 1,
If it is desired to parse the resulting report with Blast.pm with
$str = Bio::AlignIO->new(-file=> "cysprot.msf", '-format' => 'msf' );
$aln = $str->next_aln();
$len = $aln->length_aln();
$mask = '1' x $len; # simple case where PSSM's to be used at all residues
$report = $factory->blastpgp("cysprot1.fa", $aln, $mask);
For bl2seq execution, StandAloneBlast.pm can be combined with AlignIO.pm to obtain a SimpleAlign object from the bl2seq report:
# Get 2 sequences
$str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta', );
my $seq3 = $str->next_seq();
my $seq4 = $str->next_seq();
# Run bl2seq on them
$factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
my $bl2seq_report = $factory->bl2seq($seq3, $seq4);
# Use AlignIO.pm to create a SimpleAlign object from the bl2seq report
$str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => 'bl2seq');
$aln = $str->next_aln();
For more examples of syntax and use of Blast.pm, the user is
| BEGIN | Code | |
| new | No description | Code |
| AUTOLOAD | No description | Code |
| exists_blast | Description | Code |
| blastall | Description | Code |
| blastpgp | Description | Code |
| bl2seq | Description | Code |
| _generic_local_blast | Description | Code |
| _runblast | Description | Code |
| _setinput | Description | Code |
| _setparams | Description | Code |
| exists_blast | code | next | Top |
Title : exists_blast
Usage : $blastfound = Bio::Tools::Run::StandAloneBlast->exists_blast()
Function: Determine whether Blast program can be found on current host.
Cf. the DESCRIPTION section of this POD for how to make sure
that your BLAST installation can be found. This method checks for
the existence of the blastall executable either in BLASTDIR or in
the path.
Side effects: if BLASTDATADIR is not set, checks whether data is a
subdirectory of the directory where blastall is found, and if so,
sets DATADIR accordingly.
Returns : 1 if Blast program found at expected location, 0 otherwise.
Args : none |
| blastall | code | prev | next | Top |
Title : blastall
Usage : $blast_report = $factory->blastall('t/testquery.fa');
or
$input = Bio::Seq->new(-id=>"test query",
-seq=>"ACTACCCTTTAAATCAGTGGGGG");
$blast_report = $factory->blastall($input);
or
$seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects
$blast_report = $factory->blastall($seq_array_ref);
Returns : Reference to a Blast object or BPlite object
containing the blast report.
Args : Name of a file, a Bio::Seq object, or a reference to an
array of Bio::Seq objects containing the query sequence(s).
Throws an exception if argument is not either a string
(eg a filename) or a reference to a Bio::Seq object
(or to an array of Seq objects). If argument is string,
throws exception if file corresponding to string name can
not be found. |
| blastpgp | code | prev | next | Top |
Title : blastpgp
Usage : $blast_report = $factory-> blastpgp('t/testquery.fa');
or
$input = Bio::Seq->new(-id=>"test query",
-seq=>"ACTADDEEQQPPTCADEEQQQVVGG");
$blast_report = $factory->blastpgp ($input);
or
$seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects
$blast_report = $factory->blastpgp($seq_array_ref);
Returns : Reference to either a BPlite.pm, Blast.pm or BPpsilite.pm
object containing the blast report. BPpsilite is used when more
than one iteration is requested via the j parameter.
Args : Name of a file or Bio::Seq object. In psiblast jumpstart
mode two additional arguments are required: a SimpleAlign
object one of whose elements is the query and a "mask" to
determine how BLAST should select scoring matrices; see
DESCRIPTION above for more details.
Throws an exception if argument is not either a string
(eg a filename) or a reference to a Bio::Seq object
(or to an array of Seq objects). If argument is string,
throws exception if file corresponding to string name can
not be found. |
| bl2seq | code | prev | next | Top |
Title : bl2seq
Usage : $factory->bl2seq('t/seq1.fa', 't/seq2.fa');
or
$input1 = Bio::Seq->new(-id=>"test query1",
-seq=>"ACTADDEEQQPPTCADEEQQQVVGG");
$input2 = Bio::Seq->new(-id=>"test query2",
-seq=>"ACTADDEMMMMMMMDEEQQQVVGG");
$blast_report = $factory->bl2seq ($input1, $input2);
Returns : Reference to a BPbl2seq object containing the blast report.
Args : Names of 2 files or 2 Bio::Seq objects containing the
sequences to be aligned by bl2seq.
Throws an exception if argument is not either a pair of
strings (eg filenames) or references to Bio::Seq objects.
If arguments are strings, throws exception if files
corresponding to string names can not be found. |
| _generic_local_blast | code | prev | next | Top |
Title : _generic_local_blast
Usage : internal function not called directly
Returns : Blast or BPlite object
Args : Reference to calling object and name of BLAST executable |
| _runblast | code | prev | next | Top |
Title : _runblast
Usage : Internal function, not to be called directly
Function: makes actual system call to Blast program
Example :
Returns : Report object in the appropriate format (BPlite,
BPpsilite, Blast, or BPbl2seq)
Args : Reference to calling object, name of BLAST executable,
and parameter string for executable |
| _setinput | code | prev | next | Top |
Title : _setinput
Usage : Internal function, not to be called directly
Function: Create input file(s) for Blast executable
Example :
Returns : name of file containing Blast data input
Args : Seq object reference or input file name |
| _setparams | code | prev | next | Top |
Title : _setparams
Usage : Internal function, not to be called directly
Function: Create parameter inputs for Blast program
Example :
Returns : parameter string to be passed to Blast
Args : Reference to calling object and name of BLAST executable |
| BEGIN | Top |
BEGIN {
@BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q
D a O J M W z K L Y S T l U y Z);
@BLASTPGP_PARAMS = qw(d i A f e m o y P F G E X N g S H a I h c
j J Z O M v b C R W z K L Y p k T Q B l U);
@BL2SEQ_PARAMS = qw(i j p g o d a G E X W M q r F e S T);
# Non BLAST parameters start with underscore to differentiate them
# from BLAST parameters
@OTHER_PARAMS = qw(_READMETHOD);
# _READMETHOD = 'BPlite' (default) or 'Blast'
# my @other_switches = qw(QUIET);
# Authorize attribute fields
foreach my $attr (@BLASTALL_PARAMS, @BLASTPGP_PARAMS,
@BL2SEQ_PARAMS, @OTHER_PARAMS)
{ $OK_FIELD{$attr}++; }
# You will need to enable Blast to find the Blast program. This can be done
# in (at least) two different ways:
# 1. define an environmental variable BLASTDIR:
# export BLASTDIR=/home/peter/blast or
# 2. include a definition of an environmental variable BLASTDIR in every script that will
# use StandAloneBlast.pm.
# BEGIN {$ENV{BLASTDIR} = '/home/peter/blast/'; }
$BLASTDIR = $ENV{'BLASTDIR'} || '';
# If local BLAST databases are not stored in the standard
# /data directory, the variable BLASTDATADIR will need to be set explicitly
$DATADIR = $ENV{'BLASTDATADIR'} || $ENV{'BLASTDB'} || '';
}
| new | description | prev | next | Top |
sub new {
    my ($caller, @args) = @_;
    # chained new
    my $self = $caller->SUPER::new(@args);
    # to facilitate tempfile cleanup
    $self->_initialize_io();
    unless (&Bio::Tools::Run::StandAloneBlast::exists_blast()) {
        $self->debug( "Blast program not found or not executable.\n Blast can be obtained from ftp://ftp.ncbi.nlm.nih.gov/blast/server/current_release/\n");
    }
    # to facilitate tempfile cleanup
    $self->_initialize_io();
    my ($fh, $tempfile) = $self->tempfile();
    $self->outfile($tempfile);
    $self->_READMETHOD('BPlite');
    # remaining arguments are attribute/value pairs handled by AUTOLOAD
    while (@args) {
        my $attr  = shift @args;
        my $value = shift @args;
        next if ($attr eq '-verbose');
        $self->$attr($value);
    }
    return $self;
}
| AUTOLOAD | description | prev | next | Top |
sub AUTOLOAD {
    my $self = shift;
    my $attr = $AUTOLOAD;
    $attr =~ s/.*:://;
    my $attr_letter = substr($attr, 0, 1);
    # actual key is first letter of $attr unless first attribute
    # letter is underscore (as in _READMETHOD), the $attr is a BLAST
    # parameter and should be truncated to its first letter only
    $attr = ($attr_letter eq '_') ? $attr : $attr_letter;
    $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr};
    # $self->throw("Unallowed parameter: $attr !") unless $ok_field{$attr_letter};
    $self->{$attr_letter} = shift if @_;
    return $self->{$attr_letter};
}
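The first-letter rule in AUTOLOAD can be tried in isolation. The following is a minimal sketch, assuming a hypothetical stand-in class `ToyFactory` with a reduced list of allowed fields; it is not the real module and only mirrors the key-resolution logic shown in the listing.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, not Bio::Tools::Run::StandAloneBlast itself.
package ToyFactory;

our $AUTOLOAD;
# Reduced, illustrative list of allowed fields.
my %OK_FIELD = map { $_ => 1 } qw(e j d _READMETHOD);

sub new { return bless {}, shift }

# Defined explicitly so that object destruction never reaches AUTOLOAD.
sub DESTROY { }

sub AUTOLOAD {
    my $self = shift;
    my $attr = $AUTOLOAD;
    $attr =~ s/.*:://;                      # strip the package name
    my $attr_letter = substr($attr, 0, 1);
    # Underscore-prefixed names are checked whole; others by first letter.
    $attr = ($attr_letter eq '_') ? $attr : $attr_letter;
    die "Unallowed parameter: $attr !\n" unless $OK_FIELD{$attr};
    # As in the listing, the value is stored under the first letter.
    $self->{$attr_letter} = shift if @_;
    return $self->{$attr_letter};
}

package main;

my $factory = ToyFactory->new;
$factory->expectvalue(0.01);                # same slot as $factory->e(0.01)
print "e = ", $factory->e, "\n";
$factory->_READMETHOD('Blast');
print "_READMETHOD = ", $factory->_READMETHOD, "\n";
```

Because only the first letter is significant, two descriptive names that begin with the same letter (and case) address the same parameter.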
| exists_blast | description | prev | next | Top |
sub exists_blast {
    my ($exe) = shift;
    # can call as a class method or as a function
    if (defined $exe && $exe =~ /Bio::Tools/i) {
        $exe = shift;
    }
    $exe ||= 'blastall';
    if ($^O =~ /mswin/i) {
        $exe .= '.exe';
    }
    my $f;
    if ((! $DATADIR) && (-d Bio::Root::IO->catfile($BLASTDIR, "data"))) {
        $DATADIR = Bio::Root::IO->catfile($BLASTDIR, "data");
    }
    if (($f = Bio::Root::IO->exists_exe($exe)) ||
        ($f = Bio::Root::IO->exists_exe(Bio::Root::IO->catfile($BLASTDIR, $exe)))) {
        $PROGRAMS{$exe} = $f if (-e $f);
        return 1;
    }
    # debug("exe is $exe blastall is $f\n");
    return 0;
}
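The lookup order used by exists_blast (search path first, then BLASTDIR) can be illustrated without BioPerl. This is a rough sketch using only core modules; `find_executable` is a hypothetical helper and does not reproduce the behaviour of Bio::Root::IO->exists_exe.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: search PATH first, then an optional extra directory,
# mirroring the order of the two checks in exists_blast.
sub find_executable {
    my ($exe, $extra_dir) = @_;
    $exe .= '.exe' if $^O =~ /mswin/i && $exe !~ /\.exe$/i;
    my @dirs = grep { defined($_) && length($_) } (File::Spec->path(), $extra_dir);
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $exe);
        return $candidate if -f $candidate && -x $candidate;
    }
    return;    # not found
}

my $found = find_executable('blastall', $ENV{BLASTDIR});
print defined $found ? "blastall found at $found\n" : "blastall not found\n";
```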
| blastall | description | prev | next | Top |
sub blastall {
    my ($self, $input1) = @_;
    $self->_io_cleanup();
    my $executable = 'blastall';
    my $input2;
    # Create input file pointer
    my $infilename1 = $self->_setinput($executable, $input1);
    if (! $infilename1) {
        $self->throw(" $input1 ($infilename1) not Bio::Seq object or array of Bio::Seq objects or file name!");
    }
    # set file name of sequence to be blasted to inputfilename1 (-i param of blastall)
    $self->i($infilename1);
    my $blast_report = &_generic_local_blast($self, $executable, $input1, $input2);
}
| blastpgp | description | prev | next | Top |
sub blastpgp {
    my $self = shift;
    my $executable = 'blastpgp';
    my $input1 = shift;
    my $input2 = shift;
    # used by blastpgp's -B option to specify which residues are position aligned
    my $mask = shift;
    my ($infilename1, $infilename2) = $self->_setinput($executable, $input1, $input2, $mask);
    if (! $infilename1) {
        $self->throw(" $input1 not Bio::Seq object or array of Bio::Seq objects or file name!");
    }
    # set file name of sequence to be blasted to inputfilename1 (-i param of blastpgp)
    $self->i($infilename1);
    if ($input2) {
        unless ($infilename2) {
            $self->throw("$input2 not SimpleAlign Object in pre-aligned psiblast\n");
        }
        # set file name of partial alignment to inputfilename2 (-B param of blastpgp)
        $self->B($infilename2);
    }
    my $blast_report = &_generic_local_blast($self, $executable, $input1, $input2);
}
| bl2seq | description | prev | next | Top |
sub bl2seq {
    my $self = shift;
    my $executable = 'bl2seq';
    my $input1 = shift;
    my $input2 = shift;
    # Create input file pointer
    my ($infilename1, $infilename2) = $self->_setinput($executable, $input1, $input2);
    if (! $infilename1) {
        $self->throw(" $input1 not Seq Object or file name!");
    }
    if (! $infilename2) {
        $self->throw("$input2 not Seq Object or file name!");
    }
    # set file name of first sequence to be aligned to inputfilename1 (-i param of bl2seq)
    $self->i($infilename1);
    # set file name of second sequence to be aligned to inputfilename2 (-j param of bl2seq)
    $self->j($infilename2);
    my $blast_report = &_generic_local_blast($self, $executable);
}
| _generic_local_blast | description | prev | next | Top |
my $self = shift;
my $executable = shift;
# Create parameter string to pass to Blast program
my $param_string = $self->_setparams($executable);
# run Blast
my $blast_report = &_runblast($self, $executable, $param_string);
}
| _runblast | description | prev | next | Top |
sub _runblast {
    my ($self, $executable, $param_string) = @_;
    my $blast_obj;
    if (! defined $PROGRAMS{$executable}) {
        unless (&Bio::Tools::Run::StandAloneBlast::exists_blast($executable)) {
            $self->warn("cannot find path to $executable");
            return undef;
        }
    }
    my $commandstring = $PROGRAMS{$executable} . $param_string;
    # next line for debugging
    $self->debug( "$commandstring\n ");
    my $status = system($commandstring);
    $self->throw("$executable call crashed: $? $commandstring\n") unless ($status == 0);
    my $outfile = $self->o();   # get outputfilename
    my $signif = $self->e() || 1e-5;   # set significance cutoff to set expectation value or default value
    # (may want to make this value vary for different executables)
    # If running bl2seq or psiblast (blastpgp with multiple iterations),
    # the specific parsers for these programs must be used (ie BPbl2seq or
    # BPpsilite). Otherwise either the Blast parser or the BPlite
    # parsers can be selected.
    if ($executable eq 'bl2seq') {
        $blast_obj = Bio::Tools::BPbl2seq->new(-file => $outfile);
    }
    elsif ($executable eq 'blastpgp' && defined $self->j() && $self->j() > 1) {
        $blast_obj = Bio::Tools::BPpsilite->new(-file => $outfile);
    }
    elsif ($self->_READMETHOD eq 'Blast') {
        $blast_obj = Bio::SearchIO->new(-file => $outfile, -format => 'blast');
    }
    elsif ($self->_READMETHOD eq 'BPlite') {
        $blast_obj = Bio::Tools::BPlite->new(-file => $outfile);
    }
    return $blast_obj;
}
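The parser selection at the end of _runblast depends on three inputs: the executable, the number of iterations (j), and _READMETHOD. The sketch below restates that branch order as a standalone function; `choose_parser` is a hypothetical name introduced for illustration, and it returns class names as strings instead of constructing parser objects.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the branch order in _runblast.
sub choose_parser {
    my ($executable, $iterations, $readmethod) = @_;
    return 'Bio::Tools::BPbl2seq'  if $executable eq 'bl2seq';
    return 'Bio::Tools::BPpsilite' if $executable eq 'blastpgp'
                                   && defined $iterations && $iterations > 1;
    return 'Bio::SearchIO'         if $readmethod eq 'Blast';
    return 'Bio::Tools::BPlite'    if $readmethod eq 'BPlite';
    return;    # no branch matched, as when _READMETHOD holds another value
}

my @cases = (
    [ 'bl2seq',   undef, 'BPlite' ],
    [ 'blastpgp', 3,     'Blast'  ],
    [ 'blastpgp', 1,     'Blast'  ],
    [ 'blastall', undef, 'BPlite' ],
);
for my $case (@cases) {
    printf "%-8s -> %s\n", $case->[0], choose_parser(@$case);
}
```

Note that bl2seq and multi-iteration blastpgp runs ignore _READMETHOD, since those branches are tested first.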
| _setinput | description | prev | next | Top |
sub _setinput {
    # $mask is the optional fifth argument passed by blastpgp
    my ($self, $executable, $input1, $input2, $mask) = @_;
    my ($seq, $temp, $infilename1, $infilename2, $fh);
    # If $input1 is not a reference it better be the name of a file with
    # the sequence/ alignment data...
    $self->_io_cleanup();
    SWITCH: {
        unless (ref $input1) {
            $infilename1 = (-e $input1) ? $input1 : 0;
            last SWITCH;
        }
        # $input may be an array of BioSeq objects...
        if (ref($input1) =~ /ARRAY/i) {
            ($fh, $infilename1) = $self->tempfile();
            $temp = Bio::SeqIO->new(-fh => $fh, '-format' => 'Fasta');
            foreach $seq (@$input1) {
                unless ($seq->isa("Bio::PrimarySeqI")) { return 0; }
                $temp->write_seq($seq);
            }
            close $fh;
            last SWITCH;
        }
        # $input may be a single BioSeq object...
        elsif ($input1->isa("Bio::PrimarySeqI")) {
            ($fh, $infilename1) = $self->tempfile();
            # just in case $input1 is taken from an alignment and has spaces (ie
            # deletions) indicated within it, we have to remove them - otherwise
            # the BLAST programs will be unhappy
            my $seq_id = $input1->display_id();
            my $seq_string = $input1->seq();
            $seq_string =~ s/\W+//g;   # get rid of spaces in sequence
            $seq = Bio::Seq->new(-seq => $seq_string, -display_id => $seq_id);
            $temp = Bio::SeqIO->new(-fh => $fh, '-format' => 'Fasta');
            $temp->write_seq($seq);
            close $fh;
            # $temp->write_seq($input1);
            last SWITCH;
        }
        $infilename1 = 0;   # Set error flag if you get here
    } # End SWITCH
    unless ($input2) { return $infilename1; }
    SWITCH2: {
        unless (ref $input2) {
            $infilename2 = (-e $input2) ? $input2 : 0;
            last SWITCH2;
        }
        if ($input2->isa("Bio::PrimarySeqI") && $executable eq 'bl2seq') {
            ($fh, $infilename2) = $self->tempfile();
            $temp = Bio::SeqIO->new(-fh => $fh, '-format' => 'Fasta');
            $temp->write_seq($input2);
            close $fh;
            last SWITCH2;
        }
        # Option for using psiblast's pre-alignment "jumpstart" feature
        elsif ($input2->isa("Bio::SimpleAlign") && $executable eq 'blastpgp') {
            # a bit of a lie since it won't be a fasta file
            ($fh, $infilename2) = $self->tempfile();
            # first we retrieve the "mask" that determines which residues should
            # be scored according to their position and which should be scored
            # using the non-position-specific matrices
            my @mask = split("", $mask);   # get mask
            # then we have to convert all the residues in every sequence to upper
            # case at the positions that we want psiblast to use position specific
            # scoring
            foreach $seq ($input2->eachSeq()) {
                my @seqstringlist = split("", $seq->seq());
                for (my $i = 0; $i < scalar(@mask); $i++) {
                    unless ($seqstringlist[$i] =~ /[a-zA-Z]/) { next }
                    $seqstringlist[$i] = $mask[$i] ? uc $seqstringlist[$i] : lc $seqstringlist[$i];
                }
                my $newseqstring = join("", @seqstringlist);
                $seq->seq($newseqstring);
            }
            # Now we need to write out the alignment to a file in the "psi format" which psiblast is expecting
            $temp = Bio::AlignIO->new(-fh => $fh, '-format' => 'psi');
            $temp->write_aln($input2);
            close $fh;
            last SWITCH2;
        }
        $infilename2 = 0;   # Set error flag if you get here
    } # End SWITCH2
    return ($infilename1, $infilename2);
}
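The mask handling in the jumpstart branch is easier to follow on a plain string. Below is a minimal sketch, assuming a hypothetical helper `apply_mask` that works on one aligned sequence string instead of a SimpleAlign object: positions whose mask character is true become upper case, the others lower case, and non-letter characters such as gaps are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a position mask to one aligned sequence string.
sub apply_mask {
    my ($seq_string, $mask_string) = @_;
    my @residues = split //, $seq_string;
    my @mask     = split //, $mask_string;
    for my $i (0 .. $#mask) {
        # Skip positions past the end of the sequence and gap characters.
        next unless defined $residues[$i] && $residues[$i] =~ /[a-zA-Z]/;
        $residues[$i] = $mask[$i] ? uc($residues[$i]) : lc($residues[$i]);
    }
    return join '', @residues;
}

print apply_mask('ac-DEfg', '1100110'), "\n";    # prints AC-dEFg
```

A mask of all ones, as built with `'1' x $len` in the DESCRIPTION example, upper-cases every residue.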
| _setparams | description | prev | next | Top |
sub _setparams {
    my ($self, $executable) = @_;
    my ($attr, $value, @execparams);
    if ($executable eq 'blastall') { @execparams = @BLASTALL_PARAMS; }
    if ($executable eq 'blastpgp') { @execparams = @BLASTPGP_PARAMS; }
    if ($executable eq 'bl2seq')   { @execparams = @BL2SEQ_PARAMS; }
    my $param_string = "";
    for $attr (@execparams) {
        $value = $self->$attr();
        next unless (defined $value);
        # Need to prepend datadirectory to database name
        if ($attr eq 'd' && ($executable ne 'bl2seq')) {
            # This is added so that you can specify a DB with a full path
            if (! (-e $value.".nin" || -e $value.".pin")) {
                $value = File::Spec->catdir($DATADIR, $value);
            }
        }
        # put params in format expected by Blast
        $attr = '-'. $attr;
        $param_string .= " $attr $value ";
    }
    # if ($self->quiet()) { $param_string .= ' >/dev/null';}
    return $param_string;
}
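The core of _setparams is a loop that turns every defined parameter into a `-letter value` fragment, in the order of the executable's parameter list. The following sketch shows that loop on plain data; `build_param_string` is a hypothetical helper, the parameter list is shortened for illustration, and the database-directory handling for `-d` is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: build a command-line fragment from the parameters
# that are set, keeping the order of @$allowed as _setparams does.
sub build_param_string {
    my ($allowed, $values) = @_;
    my $param_string = '';
    for my $attr (@$allowed) {
        my $value = $values->{$attr};
        next unless defined $value;          # unset parameters keep BLAST defaults
        $param_string .= " -$attr $value ";
    }
    return $param_string;
}

my @allowed = qw(p d i e o);                 # shortened, illustrative list
my %values  = (p => 'blastn', d => 'ecoli.nt', e => 0.01);
print build_param_string(\@allowed, \%values), "\n";
```

Parameters that were never set are skipped entirely, which is why they fall back to the defaults of the BLAST executable.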
| DEVELOPERS NOTES | Top |
| FEEDBACK | Top |
| Mailing Lists | Top |
bioperl-l@bioperl.org - General discussion
http://bio.perl.org/MailList.html - About the mailing lists
| Reporting Bugs | Top |
bioperl-bugs@bio.perl.org
http://bio.perl.org/bioperl-bugs/
| AUTHOR - Peter Schattner | Top |
| APPENDIX | Top |
| BLAST parameters | Top |
| Blastall | Top |
-p Program Name [String]
Input should be one of "blastp", "blastn", "blastx",
"tblastn", or "tblastx".
-d Database [String] default = nr
The database specified must first be formatted with formatdb.
Multiple database names (bracketed by quotations) will be accepted.
An example would be -d "nr est"
-i Query File [File In] Set by StandAloneBlast.pm from script.
default = stdin. The query should be in FASTA format. If multiple FASTA entries are in the input
file, all queries will be searched.
-e Expectation value (E) [Real] default = 10.0
-o BLAST report Output File [File Out] Optional,
default = ./blastreport.out ; set by StandAloneBlast.pm
-S Query strands to search against database (for blast[nx], and tblastx). 3 is both, 1 is top, 2 is bottom [Integer]
default = 3
| Blastpgp (including Psiblast) | Top |
-j is the maximum number of rounds (default 1; i.e., regular BLAST)
-h is the e-value threshold for including sequences in the score matrix model (default 0.001)
-c is the "constant" used in the pseudocount formula specified in the paper (default 10)
-B Multiple alignment file for PSI-BLAST "jump start mode" Optional
-Q Output File for PSI-BLAST Matrix in ASCII [File Out] Optional
| Bl2seq | Top |
-i First sequence [File In]
-j Second sequence [File In]
-p Program name: blastp, blastn, blastx. For blastx 1st argument should be nucleotide [String]
default = blastp
-o alignment output file [File Out] default = stdout
-e Expectation value (E) [Real] default = 10.0
-S Query strands to search against database (blastn only). 3 is both, 1 is top, 2 is bottom [Integer]
default = 3
| Methods | Top |