# Build a clustalw alignment factory
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
# Pass the factory a list of sequences to be aligned.
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.
# or
$seq_array_ref = \@seq_array;
# where @seq_array is an array of Bio::Seq objects
$aln = $factory->align($seq_array_ref);
# Or one can pass the factory a pair of (sub)alignments
# to be aligned against each other, e.g.:
$aln = $factory->profile_align($aln1,$aln2);
# where $aln1 and $aln2 are Bio::SimpleAlign objects.
# Or one can pass the factory an alignment and one or more unaligned
# sequences to be added to the alignment. For example:
$aln = $factory->profile_align($aln1,$seq); # $seq is a Bio::Seq object.
# Get a tree of the sequences
$tree = $factory->tree(\@seq_array);
# Get both an alignment and a tree
($aln, $tree) = $factory->run(\@seq_array);
# Do a footprinting analysis on the supplied sequences, getting back the
# most conserved sub-alignments
my @results = $factory->footprint(\@seq_array);
foreach my $result (@results) {
    print $result->consensus_string, "\n";
}
# There are various additional options and input formats available.
# See the DESCRIPTION section that follows for additional details.
1. Make sure the clustalw executable is in your path so that
     which clustalw
   returns a clustalw executable on your system.
2. Define an environmental variable CLUSTALDIR which is a directory that contains the 'clustalw' application:
   In bash:
     export CLUSTALDIR=/home/username/clustalw1.8
   In csh/tcsh:
     setenv CLUSTALDIR /home/username/clustalw1.8
3. Include a definition of an environmental variable CLUSTALDIR in every script that will use this Clustalw wrapper module, e.g.:
     BEGIN { $ENV{CLUSTALDIR} = '/home/username/clustalw1.8/' }
     use Bio::Tools::Run::Alignment::Clustalw;
If you are running an application on a webserver, make sure the webserver environment has the proper PATH set, or use option 2 or 3 to set the variable.
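The options above amount to looking for the executable in CLUSTALDIR and in PATH. The following is a minimal sketch of such a check, useful for debugging an installation. The helper `find_clustalw` is hypothetical and not part of the wrapper module, and the search order (CLUSTALDIR first, then PATH) is an assumption made for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not part of Bio::Tools::Run::Alignment::Clustalw:
# look for an executable first in $ENV{CLUSTALDIR}, then in each PATH entry.
sub find_clustalw {
    my ($name) = @_;
    $name = 'clustalw' unless defined $name;
    my @dirs;
    push @dirs, $ENV{CLUSTALDIR}
        if defined $ENV{CLUSTALDIR} && length $ENV{CLUSTALDIR};
    push @dirs, File::Spec->path;    # directories listed in PATH
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found
}

my $exe = find_clustalw();
print defined $exe ? "clustalw found at $exe\n" : "clustalw not found\n";
```

Running this from the same environment as the failing script (for example from the webserver's CGI context) shows whether the executable is visible there.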
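@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
Any parameters not explicitly set will remain as the defaults of the clustalw program. Parameters can also be changed or checked using the method of the same name:
$ktuple = 3;
$factory->ktuple($ktuple);
$get_ktuple = $factory->ktuple();
Once the factory has been created and the appropriate parameters set, one can call align() with the name of a file containing the unaligned sequences: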
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename);
Alternately one can create an array of Bio::Seq objects somehow
$str = Bio::SeqIO->new(-file => 't/data/cysprot.fa', -format => 'Fasta');
@seq_array = ();
while ( my $seq = $str->next_seq() ) { push(@seq_array, $seq); }
and pass the factory a reference to that array
$seq_array_ref = \@seq_array;
$aln = $factory->align($seq_array_ref);
In either case, align() returns a reference to a SimpleAlign object.
To add an unaligned sequence to an existing alignment, pass profile_align() the alignment and the sequence:
$str = Bio::AlignIO->new(-file => 't/data/cysprot1a.msf');
$aln = $str->next_aln();
$str1 = Bio::SeqIO->new(-file => 't/data/cysprot1b.fa');
$seq = $str1->next_seq();
$aln = $factory->profile_align($aln, $seq);
To align a pair of (sub)alignments against each other, pass either two file names
$profile1 = 't/data/cysprot1a.msf';
$profile2 = 't/data/cysprot1b.msf';
$aln = $factory->profile_align($profile1, $profile2);
or
$str1 = Bio::AlignIO->new(-file => 't/data/cysprot1a.msf');
$aln1 = $str1->next_aln();
$str2 = Bio::AlignIO->new(-file => 't/data/cysprot1b.msf');
$aln2 = $str2->next_aln();
$aln = $factory->profile_align($aln1, $aln2);
In either case, profile_align() returns a reference to a SimpleAlign object.
| program_name | Description | Code |
| program_dir | Description | Code |
| new | No description | Code |
| version | Description | Code |
| run | Description | Code |
| align | Description | Code |
| profile_align | Description | Code |
| add_sequences | Description | Code |
| tree | Description | Code |
| footprint | Description | Code |
| _run | Description | Code |
| _get_tree | No description | Code |
| _setinput | Description | Code |
| _setparams | Description | Code |
| program_name | code | next | Top |
Title : program_name |
| program_dir | code | prev | next | Top |
Title : program_dir |
| version | code | prev | next | Top |
Title : version |
| run | code | prev | next | Top |
Title : run |
| align | code | prev | next | Top |
Title : align |
| profile_align | code | prev | next | Top |
Title : profile_align
Throws an exception if arguments are neither strings (e.g. filenames) nor references to SimpleAlign objects. |
| add_sequences | code | prev | next | Top |
Title : add_sequences
Throws an exception if arguments are neither strings (e.g. filenames) nor references to SimpleAlign objects. |
| tree | code | prev | next | Top |
Title : tree |
| footprint | code | prev | next | Top |
Title : footprint |
| _run | code | prev | next | Top |
Title : _run |
| _setinput() | code | prev | next | Top |
Title : _setinput |
| _setparams() | code | prev | next | Top |
Title : _setparams |
| program_name | description | prev | next | Top |
sub program_name {
    return $PROGRAM_NAME;
}
| program_dir | description | prev | next | Top |
sub program_dir {
    return $PROGRAM_DIR;
}
| new | description | prev | next | Top |
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    $self->_set_from_args(\@args,
        -methods => [@CLUSTALW_PARAMS, @CLUSTALW_SWITCHES, @OTHER_SWITCHES],
        -create  => 1);
    return $self;
}
| version | description | prev | next | Top |
sub version {
    my ($self) = @_;
    return undef unless $self->executable;
    my $prog = $self->executable;
    my $string = `$prog --`;
    $string =~ /\(?([\d.]+)\)?/xms;
    return $1 || undef;
}
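To see what the pattern in version() extracts, here is a small standalone sketch. The helper name `extract_version` and the sample banner are assumptions for illustration; the real banner depends on the installed clustalw.

```perl
use strict;
use warnings;

# Same pattern as in version(): the first run of digits and dots,
# optionally wrapped in parentheses.
sub extract_version {
    my ($banner) = @_;
    return $banner =~ /\(?([\d.]+)\)?/xms ? $1 : undef;
}

# Assumed sample banner, for illustration only.
my $banner = " CLUSTAL W (1.83) Multiple Sequence Alignments\n";
print extract_version($banner), "\n";    # prints 1.83
```

Because the character class also accepts a lone dot, any period that appears in the output before the version number would be captured instead of the version.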
| run | description | prev | next | Top |
sub run {
    my ($self, $input) = @_;
    my ($temp, $infilename, $seq);
    my ($attr, $value, $switch);
    $self->io->_io_cleanup();
    # Create input file pointer
    $infilename = $self->_setinput($input);
    $self->throw("Bad input data (sequences need an id) or less than 2 sequences in $input!")
        unless $infilename;
    # Create parameter string to pass to clustalw program
    my $param_string = $self->_setparams();
    # run clustalw
    return $self->_run('both', $infilename, $param_string);
}
| align | description | prev | next | Top |
sub align {
    my ($self, $input) = @_;
    $self->io->_io_cleanup();
    # Create input file pointer
    my $infilename = $self->_setinput($input);
    $self->throw("Bad input data (sequences need an id ) or less than 2 sequences in $input !")
        unless $infilename;
    # Create parameter string to pass to clustalw program
    my $param_string = $self->_setparams();
    # run clustalw
    my $aln = $self->_run('align', $infilename, $param_string);
}
| profile_align | description | prev | next | Top |
sub profile_align {
    my ($self, $input1, $input2) = @_;
    $self->io->_io_cleanup();
    # Create input file pointer
    my $infilename1 = $self->_setinput($input1, 1);
    my $infilename2 = $self->_setinput($input2, 2);
    if (!$infilename1 || !$infilename2) {
        $self->throw("Bad input data: $input1 or $input2 !");
    }
    unless ( -e $infilename1 and -e $infilename2) {
        $self->throw("Bad input file: $input1 or $input2 !");
    }
    # Create parameter string to pass to clustalw program
    my $param_string = $self->_setparams();
    # run clustalw
    my $aln = $self->_run('profile-aln', $infilename1, $infilename2, $param_string);
}
| add_sequences | description | prev | next | Top |
sub add_sequences {
    my ($self, $input1, $input2) = @_;
    my ($temp, $infilename1, $infilename2, $input, $seq);
    $self->io->_io_cleanup();
    # Create input file pointer
    $infilename1 = $self->_setinput($input1, 1);
    $infilename2 = $self->_setinput($input2, 2);
    if (!$infilename1 || !$infilename2) {
        $self->throw("Bad input data: $input1 or $input2 !");
    }
    unless ( -e $infilename1 and -e $infilename2) {
        $self->throw("Bad input file: $input1 or $input2 !");
    }
    # Create parameter string to pass to clustalw program
    my $param_string = $self->_setparams();
    # run clustalw
    my $aln = $self->_run('add_sequences', $infilename1, $infilename2, $param_string);
}
| tree | description | prev | next | Top |
sub tree {
    my ($self, $input) = @_;
    $self->io->_io_cleanup();
    # Create input file pointer
    my $infilename = $self->_setinput($input);
    if (!$infilename) {
        $self->throw("Bad input data (sequences need an id ) or less than 2 sequences in $input !");
    }
    # Create parameter string to pass to clustalw program
    my $param_string = $self->_setparams();
    # run clustalw
    my $tree = $self->_run('tree', $infilename, $param_string);
}
| footprint | description | prev | next | Top |
sub footprint {
    my ($self, $in, $slice_size, $deviate) = @_;
    my ($simplealn, $tree) = $self->run($in);
    # total tree length?
    my $total_length = $tree->total_branch_length;
    # tree length along sliding window, picking regions that significantly
    # deviate from the average tree length
    $slice_size ||= 5;
    $deviate ||= 33;
    my $threshold = $total_length - (($total_length / 100) * $deviate);
    my $length = $simplealn->length;
    my $below = 0;
    my $found_minima = 0;
    my $minima = [$threshold, ''];
    my @results;
    for my $i (1..($length - $slice_size + 1)) {
        my $slice = $simplealn->slice($i, ($i + $slice_size - 1), 1);
        my $tree = $self->tree($slice);
        my $slice_length = $tree->total_branch_length;
        $slice_length <= $threshold ? ($below = 1) : ($below = 0);
        if ($below) {
            unless ($found_minima) {
                if ($slice_length < ${$minima}[0]) {
                    $minima = [$slice_length, $slice];
                }
                else {
                    push(@results, ${$minima}[1]);
                    $minima = [$threshold, ''];
                    $found_minima = 1;
                }
            }
        }
        else {
            $found_minima = 0;
        }
    }
    return @results;
}
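The threshold and window arithmetic in footprint() can be checked in isolation. The sketch below reproduces only those two calculations; the helper names `footprint_threshold` and `window_ranges` and the input numbers are illustrative assumptions, not part of the module.

```perl
use strict;
use warnings;

# Threshold used by footprint(): a window counts as conserved when its tree
# length is at most (100 - $deviate) percent of the whole-alignment tree length.
sub footprint_threshold {
    my ($total_length, $deviate) = @_;
    $deviate ||= 33;    # same default as footprint()
    return $total_length - (($total_length / 100) * $deviate);
}

# 1-based [start, end] columns of every sliding window, as in the for loop.
sub window_ranges {
    my ($aln_length, $slice_size) = @_;
    $slice_size ||= 5;  # same default as footprint()
    return map { [$_, $_ + $slice_size - 1] } 1 .. ($aln_length - $slice_size + 1);
}

# Illustrative numbers only, not taken from a real alignment.
printf "threshold = %.2f\n", footprint_threshold(12, 33);    # threshold = 8.04
my @windows = window_ranges(20, 5);
printf "%d windows, last = %d-%d\n", scalar @windows, @{ $windows[-1] };
# 16 windows, last = 16-20
```

Since footprint() builds a tree for every window, an alignment of length 20 with the default slice size already means 16 additional clustalw runs.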
| _run | description | prev | next | Top |
sub _run {
    my ($self, $command, $infile1, $infile2, $param_string) = @_;
    my ($instring, $tree);
    my $quiet = $self->quiet() || $self->verbose() < 0;

    if ($command =~ /align|both/) {
        if ($^O eq 'dec_osf') {
            $instring = $infile1;
            $command = '';
        }
        else {
            $instring = " -infile=$infile1";
        }
        # align() and run() pass a single input file, so their parameter
        # string arrives in $infile2
        $param_string .= " $infile2";
    }

    if ($command =~ /profile/) {
        $instring = "-profile1=$infile1 -profile2=$infile2";
        chmod 0777, $infile1, $infile2;
        $command = '-profile';
    }

    if ($command =~ /add_sequences/) {
        $instring = "-profile1=$infile1 -profile2=$infile2";
        chmod 0777, $infile1, $infile2;
        $command = '-sequences';
    }

    if ($command =~ /tree/) {
        if ($^O eq 'dec_osf') {
            $instring = $infile1;
            $command = '';
        }
        else {
            $instring = " $infile1";
        }
        # tree() also passes its parameter string in $infile2
        $param_string .= " $infile2";
        $self->debug("Program " . $self->executable . "\n");
        my $commandstring = $self->executable . "$instring" . "$param_string";
        $commandstring .= ' 1>/dev/null' if $quiet;
        $self->debug("clustal command = $commandstring");
        my $status = system($commandstring);
        unless ($status == 0) {
            $self->warn("Clustalw call ($commandstring) crashed: $?\n ");
            return undef;
        }
        return $self->_get_tree($infile1, $param_string);
    }

    my $output = $self->output || 'gcg';
    $self->debug("Program " . $self->executable . "\n");
    my $commandstring = $self->executable . " $command" . " $instring" .
        " -output=$output" . " $param_string";
    $self->debug("clustal command = $commandstring");
    open(my $pipe, "$commandstring |")
        || $self->throw("ClustalW call ($commandstring) failed to start: $? | $!");
    my $score;
    while (<$pipe>) {
        print unless $quiet;
        # Kevin Brown suggested the following regex, though it matches multiple
        # times: we pick up the last one
        $score = $1 if ($_ =~ /Score:(\d+)/);
        # This one is printed at the end and seems the most appropriate to pick
        # up; we include the above regex in case 'Alignment Score' isn't given
        $score = $1 if ($_ =~ /Alignment Score (-?\d+)/);
    }
    close($pipe) || ($self->throw("ClustalW call ($commandstring) crashed: $?"));

    my $outfile = $self->outfile();
    # retrieve alignment (Note: MSF format for AlignIO = GCG format of clustalw)
    my $format = $output =~ /gcg/i ? 'msf' : $output;
    if ($format =~ /clustal/i) {
        $format = 'clustalw'; # force clustalw in case 'clustal' is requested
    }
    my $in = Bio::AlignIO->new(-file => $outfile, -format => $format);
    my $aln = $in->next_aln();
    $in->close;
    $aln->score($score);

    if ($command eq 'both') {
        $tree = $self->_get_tree($infile1, $param_string);
    }

    # Clean up the temporary files created along the way...
    # Replace file suffix with dnd to find name of dendrogram file(s) to delete
    unless ($self->save_tempfiles) {
        foreach my $f ($infile1, $infile2) {
            $f =~ s/\.[^\.]*$//;
            unlink $f . '.dnd' if ($f ne '');
        }
    }

    if ($command eq 'both') {
        return ($aln, $tree);
    }
    return $aln;
}
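Two small pieces of _run() are easy to try without running clustalw: how the alignment score is picked out of the program output, and how the clustalw output name is mapped to a Bio::AlignIO format name. The sketch below copies the regexes from the code above into two hypothetical helpers, `parse_score` and `alignio_format`. The sample log lines are assumptions written to match those regexes, not captured clustalw output.

```perl
use strict;
use warnings;

# Mirror of the score handling in _run(): 'Score:' lines may occur many times,
# and a final 'Alignment Score' line, when present, overrides them.
sub parse_score {
    my (@lines) = @_;
    my $score;
    for my $line (@lines) {
        $score = $1 if $line =~ /Score:(\d+)/;
        $score = $1 if $line =~ /Alignment Score (-?\d+)/;
    }
    return $score;
}

# Mirror of the output-format mapping: clustalw's GCG output is read as 'msf',
# and anything containing 'clustal' is read as 'clustalw'.
sub alignio_format {
    my ($output) = @_;
    my $format = $output =~ /gcg/i ? 'msf' : $output;
    $format = 'clustalw' if $format =~ /clustal/i;
    return $format;
}

# Assumed sample lines, written to match the two regexes above.
my @log = (
    "Sequences (1:2) Aligned. Score:45\n",
    "Sequences (1:3) Aligned. Score:38\n",
    "Alignment Score 1234\n",
);
print parse_score(@log), "\n";          # prints 1234
print alignio_format('gcg'), "\n";      # prints msf
print alignio_format('clustal'), "\n";  # prints clustalw
```

Without an 'Alignment Score' line, the value returned is the last pairwise 'Score:' seen, which is why the code comment calls the second pattern the more appropriate one.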
| _get_tree | description | prev | next | Top |
sub _get_tree {
    my ($self, $treefile, $param_string) = @_;
    $treefile =~ s/\.[^\.]*$//;
    if ($param_string =~ /-bootstrap/) {
        $treefile .= '.phb';
    }
    elsif ($param_string =~ /-tree/) {
        $treefile .= '.ph';
    }
    else {
        $treefile .= '.dnd';
    }
    my $in = Bio::TreeIO->new('-file' => $treefile, '-format' => 'newick');
    my $tree = $in->next_tree;
    unless ($self->save_tempfiles) {
        foreach my $f ($treefile) {
            unlink $f if ($f ne '');
        }
    }
    return $tree;
}
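The tree file name is derived from the input file name and the parameter string. A standalone sketch of just that derivation follows; `tree_file_for` is a hypothetical helper, and the paths and parameter strings are made up for illustration.

```perl
use strict;
use warnings;

# Mirror of the file-name logic in _get_tree(): strip everything from the last
# dot onwards, then pick the suffix from the parameter string.
sub tree_file_for {
    my ($infile, $param_string) = @_;
    (my $treefile = $infile) =~ s/\.[^\.]*$//;
    if    ($param_string =~ /-bootstrap/) { $treefile .= '.phb'; }
    elsif ($param_string =~ /-tree/)      { $treefile .= '.ph';  }
    else                                  { $treefile .= '.dnd'; }
    return $treefile;
}

# Hypothetical file names and parameter strings, for illustration only.
print tree_file_for('/tmp/seqs.fa', ' -ktuple=2'), "\n";             # prints /tmp/seqs.dnd
print tree_file_for('/tmp/seqs.fa', ' -tree'), "\n";                 # prints /tmp/seqs.ph
print tree_file_for('/tmp/seqs.fa', ' -bootstrap=100 -tree'), "\n";  # prints /tmp/seqs.phb
```

Note that the substitution removes everything after the last dot in the whole path, so an extensionless file inside a directory whose name contains a dot (for example `/tmp/my.dir/seqs`) would be reduced to `/tmp/my` before the suffix is added.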
| _setinput | description | prev | next | Top |
sub _setinput {
    my ($self, $input, $suffix) = @_;
    my ($infilename, $seq, $temp, $tfh);
    # suffix is used to distinguish alignment files
    # If $input is not a reference it better be the name of a file with
    # the sequence/alignment data...
    unless (ref $input) {
        # check that file exists or throw
        $infilename = $input;
        return unless -e $input;
        return $infilename;
    }
    # $input may be an array of BioSeq objects...
    if (ref($input) eq "ARRAY") {
        # Open temporary file for both reading & writing of BioSeq array
        ($tfh, $infilename) = $self->io->tempfile(-dir => $self->tempdir);
        $temp = Bio::SeqIO->new('-fh' => $tfh, '-format' => 'Fasta');
        # Need at least 2 seqs for alignment
        return unless (scalar(@$input) > 1);
        foreach $seq (@$input) {
            return unless (defined $seq && $seq->isa("Bio::PrimarySeqI") and $seq->id());
            $temp->write_seq($seq);
        }
        $temp->close();
        close($tfh);
        undef $tfh;
        return $infilename;
    }
    # $input may be a SimpleAlign object.
    elsif (ref($input) eq "Bio::SimpleAlign") {
        # Open temporary file for both reading & writing of SimpleAlign object
        ($tfh, $infilename) = $self->io->tempfile(-dir => $self->tempdir);
        $temp = Bio::AlignIO->new('-fh' => $tfh, '-format' => 'fasta');
        $temp->write_aln($input);
        close($tfh);
        undef $tfh;
        return $infilename;
    }
    # or $input may be a single BioSeq object (to be added to a previous alignment)
    elsif (ref($input) && $input->isa("Bio::PrimarySeqI") && $suffix==2) {
        # Open temporary file for both reading & writing of BioSeq object
        ($tfh, $infilename) = $self->io->tempfile();
        $temp = Bio::SeqIO->new(-fh => $tfh, '-format' => 'Fasta');
        $temp->write_seq($input);
        close($tfh);
        undef $tfh;
        return $infilename;
    }
    return;
}
| _setparams | description | prev | next | Top |
sub _setparams {
    my $self = shift;
    my $param_string = $self->SUPER::_setparams(
        -params   => \@CLUSTALW_PARAMS,
        -switches => \@CLUSTALW_SWITCHES,
        -dash     => 1,
        -lc       => 1,
        -join     => '=');
    # Set default output file if no explicit output file selected
    unless ($param_string =~ /outfile/) {
        my ($tfh, $outfile) = $self->io->tempfile(-dir => $self->tempdir());
        close($tfh);
        undef $tfh;
        $self->outfile($outfile);
        $param_string .= " -outfile=$outfile";
    }
    $param_string .= ' 2>&1';
    return $param_string;
}

1;
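The inherited _setparams() is not shown here, so the following is only a rough sketch of the kind of string this method ends up with. `build_param_string` is a hypothetical stand-in whose behaviour is guessed from the option names (-dash, -lc, -join); the switch name SOMESWITCH and the output path are likewise made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the inherited _setparams(); it only illustrates
# what the options -dash => 1, -lc => 1 and -join => '=' suggest:
# a leading dash, a lower-cased name, and '=' between name and value.
sub build_param_string {
    my ($params, $switches) = @_;
    my $string = '';
    for my $name (sort keys %$params) {
        next unless defined $params->{$name};
        $string .= ' -' . lc($name) . '=' . $params->{$name};
    }
    for my $name (sort keys %$switches) {
        $string .= ' -' . lc($name) if $switches->{$name};
    }
    return $string;
}

# SOMESWITCH is a made-up switch name, used only to show the switch form.
my $param_string = build_param_string(
    { KTUPLE => 2, MATRIX => 'BLOSUM' },
    { SOMESWITCH => 1 },
);

# As in _setparams(): add a default output file unless one was given
# (the path here is hypothetical), then redirect stderr to stdout.
$param_string .= ' -outfile=/tmp/example.aln' unless $param_string =~ /outfile/;
$param_string .= ' 2>&1';
print "$param_string\n";
```

The trailing `2>&1` is part of the string itself, which is why _run() can read both the normal output and error messages of clustalw from a single pipe.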
| PARAMETER FOR ALIGNMENT COMPUTATION | Top |
| KTUPLE | Top |
Title : KTUPLE
Description : (optional) set the word size to be used in the alignment
This is the size of exactly matching fragment that is used.
INCREASE for speed (max= 2 for proteins; 4 for DNA),
DECREASE for sensitivity.
For longer sequences (e.g. >1000 residues) you may
need to increase the default.
| TOPDIAGS | Top |
Title : TOPDIAGS
Description : (optional) number of best diagonals to use
The number of k-tuple matches on each diagonal
(in an imaginary dot-matrix plot) is calculated.
Only the best ones (with most matches) are used in
the alignment. This parameter specifies how many.
Decrease for speed; increase for sensitivity.
| WINDOW | Top |
Title : WINDOW
Description : (optional) window size
This is the number of diagonals around each of the 'best'
diagonals that will be used. Decrease for speed;
increase for sensitivity.
| PAIRGAP | Top |
Title : PAIRGAP
Description : (optional) gap penalty for pairwise alignments
This is a penalty for each gap in the fast alignments.
It has little effect on the speed or sensitivity except
for extreme values.
| FIXEDGAP | Top |
Title : FIXEDGAP
Description : (optional) fixed length gap penalty
| FLOATGAP | Top |
Title : FLOATGAP
Description : (optional) variable length gap penalty
| MATRIX | Top |
Title : MATRIX
Default : PAM100 for DNA - PAM250 for protein alignment
Description : (optional) substitution matrix used in the multiple
alignments. The default matrix depends on the version of
clustalw.
PROTEIN WEIGHT MATRIX leads to a new menu where you are offered a choice of weight matrices. The default for proteins in version 1.8 is the PAM series derived by Gonnet and colleagues. Note that a series is used: the actual matrix depends on how similar the sequences to be aligned at this alignment step are, because different matrices work differently at each evolutionary distance.
DNA WEIGHT MATRIX leads to a new menu where a single matrix (not a series) can be selected. The default is the matrix used by BESTFIT for comparison of nucleic acid sequences.
| TYPE | Top |
Title : TYPE
Description : (optional) sequence type: protein or DNA. This allows
you to explicitly override the program's attempt at
guessing the type of the sequence. It is only useful
if you are using sequences with a VERY strange
composition.
| OUTPUT | Top |
Title : OUTPUT
Description : (optional) clustalw supports GCG or PHYLIP or PIR or
Clustal format. See the Bio::AlignIO modules for
which formats are supported by bioperl.
| OUTFILE | Top |
Title : OUTFILE
Description : (optional) Name of clustalw output file. If not set,
the module will erase the output file. In any case, the alignment
will be returned in the form of a SimpleAlign object.
| TRANSMIT | Top |
Title : TRANSMIT
Description : (optional) transitions not weighted. The default is to
weight transitions as more favourable than other
mismatches in DNA alignments. This switch makes all
nucleotide mismatches equally weighted.
| FEEDBACK | Top |
| Mailing Lists | Top |
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
| Support | Top |
Please direct usage questions or support issues to the mailing list:
bioperl-l@bioperl.org
rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.
| Reporting Bugs | Top |
http://bugzilla.open-bio.org/
| AUTHOR - Peter Schattner | Top |
| CONTRIBUTORS | Top |
| APPENDIX | Top |