SYNOPSIS
  use Bio::DB::Flat;

  my $db = Bio::DB::Flat->new(-directory  => '/usr/share/embl',
                              -dbname     => 'mydb',
                              -format     => 'embl',
                              -index      => 'bdb',
                              -write_flag => 1);
  $db->build_index('/usr/share/embl/primate.embl',
                   '/usr/share/embl/protists.embl');
  my $seq       = $db->get_Seq_by_id('BUM');
  my @sequences = $db->get_Seq_by_acc('DIV' => 'primate');
  my $raw       = $db->fetch_raw('BUM');
METHODS

Methods marked with * are described below. The others are documented only by their code.

  new *
  _initialize
  _set_namespaces
  new_from_registry *
  directory
  write_flag
  verbose
  out_file
  dbname
  primary_namespace
  secondary_namespaces
  file_format
  alphabet
  parse_one_record
  indexing_scheme
  add_flat_file
  write_config
  files
  write_seq
  close
  _filenos
  _read_config
  _config_path
  _catfile
  _config_name
  _path2fileno
  _fileno2path
  _files
  fetch *
  get_Seq_by_id
  get_Seq_by_acc
  fetch_raw
  default_file_format
  _store_index
  default_primary_namespace
  default_secondary_namespaces
  seq_to_ids
  DESTROY
new

 Title   : new
 Usage   : my $db = Bio::DB::Flat->new(
                     -directory  => $root_directory,
                     -dbname     => 'mydb',
                     -write_flag => 1,
                     -index      => 'bdb',
                     -verbose    => 0,
                     -out        => 'outputfile',
                     -format     => 'genbank');
 Function: create a new Bio::DB::Flat object
 Returns : new Bio::DB::Flat object
 Args    : -directory    Root directory for the indexes (required)
           -dbname       Name of the database index; its files, including
                         "config.dat", live in a like-named subdirectory
                         of -directory (required)
           -write_flag   If true, allows creation/updating
           -verbose      Verbose messages
           -out          File to write to when write_seq() is invoked
           -index        'bdb' or 'binarysearch'
           -format       Format of the input file(s)
 Status  : Public
The required -directory argument indicates where the flat file indexes will be stored. The indexes for each database are kept in a subdirectory of this root directory, and new() creates that subdirectory if it does not already exist. Each subdirectory contains a human-readable configuration file named "config.dat". This file records the indexing scheme, the file format, the indexed flat files and the namespaces.

The required -dbname argument gives a name to the database index. The index files are stored in a like-named subdirectory underneath the root directory.

The -write_flag argument enables writing new entries into the database as well as creating the indexes. By default the indexes are opened read-only.

-index is one of "bdb" or "binarysearch" and indicates the type of index to generate. "bdb" corresponds to Berkeley DB: you *must* be using BerkeleyDB version 2 or higher and have the Perl BerkeleyDB extension installed (DB_File will *not* work). "binarysearch" corresponds to the OBDA "flat" indexed file. If -index is omitted, the indexing scheme must already be recorded in an existing config.dat; otherwise new() throws an exception.

The -out argument specifies the output file for objects written with write_seq().

The -format argument specifies the format of the input file or files. If the file suffix is one that BioPerl can already associate with a format, this argument is optional.
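The directory layout described above can be sketched in plain Perl. This is a minimal illustration, not the module's code. make_db_dir() is a hypothetical helper, File::Spec stands in for Bio::Root::IO->catfile, and the file name "config.dat" is taken from the description. The sketch mirrors the checks new() makes on the root directory before it creates the like-named dbname subdirectory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: checks the root directory the way new() does,
# creates <root>/<dbname> if needed, and returns the config.dat path.
sub make_db_dir {
    my ($root, $dbname) = @_;
    die "Base directory $root doesn't exist\n" unless -e $root;
    die "$root isn't a directory\n"            unless -d _;   # reuses the stat from -e
    my $dbpath = File::Spec->catfile($root, $dbname);
    unless (-d $dbpath) {
        mkdir $dbpath, 0777 or die "Can't create $dbpath: $!\n";
    }
    # config.dat sits inside the dbname subdirectory, not in the root itself
    return File::Spec->catfile($dbpath, 'config.dat');
}

my $root   = tempdir(CLEANUP => 1);
my $config = make_db_dir($root, 'mydb');
print "$config\n";    # <root>/mydb/config.dat
```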
new_from_registry

 Title   : new_from_registry
 Usage   : $db = Bio::DB::Flat->new_from_registry(%config)
 Function: creates a new Bio::DB::Flat object in a
           Bio::DB::Registry-compatible fashion
 Returns : new Bio::DB::Flat
 Args    : provided by the registry, see below
 Status  : Public

The following registry-configuration tags are recognized. Both are required.

  location   Root of the indexed flat file; corresponds to the new()
             method's -directory argument.
  dbname     Name of the database index; corresponds to the new()
             method's -dbname argument.
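The following is a minimal sketch of how the two registry tags map onto new() arguments. It assumes a hypothetical standalone helper, registry_to_new_args(). Because new_from_registry() passes no -index argument, new() can only succeed when the config.dat at that location already records an indexing scheme.

```perl
use strict;
use warnings;

# Hypothetical helper: turns registry tags into the argument list that
# new_from_registry() hands to new(). Only 'location' and 'dbname' are used.
sub registry_to_new_args {
    my %config = @_;
    my $location = $config{location} or die "location tag must be specified.\n";
    my $dbname   = $config{dbname}   or die "dbname tag must be specified.\n";
    return (-directory => $location, -dbname => $dbname);
}

my %args = registry_to_new_args(location => '/usr/share/embl', dbname => 'mydb');
print "$_ => $args{$_}\n" for sort keys %args;
```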
fetch

 Title   : fetch
 Usage   : $index->fetch( $id )
 Function: Returns a Bio::Seq object from the index
 Example : $seq = $index->fetch( 'dJ67B12' )
 Returns : Bio::Seq object
 Args    : ID

Deprecated. Use get_Seq_by_id instead.
METHOD CODE

  sub new {
    my $class = shift;
    $class    = ref($class) if ref($class);
    my $self  = $class->SUPER::new(@_);

    # first we initialize ourselves
    my ($flat_directory,$dbname) = $self->_rearrange([qw(DIRECTORY DBNAME)],@_);

    defined $flat_directory
      or $self->throw('Please supply a -directory argument');
    defined $dbname
      or $self->throw('Please supply a -dbname argument');

    # set values from configuration file
    $self->directory($flat_directory);
    $self->dbname($dbname);

    $self->throw("Base directory $flat_directory doesn't exist")
      unless -e $flat_directory;
    $self->throw("$flat_directory isn't a directory")
      unless -d _;
    my $dbpath = Bio::Root::IO->catfile($flat_directory,$dbname);
    unless (-d $dbpath) {
      $self->debug("creating db directory $dbpath\n");
      mkdir $dbpath,0777 or $self->throw("Can't create $dbpath: $!");
    }
    $self->_read_config();

    # but override with initialization values
    $self->_initialize(@_);

    $self->throw('you must specify an indexing scheme')
      unless $self->indexing_scheme;

    # now we figure out what subclass to instantiate
    my $index_type = $self->indexing_scheme eq 'BerkeleyDB/1' ? 'BDB'
                    :$self->indexing_scheme eq 'flat/1'       ? 'Binary'
                    :$self->throw("unknown indexing scheme: ".$self->indexing_scheme);
    my $format     = $self->file_format;

    # because Michele and Lincoln did it differently
    # Michele's way is via a standalone concrete class
    if ($index_type eq 'Binary') {
      my $child_class = 'Bio::DB::Flat::BinarySearch';
      eval "use $child_class";
      $self->throw($@) if $@;
      return $child_class->new(@_);
    }

    # Lincoln uses Bio::SeqIO style delegation.
    my $child_class= "Bio\:\:DB\:\:Flat\:\:$index_type\:\:\L$format";
    eval "use $child_class";
    $self->throw($@) if $@;

    # rebless & reinitialize with the new class
    # (this prevents subclasses from forgetting to call our own initialization)
    bless $self,$child_class;
    $self->_initialize(@_);
    $self->_set_namespaces(@_);

    $self;
  }
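The subclass selection at the end of new() can be isolated as a small function. child_class_for() below is hypothetical. It reproduces the mapping from indexing scheme and file format to a class name, without loading any module.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the dispatch at the end of new().
sub child_class_for {
    my ($scheme, $format) = @_;
    my $index_type = $scheme eq 'BerkeleyDB/1' ? 'BDB'
                   : $scheme eq 'flat/1'       ? 'Binary'
                   : die "unknown indexing scheme: $scheme\n";
    # the OBDA "flat" index is one standalone class, whatever the format
    return 'Bio::DB::Flat::BinarySearch' if $index_type eq 'Binary';
    # BerkeleyDB indexes get one delegate class per lower-cased format name
    return "Bio::DB::Flat::${index_type}::\L$format";
}

print child_class_for('BerkeleyDB/1', 'EMBL'), "\n";   # Bio::DB::Flat::BDB::embl
print child_class_for('flat/1', 'genbank'), "\n";      # Bio::DB::Flat::BinarySearch
```

Because \L lower-cases the format, 'EMBL' and 'embl' resolve to the same BerkeleyDB subclass. Any 'flat/1' index goes to Bio::DB::Flat::BinarySearch regardless of format.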
  sub _initialize {
    my $self = shift;

    my ($flat_write_flag,$dbname,$flat_indexing,$flat_verbose,$flat_outfile,$flat_format)
      = $self->_rearrange([qw(WRITE_FLAG DBNAME INDEX VERBOSE OUT FORMAT)],@_);

    $self->write_flag($flat_write_flag) if defined $flat_write_flag;

    if (defined $flat_indexing) {
      # very permissive
      $flat_indexing = 'BerkeleyDB/1' if $flat_indexing =~ /bdb/;
      $flat_indexing = 'flat/1'       if $flat_indexing =~ /^(flat|binary)/;
      $self->indexing_scheme($flat_indexing);
    }

    $self->verbose($flat_verbose)    if defined $flat_verbose;
    $self->dbname($dbname)           if defined $dbname;
    $self->out_file($flat_outfile)   if defined $flat_outfile;
    $self->file_format($flat_format) if defined $flat_format;
  }
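The "very permissive" normalization of -index can be tried on its own. normalize_scheme() is a hypothetical copy of the two substitutions. Both patterns are case-sensitive, so a value such as 'BDB' passes through unchanged, and new() then rejects it as an unknown indexing scheme.

```perl
use strict;
use warnings;

# Hypothetical copy of the two substitutions in _initialize().
sub normalize_scheme {
    my $scheme = shift;
    $scheme = 'BerkeleyDB/1' if $scheme =~ /bdb/;              # anywhere in the string
    $scheme = 'flat/1'       if $scheme =~ /^(flat|binary)/;   # prefix match only
    return $scheme;
}

printf "%-14s -> %s\n", $_, normalize_scheme($_)
    for qw(bdb binarysearch flat BerkeleyDB/1 BDB);
```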
  sub _set_namespaces {
    my $self = shift;

    $self->primary_namespace($self->default_primary_namespace)
      unless defined $self->{flat_primary_namespace};

    $self->secondary_namespaces($self->default_secondary_namespaces)
      unless defined $self->{flat_secondary_namespaces};

    $self->file_format($self->default_file_format)
      unless defined $self->{flat_format};
  }
  sub new_from_registry {
    my ($self,%config) = @_;
    my $location = $config{'location'} or $self->throw('location tag must be specified.');
    my $dbname   = $config{'dbname'}   or $self->throw('dbname tag must be specified.');

    #my $index = $self->new(-directory => $location,
    #                       -dbname    => $dbname,
    #                      );
    # my $index = $config{'protocol'} or $self->throw('index or protocol tag must be specified.');

    my $db = $self->new(-directory => $location,
                        -dbname    => $dbname,
                        # -index   => $index  LS: PROTOCOL DOES NOT SPECIFY INDEXING SCHEME
                       );
    $db;
  }
  sub directory {
    my $self = shift;
    my $d = $self->{flat_directory};
    $self->{flat_directory} = shift if @_;
    $d;
  }
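All the simple accessors here share one shape. They read the stored value, overwrite it only when an argument is given, and return the previous value rather than the new one. A minimal sketch with a hypothetical Toy::Accessor class shows the consequence.

```perl
use strict;
use warnings;

package Toy::Accessor;   # hypothetical class, for illustration only

sub new { bless {}, shift }

# Same shape as directory(): remember the old value, store a new one
# if an argument was passed, and return the OLD value.
sub directory {
    my $self = shift;
    my $d = $self->{flat_directory};
    $self->{flat_directory} = shift if @_;
    $d;
}

package main;

my $obj = Toy::Accessor->new;
my $old = $obj->directory('/usr/share/embl');   # undef: nothing was stored yet
my $now = $obj->directory;                      # '/usr/share/embl'
print defined $old ? $old : 'undef', " then $now\n";
```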
  sub write_flag {
    my $self = shift;
    my $d = $self->{flat_write_flag};
    $self->{flat_write_flag} = shift if @_;
    $d;
  }

  sub verbose {
    my $self = shift;
    my $d = $self->{flat_verbose};
    $self->{flat_verbose} = shift if @_;
    $d;
  }

  sub out_file {
    my $self = shift;
    my $d = $self->{flat_outfile};
    $self->{flat_outfile} = shift if @_;
    $d;
  }

  sub dbname {
    my $self = shift;
    my $d = $self->{flat_dbname};
    $self->{flat_dbname} = shift if @_;
    $d;
  }

  sub primary_namespace {
    my $self = shift;
    my $d = $self->{flat_primary_namespace};
    $self->{flat_primary_namespace} = shift if @_;
    $d;
  }
  sub secondary_namespaces {
    my $self = shift;
    my $d = $self->{flat_secondary_namespaces};
    $self->{flat_secondary_namespaces} = (ref($_[0]) eq 'ARRAY' ? shift : [@_]) if @_;
    return unless $d;
    $d = [$d] if $d && ref($d) ne 'ARRAY';  # just paranoia
    return wantarray ? @$d : $d;
  }
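secondary_namespaces() is more flexible than the other accessors. It stores either an array reference or a plain list, and it returns a list or an array reference depending on calling context. The sketch below copies the method into a hypothetical Toy::Namespaces class to show both forms.

```perl
use strict;
use warnings;

package Toy::Namespaces;   # hypothetical class, for illustration only

sub new { bless {}, shift }

# Copied from secondary_namespaces(): accepts a list or an array ref and
# returns the OLD value as a list (list context) or array ref (scalar context).
sub secondary_namespaces {
    my $self = shift;
    my $d = $self->{flat_secondary_namespaces};
    $self->{flat_secondary_namespaces} = (ref($_[0]) eq 'ARRAY' ? shift : [@_]) if @_;
    return unless $d;
    $d = [$d] if $d && ref($d) ne 'ARRAY';   # just paranoia
    return wantarray ? @$d : $d;
}

package main;

my $ns = Toy::Namespaces->new;
$ns->secondary_namespaces('ID', 'DIV');   # list form
my @list = $ns->secondary_namespaces;     # ('ID', 'DIV')
$ns->secondary_namespaces(['ACC2']);      # array-ref form
my $ref  = $ns->secondary_namespaces;     # ['ACC2']
print "@list / @$ref\n";                  # ID DIV / ACC2
```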
  sub file_format {
    my $self = shift;
    my $d = $self->{flat_format};
    $self->{flat_format} = shift if @_;
    $d;
  }

  sub alphabet {
    my $self = shift;
    my $d = $self->{flat_alphabet};
    $self->{flat_alphabet} = shift if @_;
    $d;
  }

  sub parse_one_record {
    my $self = shift;
    my $fh   = shift;
    my $parser =
      $self->{cached_parsers}{fileno($fh)} ||= Bio::SeqIO->new(-fh=>$fh,-format=>$self->default_file_format);
    my $seq = $parser->next_seq or return;
    $self->{flat_alphabet} ||= $seq->alphabet;
    my $ids = $self->seq_to_ids($seq);
    return $ids;
  }

  sub indexing_scheme {
    my $self = shift;
    my $d = $self->{flat_indexing};
    $self->{flat_indexing} = shift if @_;
    $d;
  }
  sub add_flat_file {
    my $self = shift;
    my ($file_path,$file_length,$nf) = @_;

    # check that file_path is absolute
    unless (File::Spec->file_name_is_absolute($file_path)) {
      $file_path = File::Spec->rel2abs($file_path);
    }

    -r $file_path or $self->throw("flat file $file_path cannot be read: $!");

    my $current_size = -s _;
    if (defined $file_length) {
      $current_size == $file_length
        or $self->throw("flat file $file_path has changed size. Was $file_length bytes; now $current_size");
    } else {
      $file_length = $current_size;
    }

    unless (defined $nf) {
      $self->{flat_file_index} = 0 unless exists $self->{flat_file_index};
      $nf = $self->{flat_file_index}++;
    }
    $self->{flat_flat_file_path}{$nf} = $file_path;
    $self->{flat_flat_file_no}{$file_path} = $nf;
    $nf;
  }
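The numbering and size check in add_flat_file() can be reproduced without BioPerl. register_file(), %path_of, %no_of and $next_no below are hypothetical stand-ins for the method, its two hashes and its counter. The file contents are placeholder text.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my (%path_of, %no_of);   # file number -> path, and path -> file number
my $next_no = 0;         # plays the role of $self->{flat_file_index}

# Hypothetical standalone version of add_flat_file().
sub register_file {
    my ($path, $expected_len, $nf) = @_;
    $path = File::Spec->rel2abs($path)
        unless File::Spec->file_name_is_absolute($path);
    -r $path or die "flat file $path cannot be read: $!\n";
    my $size = -s _;     # size taken from the stat done by -r
    if (defined $expected_len && $size != $expected_len) {
        die "flat file $path has changed size. Was $expected_len bytes; now $size\n";
    }
    $nf = $next_no++ unless defined $nf;   # number new files sequentially
    $path_of{$nf} = $path;
    $no_of{$path} = $nf;
    return $nf;
}

my $dir = tempdir(CLEANUP => 1);
my @files;
for my $name (qw(primate.embl protists.embl)) {
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} "ID   BUM\n//\n";            # placeholder record, 12 bytes
    close $fh or die "close $path: $!";
    push @files, $path;
}

my @nos = map { register_file($_) } @files;         # (0, 1)
my $ok  = eval { register_file($files[0], 1); 1 };  # size mismatch, so it dies
print "numbers: @nos; stale size rejected: ", ($ok ? 'no' : 'yes'), "\n";
```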
  sub write_config {
    my $self = shift;
    $self->write_flag or $self->throw("cannot write configuration file because write_flag is not set");
    my $path = $self->_config_path;

    open (F,">$path") or $self->throw("open error on $path: $!");

    my $index_type = $self->indexing_scheme;
    print F "index\t$index_type\n";

    my $format     = $self->file_format;
    my $alphabet   = $self->alphabet;
    my $alpha      = $alphabet ? "/$alphabet" : '';
    print F "format\tURN:LSID:open-bio.org:${format}${alpha}\n";

    my @filenos = $self->_filenos
      or $self->throw("cannot write config file because no flat files defined");
    for my $nf (@filenos) {
      my $path = $self->{flat_flat_file_path}{$nf};
      my $size = -s $path;
      print F join("\t","fileid_$nf",$path,$size),"\n";
    }

    # write primary namespace
    my $primary_ns = $self->primary_namespace
      or $self->throw('cannot write config file because no primary namespace defined');
    print F join("\t",'primary_namespace',$primary_ns),"\n";

    # write secondary namespaces
    my @secondary = $self->secondary_namespaces;
    print F join("\t",'secondary_namespaces',@secondary),"\n";

    # CORE:: avoids the ambiguity with this class's own close() method
    CORE::close F or $self->throw("close error on $path: $!");
  }
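For reference, the lines that write_config() emits can be generated into a string. config_text() is a hypothetical helper, and the file path and size are example values. The tags and the LSID-style format line follow the code above. Unlike the method, which uses hash order, this sketch sorts the file numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the config.dat text instead of writing a file.
sub config_text {
    my %c = @_;
    my $alpha = $c{alphabet} ? "/$c{alphabet}" : '';   # alphabet suffix is optional
    my $text  = "index\t$c{index}\n";
    $text    .= "format\tURN:LSID:open-bio.org:$c{format}$alpha\n";
    for my $nf (sort { $a <=> $b } keys %{ $c{files} }) {
        my ($path, $size) = @{ $c{files}{$nf} };
        $text .= join("\t", "fileid_$nf", $path, $size) . "\n";
    }
    $text .= join("\t", 'primary_namespace', $c{primary}) . "\n";
    $text .= join("\t", 'secondary_namespaces', @{ $c{secondary} || [] }) . "\n";
    return $text;
}

my $text = config_text(
    index     => 'BerkeleyDB/1',
    format    => 'embl',
    alphabet  => 'dna',
    files     => { 0 => ['/usr/share/embl/primate.embl', 1024] },   # example size
    primary   => 'ACC',
    secondary => ['ID'],
);
print $text;
```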
  sub files {
    my $self = shift;
    return unless $self->{flat_flat_file_no};
    return keys %{$self->{flat_flat_file_no}};
  }
  sub write_seq {
    my $self = shift;
    my $seq  = shift;

    $self->write_flag or $self->throw("cannot write sequences because write_flag is not set");

    my $file = $self->out_file or $self->throw('no outfile defined; use the -out argument to new()');
    my $seqio = $self->{flat_cached_parsers}{$file}
      ||= Bio::SeqIO->new(-Format => $self->file_format,
                          -file   => ">$file")
      or $self->throw("couldn't create Bio::SeqIO object");

    my $fh = $seqio->_fh or $self->throw("couldn't get filehandle from Bio::SeqIO object");

    my $offset = tell($fh);
    $seqio->write_seq($seq);
    my $length = tell($fh)-$offset;
    my $ids    = $self->seq_to_ids($seq);
    $self->_store_index($ids,$file,$offset,$length);
    $self->{flat_outfile_dirty}++;
  }
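write_seq() records each entry's byte offset and length by calling tell() before and after writing it. This is what lets an index return the raw record later. The sketch below shows that bookkeeping using a temporary file. A hypothetical in-memory %index takes the place of _store_index(), and a hypothetical fetch_raw_sketch() takes the place of a subclass's fetch_raw().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
my %index;   # id -> [offset, length]; hypothetical stand-in for _store_index()

for my $rec (['BUM', "ID   BUM\n//\n"], ['FOO', "ID   FOO; SV 1\n//\n"]) {
    my ($id, $text) = @$rec;
    my $offset = tell($fh);                      # position before the record
    print {$fh} $text;
    $index{$id} = [$offset, tell($fh) - $offset];  # length = bytes just written
}
close $fh or die "close $file: $!";

# Hypothetical raw fetch: seek to the stored offset and read the stored length.
sub fetch_raw_sketch {
    my ($path, $id) = @_;
    my ($offset, $length) = @{ $index{$id} or die "no such id: $id\n" };
    open my $in, '<', $path or die "open $path: $!";
    seek($in, $offset, 0) or die "seek: $!";
    read($in, my $buf, $length) == $length or die "short read\n";
    close $in;
    return $buf;
}

print fetch_raw_sketch($file, 'FOO');
```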
  sub close {
    my $self = shift;
    return unless $self->{flat_outfile_dirty};
    $self->write_config;
    delete $self->{flat_outfile_dirty};
    delete $self->{flat_cached_parsers}{$self->out_file};
  }

  sub _filenos {
    my $self = shift;
    return unless $self->{flat_flat_file_path};
    return keys %{$self->{flat_flat_file_path}};
  }
  sub _read_config {
    my $self = shift;
    my $path = $self->_config_path;
    return unless -e $path;

    open (F,$path) or $self->throw("open error on $path: $!");
    my %config;
    while (<F>) {
      chomp;
      my ($tag,@values) = split "\t";
      $config{$tag} = \@values;
    }
    CORE::close F or $self->throw("close error on $path: $!");

    $config{index}[0] =~ m~(flat/1|BerkeleyDB/1)~
      or $self->throw("invalid configuration file $path: no index line");

    $self->indexing_scheme($1);

    if ($config{format}) {
      # handle LSID format; the "/alphabet" suffix is optional because
      # write_config() only writes it when an alphabet is known
      if ($config{format}[0] =~ /^URN:LSID:open-bio\.org:(\w+)(?:\/(\w+))?/) {
        $self->file_format($1);
        $self->alphabet($2);
      } else { # compatibility with older versions
        $self->file_format($config{format}[0]);
      }
    }

    # set up primary namespace
    my $primary_namespace = $config{primary_namespace}[0]
      or $self->throw("invalid configuration file $path: no primary namespace defined");
    $self->primary_namespace($primary_namespace);

    # set up secondary namespaces (may be empty)
    $self->secondary_namespaces($config{secondary_namespaces});

    # get file paths and their normalization information
    my @normalized_files = grep {$_ ne ''} map {/^fileid_(\S+)/ && $1} keys %config;
    for my $nf (@normalized_files) {
      my ($file_path,$file_length) = @{$config{"fileid_${nf}"}};
      $self->add_flat_file($file_path,$file_length,$nf);
    }
    1;
  }
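A standalone parser for the same tab-delimited format makes the LSID handling easy to check. parse_config() is hypothetical. It keeps the alphabet group optional, so a format line written without an alphabet still yields a clean format name instead of falling through to the compatibility branch. write_config() omits the "/alphabet" suffix when no alphabet is known.

```perl
use strict;
use warnings;

# Hypothetical parser following _read_config(); the alphabet group is optional.
sub parse_config {
    my $text = shift;
    my %config;
    for my $line (split /\n/, $text) {
        my ($tag, @values) = split /\t/, $line;
        $config{$tag} = \@values;
    }
    my ($scheme) = ($config{index}[0] // '') =~ m{(flat/1|BerkeleyDB/1)}
        or die "invalid configuration: no index line\n";

    my ($format, $alphabet);
    if ($config{format}) {
        if ($config{format}[0] =~ m{^URN:LSID:open-bio\.org:(\w+)(?:/(\w+))?}) {
            ($format, $alphabet) = ($1, $2);   # $2 is undef without "/alphabet"
        } else {
            $format = $config{format}[0];      # older, bare format names
        }
    }

    my $primary = $config{primary_namespace}[0]
        or die "invalid configuration: no primary namespace defined\n";

    my %files;                                 # file number -> [path, size]
    for my $tag (keys %config) {
        $files{$1} = $config{$tag} if $tag =~ /^fileid_(\S+)/;
    }
    return { index => $scheme, format => $format, alphabet => $alphabet,
             primary => $primary, secondary => $config{secondary_namespaces} || [],
             files => \%files };
}

my $cfg = parse_config(join "\n",
    "index\tflat/1",
    "format\tURN:LSID:open-bio.org:embl",
    "fileid_0\t/usr/share/embl/primate.embl\t1024",
    "primary_namespace\tACC",
    "secondary_namespaces\tID");
print "$cfg->{index} $cfg->{format} ", $cfg->{alphabet} // '(no alphabet)', "\n";
```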
  sub _config_path {
    my $self = shift;
    $self->_catfile($self->_config_name);
  }

  sub _catfile {
    my $self = shift;
    my $component = shift;
    Bio::Root::IO->catfile($self->directory,$self->dbname,$component);
  }

  sub _config_name {
    CONFIG_FILE_NAME;
  }

  sub _path2fileno {
    my $self = shift;
    my $path = shift;
    return $self->add_flat_file($path)
      unless exists $self->{flat_flat_file_no}{$path};
    $self->{flat_flat_file_no}{$path};
  }

  sub _fileno2path {
    my $self = shift;
    my $fileno = shift;
    $self->{flat_flat_file_path}{$fileno};
  }

  sub _files {
    my $self = shift;
    my $paths = $self->{flat_flat_file_no};
    return keys %$paths;
  }

  sub fetch {
    shift->get_Seq_by_id(@_);
  }

  sub get_Seq_by_id {
    my $self = shift;
    my $id = shift;
    $self->throw_not_implemented;
  }

  sub get_Seq_by_acc {
    my $self = shift;
    return $self->get_Seq_by_id(shift) if @_ == 1;
    my ($ns,$key) = @_;
    $self->throw_not_implemented;
  }

  sub fetch_raw {
    my ($self,$id,$namespace) = @_;
    $self->throw_not_implemented;
  }

  sub default_file_format {
    my $self = shift;
    $self->throw_not_implemented;
  }
  sub _store_index {
    my ($self,$ids,$file,$offset,$length) = @_;
    $self->throw_not_implemented;
  }
  sub default_primary_namespace {
    return "ACC";
  }

  sub default_secondary_namespaces {
    return;
  }

  sub seq_to_ids {
    my $self = shift;
    my $seq  = shift;
    my %ids;
    $ids{$self->primary_namespace} = $seq->accession_number;
    \%ids;
  }
  sub DESTROY {
    my $self = shift;
    $self->close;
  }
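close() rewrites config.dat only when write_seq() has marked the database dirty. DESTROY calls close(), so unsaved writes are recorded when the object goes out of scope. The hypothetical Toy::Writer class below sketches that pattern. Its @log array stands in for actual config writes.

```perl
use strict;
use warnings;

package Toy::Writer;   # hypothetical class, for illustration only

our @log;              # one entry per simulated config.dat write

sub new       { bless {}, shift }
sub write_seq { my $self = shift; push @{ $self->{records} }, @_; $self->{dirty}++ }

sub close {
    my $self = shift;
    return unless $self->{dirty};   # nothing written since the last save
    push @log, 'write_config';      # stands in for $self->write_config
    delete $self->{dirty};
}

sub DESTROY { shift->close }

package main;

{
    my $w = Toy::Writer->new;
    $w->write_seq('BUM');
    $w->close;            # saves once
    $w->close;            # no-op: not dirty any more
    $w->write_seq('FOO');
}                         # $w goes out of scope: DESTROY -> close saves again

print scalar(@Toy::Writer::log), "\n";   # 2
```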
FEEDBACK

Mailing Lists

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/MailList.shtml      - About the mailing lists

Reporting Bugs

  bioperl-bugs@bio.perl.org
  http://bugzilla.bioperl.org/

AUTHOR - Lincoln Stein

APPENDIX

To Be Implemented in Subclasses

May Be Overridden in Subclasses