Bio::DB::GFF::Adaptor berkeleydb
Other packages in the module: Bio::DB::GFF::Adaptor::berkeleydb
Summary
Bio::DB::GFF::Adaptor::berkeleydb -- Bio::DB::GFF database adaptor for berkeleydb-indexed databases
Package variables
No package variables defined.
Included modules
Bio::DB::Fasta
Bio::DB::GFF::Adaptor::berkeleydb::iterator
Bio::DB::GFF::Adaptor::memory::feature_serializer
Bio::DB::GFF::Util::Binning
Bio::DB::GFF::Util::Rearrange
DB_File
File::Path qw(mkpath)
File::Spec
File::Temp qw(tempfile)
Inherit
Bio::DB::GFF::Adaptor::memory
Synopsis
  use Bio::DB::GFF;
  my $db = Bio::DB::GFF->new(-adaptor => 'berkeleydb',
                             -create  => 1, # on initial build you need this
                             -dsn     => '/usr/local/share/gff/dmel');

  # initialize an empty database, then load GFF and FASTA files
  $db->initialize(1);
  $db->load_gff('/home/drosophila_R3.2.gff');
  $db->load_fasta('/home/drosophila_R3.2.fa');

  # do queries
  my $segment  = $db->segment(Chromosome => '1R');
  my $subseg   = $segment->subseq(5000,6000);
  my @features = $subseg->features('gene');
See Bio::DB::GFF for other methods.
Description
This adaptor implements a berkeleydb-indexed version of Bio::DB::GFF.
It requires the DB_File and Storable modules. It can be used to store
and retrieve short to medium-length GFF files containing up to several
million features.
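For example, once the database has been built, it can be reopened without the -create flag and queried directly. The following is a minimal sketch; the directory path is the one used in the synopsis:

  use Bio::DB::GFF;

  # reopen an existing berkeleydb-indexed database (no -create needed)
  my $db = Bio::DB::GFF->new(-adaptor => 'berkeleydb',
                             -dsn     => '/usr/local/share/gff/dmel');

  # fetch a segment and list the genes it contains
  my $segment = $db->segment(Chromosome => '1R');
  my @genes   = $segment->features('gene');
  print scalar(@genes), " genes on 1R\n";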
Methods
new
No description
Code
_autoindex
No description
Code
_open_databases
No description
Code
_close_databases
No description
Code
_delete_features
No description
Code
_secondary_keys
No description
Code
_delete
No description
Code
_bump_feature_count
No description
Code
_bump_class_count
No description
Code
classes
No description
Code
do_initialize
No description
Code
load_sequence
No description
Code
_mtime
No description
Code
_index_file
No description
Code
_data_file
No description
Code
_fasta_file
No description
Code
_notes_file
No description
Code
_temp_file
No description
Code
_timestamp_file
No description
Code
db
No description
Code
dsn
No description
Code
tmpdir
No description
Code
load_gff_line
No description
Code
setup_load
No description
Code
finish_load
No description
Code
_touch_timestamp
No description
Code
get_abscoords
No description
Code
get_types
No description
Code
_feature_by_name
Description
Code
_feature_by_attribute
No description
Code
search_notes
No description
Code
_get_features_by_search_options
No description
Code
retrieve_features
No description
Code
retrieve_features_range
No description
Code
filter_features
No description
Code
_basic_features_by_id
No description
Code
normalizeNumber
No description
Code
get_features_iterator
No description
Code
Methods description
_feature_by_name
 Title   : _feature_by_name
 Usage   : $db->get_features_by_name($class,$name,$callback)
 Function: get a list of features by name and class
 Returns : count of number of features retrieved
 Args    : name of feature, class of feature, and a callback
 Status  : protected
This method is used internally. The callback arguments are those used
by make_feature().
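As a rough illustration (not part of the original documentation), the method could be driven with a callback that simply counts its invocations. The class and name below are hypothetical, and in normal use this protected method is invoked by Bio::DB::GFF internals rather than by end users:

  my $hits  = 0;
  my $count = $db->_feature_by_name(
      'gene',              # class of the feature group
      'CG2671',            # feature name (hypothetical)
      undef,               # optional [ref,start,stop] location filter
      sub { $hits++ },     # callback; receives the make_feature() argument list
  );
  print "retrieved $count feature(s)\n";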
Methods code
new
sub new {
  my $class = shift ;
  my ($dbdir,$preferred_groups,$autoindex,$write,$create,$tmpdir) = rearrange([
									       [qw(DSN DB)],
									       'PREFERRED_GROUPS',
									       [qw(DIR AUTOINDEX)],
									       [qw(WRITE WRITABLE)],
									       'CREATE',
									       'TMP',
									      ],@_);
  $tmpdir ||= File::Spec->tmpdir;
  $dbdir  ||= $autoindex;
  $dbdir  ||= "$tmpdir/test";
  $write  ||= $create;

  my $self = bless {},$class;
  $self->dsn($dbdir);
  $self->tmpdir($tmpdir);
  $self->preferred_groups($preferred_groups) if defined $preferred_groups;
  $self->_autoindex($autoindex)              if $autoindex;
  $self->_open_databases($write,$create);
  return $self;
}
_autoindex
sub _autoindex {
  my $self    = shift;
  my $autodir = shift;

  my $dir    = $self->dsn;
  my %ignore = map {$_=>1} ($self->_index_file,$self->_data_file,
			    $self->_fasta_file,$self->_temp_file,
			    $self->_notes_file,
			    $self->_timestamp_file);

  my $maxtime   = 0;
  my $maxfatime = 0;

  opendir (my $D,$autodir) or $self->throw("Couldn't open directory $autodir for reading: $!");

  while (defined (my $node = readdir($D))) {
    next if $node =~ /^\./;
    my $path      = "$dir/$node";
    next if $ignore{$path};
    next unless -f $path;
    my $mtime = _mtime(\*_);  # not a typo
    $maxtime   = $mtime if $mtime > $maxtime;
    $maxfatime = $mtime if $mtime > $maxfatime && $node =~ /\.(?:fa|fasta|dna)(?:\.gz)?$/;
  }

  close $D;

  my $timestamp_time  = _mtime($self->_timestamp_file) || 0;
  my $all_files_exist = -e $self->_index_file && -e $self->_data_file && (-e $self->_fasta_file || !$maxfatime);

  # to avoid rebuilding FASTA files if not changed
  my $spare_fasta = $maxfatime > 0 && $maxfatime < $timestamp_time && -e $self->_fasta_file;

  if ($maxtime > $timestamp_time || !$all_files_exist) {
    print STDERR __PACKAGE__,": Reindexing files in $dir. This may take a while....\n";
    $self->do_initialize(1,$spare_fasta);
    $self->load_gff($autodir,1);
    $self->load_fasta($autodir,1) unless $spare_fasta;
    print STDERR __PACKAGE__,": Reindexing done\n";
  }

  else {
    $self->_open_databases();
  }
}
_open_databases
sub _open_databases {
  my $self   = shift;
  my ($write,$create) = @_;

  my $dsn  = $self->dsn;
  unless (-d $dsn) {  # directory does not exist
    $create or $self->throw("Directory $dsn does not exist and you did not specify the -create flag");
    mkpath($dsn) or $self->throw("Couldn't create database directory $dsn: $!");
  }

  my %db;
  local $DB_BTREE->{flags} = R_DUP;
  $DB_BTREE->{compare}     = sub { lc($_[0]) cmp lc($_[1]) };
  my $flags = O_RDONLY;
  $flags   |= O_RDWR  if $write;
  $flags   |= O_CREAT if $create;

  tie(%db,'DB_File',$self->_index_file,$flags,0666,$DB_BTREE)
    or $self->throw("Couldn't tie ".$self->_index_file.": $!");
  $self->{db}   = \%db;
  $self->{data} = FeatureStore->new($self->_data_file,$write,$create);

  if (-e $self->_fasta_file) {
    my $dna_db = Bio::DB::Fasta->new($self->_fasta_file) or $self->throw("Can't reindex sequence file: $@");
    $self->dna_db($dna_db);
  }

  my $mode = $write  ? "+>>"
           : $create ? "+>"
           : "<";

  open (my $F,$mode,$self->_notes_file) or $self->throw($self->_notes_file.": $!");
  $self->{notes} = $F;
}
_close_databases
sub _close_databases {
  my $self = shift;
  delete $self->{db};
  delete $self->{data};
  delete $self->{notes};
}
_delete_features
sub _delete_features {
  my $self        = shift;
  my @feature_ids = @_;
  my $removed = 0;
  my $last_id = $self->{data}->last_id;
  for my $id (@feature_ids) {
    next unless $id >= 0 && $id < $last_id;
    my $feat  = $self->{data}->get($id) or next;
    $self->{data}->remove($id);
    $self->_bump_class_count($feat->{gclass},-1);
    my @keys = $self->_secondary_keys($feat);
    $self->db->del_dup($_,$id) foreach @keys;
    $removed++;
  }
  $removed;
}
_secondary_keys
sub _secondary_keys {
  my $self = shift;
  my $feat = shift;
  return (
		"__name__".lc(join ":",$feat->{gclass},$feat->{gname}),
		"__bin__".lc("$feat->{ref}$;$feat->{bin}"),
		"__type__".join(':',$feat->{method},$feat->{source}),
		map {"__attr__".lc(join(':',$_->[0],$_->[1]))} @{$feat->{attributes}}
	  );
}
_delete
sub _delete {
  my $self        = shift;
  my $delete_spec = shift;
  return $self->SUPER::_delete($delete_spec) if @{$delete_spec->{segments}} or @{$delete_spec->{types}};
  $self->throw("This operation would delete all feature data and -force not specified")
    unless $delete_spec->{force};
  my $deleted = $self->{db}{__count__};
  $self->{data} = FeatureStore->new($self->_data_file,1,1);
  %{$self->{db}}   = ();
  $deleted;
}

# with duplicates enabled, we cannot simply do $db->{__index__}++;
_bump_feature_count
sub _bump_feature_count {
  my $self = shift;
  my $db = $self->{db};
  if (@_) {
    delete $db->{__count__};
    return $db->{__count__} = shift;
  } else {
    my $index = ${db}->{__count__};
    delete $db->{__count__};
    $db->{__count__} = ($index || 0) + 1;
    return $index;
  }
}
_bump_class_count
sub _bump_class_count {
  my $self = shift;
  my ($class,$count) = @_;
  $count ||= 1;
  my $db  = $self->{db};
  my $key = "__class__$class";
  my $newcount = ($db->{$key} || 0) + $count;
  delete $db->{$key};
  $db->{$key} = $newcount;
}
classes
sub classes {
  my $self = shift;
  my $db   = $self->db;
  my ($key,$value) = ('__class__',undef);
  my %classes;
  for (my $status = $db->seq($key,$value,R_CURSOR);
       $status == 0;
       $status = $db->seq($key,$value,R_NEXT)) {
    my ($class) = $key =~ /^__class__(.+)/ or last;
    $classes{$class}++ if $value > 0;
  }
  my @classes = sort keys %classes;
  return @classes;
}
do_initialize
sub do_initialize {
  my $self  = shift;
  my $erase = shift;
  my $spare_fasta = shift; # used internally only!
  if ($erase) {
    $self->_close_databases;
    unlink $self->_index_file;
    unlink $self->_data_file;
    unlink $self->_notes_file;
    unless ($spare_fasta) {
      unlink $self->_fasta_file;
      unlink $self->_fasta_file.'.index';
    }
    unlink $self->_timestamp_file;
    $self->_open_databases(1,1);
  }
  1;
}

# load_sequence($fasta_filehandle,$first_sequence_id)
load_sequence
sub load_sequence {
  my $self = shift;
  my ($io_handle,$id) = @_;
  my $file = $self->_fasta_file;
  my $loaded = 0;

  open (my $F,">>$file") or $self->throw("Couldn't open $file for writing: $!");

  if (defined $id) {
    print $F ">$id\n";
    $loaded++;
  }

  while (<$io_handle>) {
    $loaded++ if /^>/;
    print $F $_;
  }
  close $F;
  my $dna_db = Bio::DB::Fasta->new($file) or $self->throw("Can't reindex sequence file: $@");
  $self->dna_db($dna_db);
  $self->_touch_timestamp;
  return $loaded;
}
_mtime
sub _mtime {
  my $file = shift;
  my @stat = stat($file);
  return $stat[9];
}
_index_file
sub _index_file {
  my $self = shift;
  return $self->dsn . "/bdb_features.btree";
}
_data_file
sub _data_file {
  my $self = shift;
  return $self->dsn . "/bdb_features.data";
}
_fasta_file
sub _fasta_file {
  my $self = shift;
  return $self->dsn . "/bdb_sequence.fa";
}
_notes_file
sub _notes_file {
  my $self = shift;
  return $self->dsn . "/bdb_notes.idx";
}
_temp_file
sub _temp_file {
  my $self = shift;
  local $^W=0;
  my (undef,$filename) = tempfile("bdb_temp_XXXXXX",DIR=>$self->tmpdir,OPEN=>0);
  return $filename;
}
_timestamp_file
sub _timestamp_file {
  my $self = shift;
  return $self->dsn ."/bdb_timestamp";
}
db
sub db {
  my $db   = shift()->{db} or return;
  return tied(%$db);
}
dsn
sub dsn {
  my $self = shift;
  my $d    = $self->{dsn};
  $self->{dsn} = shift if @_;
  $d;
}
tmpdir
sub tmpdir {
  my $self = shift;
  my $d    = $self->{tmpdir};
  $self->{tmpdir} = shift if @_;
  $d;
}
load_gff_line
sub load_gff_line {
  my ($self, $feat) = @_;

  $feat->{strand} = '' if $feat->{strand} && $feat->{strand} eq '.';
  $feat->{phase} = ''  if $feat->{phase}  && $feat->{phase}  eq '.';

  my $start = $feat->{start};
  my $stop = $feat->{stop};
  my $type = join(':',$feat->{method},$feat->{source});

  my $bin =  bin($feat->{start},$feat->{stop},MIN_BIN);
  $feat->{bin} = $bin;

  my $id = $self->{data}->put($feat);
  $bin = $self->normalizeNumber($bin);

  my $db = $self->{db};
  for my $skey ($self->_secondary_keys($feat)) {
    $db->{$skey} = $id;
  }

  # save searchable notes to separate index
  my $fh    = $self->{notes};
  my @notes = map {$_->[1]} grep {lc $_->[0] eq 'note'} @{$feat->{attributes}};

  print $fh $_,"\t",pack("u*",$id) or $self->throw("An error occurred while updating indexes: $!")
    foreach @notes;

  $self->{records_loaded}++;
  $self->_bump_feature_count();
  $self->_bump_class_count($feat->{gclass});
}

# do nothing!
setup_load
sub setup_load {
  my $self = shift;
  $self->{records_loaded} = 0;
  1;
}
finish_load
sub finish_load {
  my $self = shift;
  $self->db->sync && $self->throw("An error occurred while updating indexes: $!");
  $self->_touch_timestamp;
  $self->{records_loaded};
}
_touch_timestamp
sub _touch_timestamp {
  my $self = shift;
  my $tsf = $self->_timestamp_file;
  open (my $F,">$tsf") or $self->throw("Couldn't open $tsf: $!");
  print $F scalar(localtime);
}


# given sequence name, return (reference,start,stop,strand)
get_abscoords
sub get_abscoords {
  my $self = shift;
  my ($name,$class,$refseq) = @_;
  my %refs;
  my $regexp;

  if ($name =~ /[*?]/) {  # uh oh regexp time
    $name = quotemeta($name);
    $name =~ s/\\\*/.*/g;
    $name =~ s/\\\?/.?/g;
    $regexp++;
  }

  # Find all features that have the requested name and class.
  # Sort them by reference point.
  my @features = @{$self->retrieve_features(-table => 'name', -key=>"$class:$name")};
  if (!@features) { # nothing matched exactly, so try aliases
    @features = @{$self->retrieve_features(-table=>'attr',-key=>"Alias:$name")};
  }

  foreach my $feature (@features){
    push @{$refs{$feature->{ref}}},$feature;
  }

  # find out how many reference points we recovered
  if (! %refs) {
    $self->error("$name not found in database");
    return;
  }

  # compute min and max
  my ($ref) = keys %refs;
  my @found = @{$refs{$ref}};
  my ($strand,$start,$stop);

  my @found_segments;
  foreach my $ref (keys %refs) {
    next if defined($refseq) and $ref ne $refseq;
    my @found = @{$refs{$ref}};
    my ($strand,$start,$stop,$name);
    foreach (@found) {
      $strand ||= $_->{strand};
      $strand = '+' if $strand && $strand eq '.';
      $start  = $_->{start} if !defined($start) || $start > $_->{start};
      $stop   = $_->{stop}  if !defined($stop)  || $stop  < $_->{stop};
      $name ||= $_->{gname};
    }
    push @found_segments,[$ref,$class,$start,$stop,$strand,$name];
  }

  return \@found_segments;
}
get_types
sub get_types {
  my $self = shift;
  my ($srcseq,$class,$start,$stop,$want_count,$typelist) = @_;
  my (%obj,%result,$key,$value);
  $key = "__type__";

  if (!$srcseq) { # optimized full type list
    my $db     = $self->db;
    my $status = $db->seq($key,$value,R_CURSOR);

    while ($status == 0 && $key =~ /^__type__(.+)/) {
      my $type = $1;
      my ($method,$source) = split ':',$type;
      $obj{$type} = Bio::DB::GFF::Typename->new($method,$source);
      $result{$type}++;

      if ($want_count) {
        $status = $db->seq($key,$value,R_NEXT);
      } else { # skip to next key set
        $key .= "\0";
        $status = $db->seq($key,$value,R_CURSOR)
      }
    }
  }

  else { # range search
    for my $feature (@{$self->_get_features_by_search_options(
                                  {rangetype => 'overlaps',
                                   refseq    => $srcseq,
                                   refclass  => ($class || undef),
                                   start     => ($start || undef),
                                   stop      => ($stop  || undef),
                                  },
                                  {}
                                 )}
                    ) {
      my $type = Bio::DB::GFF::Typename->new($feature->{method},$feature->{source});
      $obj{$type} = $type;
      $result{$type}++;
    }
  }

  return $want_count ? %result : values %obj;
}

# Low level implementation of fetching a named feature.
# GFF annotations are named using the group class and name fields.
# May return zero, one, or several Bio::DB::GFF::Feature objects.
_feature_by_name
sub _feature_by_name {
  my $self = shift;
  my ($class,$name,$location,$callback) = @_;
  $callback || $self->throw('must provide a callback argument');

  #use Devel::StackTrace;
#warn Devel::StackTrace->new->as_string;
  my $count = 0;
  my $id    = -1;
  my ($use_regexp,$use_glob,$using_alias_search);

  if ($name =~ /[*?]/) {  # uh oh regexp time

    # If there is only one trailing *, do a range search
    if ($name =~ /^([^\*]+)\*$/) {
      $name = $1;
      $use_glob++;
    }

    else {
      $name = quotemeta($name);
      $name =~ s/\\\*/.*/g;
      $name =~ s/\\\?/.?/g;
      $use_regexp++;
    }
  }

  my @features;
  if ($use_glob) {
    my $callback = sub {my $feat = shift; $feat->{gname} =~ /^$name/i};
    @features = @{$self->retrieve_features_range(-table    => 'name',
                                                 -start    => "$class:$name",
                                                 -do_while => $callback)
                 };
  }

  elsif ($use_regexp) {
    my $filter = sub {my $feat = shift; $feat->{gname} =~ /$name/i};
    @features = @{$self->filter_features(-table => 'name', -filter => $filter)};
  }

  else {
    @features = @{$self->retrieve_features(-table => 'name', -key => "$class:$name")};
  }

  unless (@features) {
    $using_alias_search++;
    @features = @{$self->retrieve_features(-table => 'attr', -key => "Alias:$name")};
  }

  foreach my $feature (@features){
    $id++;
    next unless $using_alias_search || $feature->{gclass} eq $class;

    if ($location) {
      next if $location->[0] ne $feature->{ref};
      next if $location->[1] && $location->[1] > $feature->{stop};
      next if $location->[2] && $location->[2] < $feature->{start};
    }
    $count++;

    $callback->(@{$feature}{@hash2array_map},0);
  }

  return $count;
}

#sub get_feature_by_attribute{
_feature_by_attribute
sub _feature_by_attribute {
  my $self = shift;
  my ($attributes,$callback) = @_;
  $callback || $self->throw('must provide a callback argument');
  my $count = 0;
  my $feature_group_id = undef;

  #there could be more than one set of attributes......
  while (my ($key, $value) = each %$attributes) {
    my @features = @{$self->retrieve_features(-table => "attr", -key => "$key:$value")};
    for my $feature (@features) {
      $callback->(@{$feature}{@hash2array_map},$feature_group_id);
      $count++;
    }
  }
}
search_notes
sub search_notes {
  my $self = shift;
  my ($search_string,$limit) = @_;

  $search_string =~ tr/*?//d;

  my @results;

  my @words = map {quotemeta($_)} $search_string =~ /(\w+)/g;
  my $search = join '|',@words;

  my (%found,$found);
  my $note_index = $self->{notes};
  seek($note_index,0,0);  # back to start
  while (<$note_index>) {
    next unless /$search/;
    chomp;
    my ($note,$uu) = split "\t";
    $found{unpack("u*",$uu)}++;
    last if $limit && ++$found >= $limit;
  }

  my (@features,@matches);
  for my $idx (keys %found) {
    my $feature    = $self->{data}->get($idx) or next;
    my @attributes = @{$feature->{attributes}};
    my @values     = map {lc $_->[0] eq 'note' ? $_->[1] : ()} @attributes;
    my $value      = "@values";

    my $hits;
    $hits++ while $value =~ /($search)/ig;  # count the number of times we were hit
    push @matches,$hits;
    push @features,$feature;
  }

  for (my $i=0; $i<@matches; $i++) {
    my $feature = $features[$i];
    my $matches = $matches[$i];

    my $relevance = 10 * $matches;
    my $featname  = Bio::DB::GFF::Featname->new($feature->{gclass}=>$feature->{gname});
    my $type      = Bio::DB::GFF::Typename->new($feature->{method}=>$feature->{source});
    my $note;
    $note = join ' ',map {$_->[1]} grep {$_->[0] eq 'Note'} @{$feature->{attributes}};
    push @results,[$featname,$note,$relevance,$type];
  }

  return @results;
}
_get_features_by_search_options
sub _get_features_by_search_options {
  #The $data argument is not used and is preserved for superclass compatibility
  my ($self,$search,$options) = @_;
  my $count = 0;

  my ($rangetype,$refseq,$class,$start,$stop,$types,$sparse,$order_by_group,$attributes,$temp_file) =
    (@{$search}{qw(rangetype refseq refclass start stop types)},
     @{$options}{qw(sparse sort_by_group ATTRIBUTES temp_file)}) ;

  $start = 0       unless defined($start);
  $stop  = MAX_BIN unless defined($stop);

  my $bin = bin($start,$stop,MIN_BIN);
  $bin    = $self->normalizeNumber($bin);

  my ($results,@features,%found,%results_table);
  if ($temp_file) {
    local $DB_BTREE->{flags} = R_DUP;
    # note: there is a race condition possible here, if someone reuses the
    # same name between the time we get the tmpfile name and the time we
    # ask DB_File to open it.
    tie(%results_table,'DB_File',$temp_file,O_RDWR|O_CREAT,0666,$DB_BTREE)
      or $self->throw("Couldn't tie temporary file ".$temp_file." for writing: $!");
    $results = \%results_table;
  } else {
    $results = \@features;
  }

  my $filter = sub {
    my $feature = shift;

    my $ref           = $feature->{ref};
    my $feature_start = $feature->{start};
    my $feature_stop  = $feature->{stop};
    my $feature_id    = $feature->{feature_id};

    return 0 if $found{$feature_id}++;

    if (defined $refseq) {
      return 0 unless lc $refseq eq lc $ref;
      $start = 0           unless defined($start);
      $stop  = MAX_SEGMENT unless defined($stop);

      if ($rangetype eq 'overlaps') {
        return 0 unless $feature_stop >= $start && $feature_start <= $stop;
      } elsif ($rangetype eq 'contains') {
        return 0 unless $feature_start >= $start && $feature_stop <= $stop;
      } elsif ($rangetype eq 'contained_in') {
        return 0 unless $feature_start <= $start && $feature_stop >= $stop;
      } else {
        return 0 unless $feature_start == $start && $feature_stop == $stop;
      }
    }

    my $feature_source = $feature->{source};
    my $feature_method = $feature->{method};

    if (defined $types && @$types){
      return 0 unless $self->_matching_typelist($feature_method,$feature_source,$types);
    }

    my $feature_attributes = $feature->{attributes};
    if (defined $attributes){
      return 0 unless $self->_matching_attributes($feature_attributes,$attributes);
    }

    return 1;
  };

  if (defined $refseq && !$sparse) {
    my $tier = MAX_BIN;
    while ($tier >= MIN_BIN) {
      my ($tier_start,$tier_stop) = (bin_bot($tier,$start),bin_top($tier,$stop));
      # warn "Using $tier_start $tier_stop\n";
      if ($tier_start == $tier_stop) {
        $self->retrieve_features(-table  => "bin",
                                 -key    => "$refseq$;$tier_start",
                                 -filter => $filter,
                                 -result => $results);
      } else {
        my $callback = sub {my $feat = shift; $feat->{bin} <= $tier_stop};
        $self->retrieve_features_range(-table    => "bin",
                                       -start    => "$refseq$;$tier_start",
                                       -do_while => $callback,
                                       -filter   => $filter,
                                       -result   => $results);
      }
      $tier /= 10;
    }
  }

  elsif (@$types) {
    foreach (@$types) {
      my $type = join ':',@$_;
      $self->retrieve_features_range(-table    => 'type',
                                     -start    => $type,
                                     -filter   => $filter,
                                     -do_while => sub { my $f = shift;
                                                        lc($f->{method}) eq lc($_->[0])
                                                          &&
                                                        lc($f->{source}||$_->[1]||'') eq lc($_->[1]||'')
                                                      },
                                     -result   => $results);
    }
  }

  elsif (defined $attributes) {
    my ($attribute_name,$attribute_value) = each %$attributes; # pick first one
    $self->retrieve_features(-table  => 'attr',
                             -key    => "${attribute_name}:${attribute_value}",
                             -filter => $filter,
                             -result => $results);
  }

  else {
    $self->filter_features(-filter => $filter,-result => $results);
  }

  return $results;
}
retrieve_features
sub retrieve_features {
  my $self = shift;
  my ($table, $key, $filter, $result) = rearrange(['TABLE','KEY','FILTER', 'RESULT'],@_);

  my @result;
  $result ||= \@result;

  my $frozen;
  my @ids  = $self->db->get_dup("__".lc($table)."__".lc($key));
  my $data = $self->{data};
  local $^W = 0;   # because _hash_to_array() will generate lots of uninit values
  foreach my $id (@ids) {
    my $feat = $data->get($id);

    my $filter_result = $filter ? $filter->($feat) : 1;
    next unless $filter_result;

    if (ref $result eq 'HASH') {
      $result->{"$feat->{gclass}:$feat->{gname}"} = join($;,$self->_hash_to_array($feat));
    } else {
      push @$result,$feat;
    }

    last if $filter_result == -1;
  }
  return $result;
}
retrieve_features_range
sub retrieve_features_range {
  my ($self) = shift;
  my ($table, $start, $do_while, $filter, $result) = rearrange(['TABLE','START','DO_WHILE', 'FILTER', 'RESULT'],@_);
  local $^W = 0;  # because _hash_to_array will generate lots of uninit warnings
  my @result;
  $result ||= \@result;
  my ($id,$key,$value);

  $key = "__".$table."__".$start;
  my $db = $self->db;

  for (my $status = $db->seq($key,$value,R_CURSOR);
       $status == 0;
       $status = $db->seq($key,$value,R_NEXT)) {

    my $feat = $self->{data}->get($value);
    last unless $do_while->($feat,$key);

    my $filter_result = $filter ? $filter->($feat) : 1;
    next unless $filter_result;

    if (ref $result eq 'HASH') {
      $result->{"$feat->{gclass}:$feat->{gname}"} = join($;,$self->_hash_to_array($feat));
    } else {
      push @$result,$feat;
    }

    last if $filter_result == -1;
  }

  return $result;
}
filter_features
sub filter_features {
  my ($self) = shift;

  my ($filter,$result) = rearrange(['FILTER','RESULT'],@_);

  my @result;
  $result ||= \@result;

  my ($key, $frozen);
  my $data = $self->{data};
  $data->reset;
  while (my $feat = $data->next) {

    my $filter_result = $filter ? $filter->($feat) : 1;
    next unless $filter_result;

    if (ref($result) eq 'HASH') {
      $result->{"$feat->{gclass}:$feat->{gname}"} = join($;,$self->_hash_to_array($feat));
    } else {
      push @$result,$feat;
    }
    last if $filter_result == -1;
  }

  return $result;
}
_basic_features_by_id
sub _basic_features_by_id {
  my $self = shift;
  my ($ids) = @_;

  $ids = [$ids] unless ref $ids =~ /ARRAY/;

  my @result;
  my $data = $self->{data};
  for my $feature_id (@$ids){
    push @result, $data->get($feature_id);
  }

  return wantarray() ? @result : $result[0];
}
normalizeNumber
sub normalizeNumber {
  my ($self, $num) = @_;
  while ((length $num) < MAX_NUM_LENGTH)
  {
    $num = "0".$num;
  }
  return $num;
}
get_features_iterator
sub get_features_iterator {
  my $self = shift;

  my ($search,$options,$callback) = @_;
  $callback || $self->throw('must provide a callback argument');
  $options->{temp_file} = $self->_temp_file;

  my $results = $self->_get_features_by_search_options($search,$options);
  return Bio::DB::GFF::Adaptor::berkeleydb::iterator->new($results,$callback,$options->{temp_file});
}

#--------------------------------------------------------------------------#
package FeatureStore;

# This is a very specialized package that stores serialized features onto a file-based
# array. The array is indexed by the physical offset to the beginning of each serialized
# feature.

use strict;
use Fcntl qw(SEEK_SET SEEK_END);
use base 'Bio::Root::Root';
use Bio::DB::GFF::Adaptor::memory::feature_serializer; # qw(feature2string string2feature @hash2array_map);
General documentation
CONSTRUCTOR
Use Bio::DB::GFF->new() to construct new instances of this class.
The following named arguments are recognized:
 Argument            Description
 --------            -----------

 -adaptor            Set to "berkeleydb" to create an instance of this class.

 -dsn                Path to the directory where the database index files
                     will be stored (alias -db).

 -autoindex          Monitor the indicated directory path for FASTA and GFF
                     files, and update the indexes automatically if they
                     change (alias -dir).

 -write              Set to a true value in order to update the database.

 -create             Set to a true value to create the database the first
                     time (implies -write).

 -tmp                Location of a temporary directory for storing
                     intermediate files during certain queries.

 -preferred_groups   Specify the grouping tag. See Bio::DB::GFF.
The -dsn argument selects the directory in which to store the database
index files. If the directory does not exist, it will be created
automatically when the -create flag is given, provided that the current
process has sufficient privileges. If no -dsn argument is specified, a
database named "test" will be created in your system's temporary files
directory.
The -tmp argument specifies the temporary directory to use for storing
intermediate search results. If not specified, your system's temporary
files directory will be used. On Unix systems, the TMPDIR environment
variable is honored. Note that some queries can require a lot of
space.
The -autoindex argument, if present, selects a directory to be
monitored for GFF and FASTA files (which can be compressed with the
gzip program if desired). Whenever any file in this directory is
changed, the index files will be updated. Note that the indexing can
take a long time to run: anywhere from 5 to 10 minutes for a million
features. An alias for this argument is -dir, which gives this adaptor
a similar flavor to the "memory" adaptor.
-dsn and -dir can point to the same directory. If -dir is given but
-dsn is absent the index files will be stored into the directory
containing the source files. For autoindexing to work, you must
specify the same -dir path each time you open the database.
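For example, an autoindexed database rooted on a directory of GFF and FASTA files might be opened like this (a sketch only; the directory path is hypothetical):

  use Bio::DB::GFF;

  # -dir doubles as the location of the source files and of the index files;
  # the indexes are rebuilt automatically whenever those files change
  my $db = Bio::DB::GFF->new(-adaptor => 'berkeleydb',
                             -dir     => '/usr/local/share/gff/dmel');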
If you do not choose autoindexing, then you will want to load the
database using the bp_load_gff.pl command-line tool. For example:
 bp_load_gff.pl -a berkeleydb -c -d /usr/local/share/gff/dmel dna1.fa dna2.fa features.gff
BUGS
The various get_Stream_* methods and the features() method with the
-iterator argument only return an iterator after the query runs
completely and the module has been able to generate a temporary
results file on disk. This means that iteration is not as big a win as
it is for the relational-database adaptors.
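A typical use of the iterator interface looks roughly like the sketch below; the feature type is hypothetical, and next_seq() is assumed to be the iterator call, following the convention of the other Bio::DB::GFF adaptors:

  my $iterator = $db->features(-types    => ['gene'],
                               -iterator => 1);
  while (my $feature = $iterator->next_seq) {
      # print each feature's method and coordinates
      print $feature->method, " ", $feature->start, "..", $feature->end, "\n";
  }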
Like the dbi::mysqlopt adaptor, this module uses a binning scheme to
speed up range-based searches. The binning scheme used here imposes a
hard-coded 1 gigabase (1000 Mbase) limit on the size of the largest
chromosome or other reference sequence.
SEE ALSO
Bio::DB::GFF, bioperl
AUTHORS
Vsevolod (Simon) Ilyushchenko <simonf@cshl.edu>
Lincoln Stein <lstein@cshl.edu>
Copyright (c) 2005 Cold Spring Harbor Laboratory.
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.