Bio::OntologyIO
simplehierarchy
Summary
Bio::OntologyIO::simplehierarchy - a base class parser for simple hierarchy-by-indentation
type formats
Package variables
No package variables defined.
Included modules
Data::Dumper
File::Basename
Inherit
Synopsis
use Bio::OntologyIO;
# do not use directly -- use via Bio::OntologyIO
my $parser = Bio::OntologyIO->new
    ( -format        => "simplehierarchy",
      -file          => "pathology_terms.csv",
      -indent_string => ",",
      -ontology_name => "eVOC",
      -term_factory  => $fact,
    );
my $ontology = $parser->next_ontology();
Description
Needs Graph.pm from CPAN. This class is nearly identical to
Bio::OntologyIO::dagflat; see that module for details.
Methods
Methods description
Title   : ontology_name
Usage   : $obj->ontology_name($newval)
Function: Get/set the name of the ontology parsed by this module.
Example :
Returns : value of ontology_name (a scalar)
Args    : on set, new value (a scalar or undef, optional)
Title   : parse()
Usage   : $parser->parse();
Function: Parses the files set with "new" or with methods defs_file and _flat_files.
          Normally you should not need to call this method as it will
          be called automatically upon the first call to
          next_ontology().
Returns : [Bio::Ontology::OntologyEngineI]
Args    :
Title   : next_ontology
Usage   :
Function: Get the next available ontology from the parser. This is the
          method prescribed by Bio::OntologyIO.
Example :
Returns : An object implementing Bio::Ontology::OntologyI, or undef if
          there are no more ontologies in the input.
Args    :
Title   : _flat_files
Usage   : $files_to_parse = $parser->_flat_files();
Function: Get the array of ontology flat files that need to be parsed.
          Note that this array will decrease in elements over the
          parsing process. Therefore, its value outside of this
          module will be limited. Also, be careful not to alter the
          array unless you know what you are doing.
Returns : a reference to an array of zero or more strings
Args    : none
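The shrinking behaviour described for _flat_files follows from how parse() takes file names off the front of the array with shift. A minimal, BioPerl-free sketch of that pattern, with hypothetical file names, might look like this:

```perl
use strict;
use warnings;

# Hypothetical file names; in the parser this array ref is what
# _flat_files() returns.
my $files = [ "anatomy.txt", "pathology.txt" ];
my @seen;

# Each pass removes one file name from the shared array, which is
# why the array is empty once parsing has finished.
while (@$files) {
    my $file = shift @$files;
    push @seen, $file;
}

print scalar(@$files), " files left, ", scalar(@seen), " parsed\n";
```

Because the caller and the parser share the same array reference, anything still needed after parsing should be copied beforehand.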
Title   : _defs_io
Usage   : $obj->_defs_io($newval)
Function: Get/set the Bio::Root::IO instance representing the
          definition file, if provided (see defs_file()).
Example :
Returns : value of _defs_io (a Bio::Root::IO object)
Args    : on set, new value (a Bio::Root::IO object or undef, optional)
Title   : indent_string
Usage   : $obj->indent_string($newval)
Function: Get/set the string used to indent hierarchical levels in
          the input file.
Example :
Returns : value of indent_string (a scalar)
Args    : on set, new value (a scalar or undef, optional)
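The indentation level of a line is the number of times the indent string repeats at its start. The module's own _count_indents interpolates the indent string directly into a regular expression. The sketch below is a stricter, standalone variant and not the module's code: it quotes metacharacters and repeats multi-character indents as a unit. The helper name count_indent_levels and the sample term are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): count how many times
# $indent is repeated at the start of $line.
sub count_indent_levels {
    my ($line, $indent) = @_;
    return 0 unless defined $indent && length $indent;
    # \Q...\E quotes regex metacharacters; (?:...) groups the whole
    # string so that a multi-character indent repeats as a unit.
    if ($line =~ /^((?:\Q$indent\E)+)/) {
        return length($1) / length($indent);
    }
    return 0;
}

print count_indent_levels("    subterm2A", "  "), "\n";   # prints 2
print count_indent_levels(",,tumour", ","), "\n";         # prints 2
```

Quoting matters for indent strings such as "." or "|", which would otherwise be read as regex syntax.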
Title   : file_is_root
Usage   : $obj->file_is_root($newval)
Function: Boolean indicating whether a virtual root term is to be
          added, the name of which will be derived from the file name.
          Enabling this allows one to parse multiple input files into the
          same ontology and still have them separately rooted.
Example :
Returns : value of file_is_root (a scalar)
Args    : on set, new value (a scalar or undef, optional)
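As the _virtual_root code in the listing shows, the root term name is derived by stripping everything from the first dot of the file name and replacing underscores with spaces. A standalone sketch of that derivation, using a hypothetical helper name root_name_from_file, could be:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper (not part of BioPerl) mirroring the name
# derivation in _virtual_root.
sub root_name_from_file {
    my ($path) = @_;
    # '\..*' treats everything from the first dot onwards as the suffix
    my ($base) = fileparse($path, '\..*');
    $base =~ s/_/ /g;    # underscores become spaces
    return $base;
}

print root_name_from_file("pathology_terms.csv"), "\n";   # prints "pathology terms"
```

With file_is_root enabled, each input file would therefore contribute its own root term named after the file.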
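Title   : _virtual_root
Usage   : $obj->_virtual_root($newval)
Function: Get/set the virtual root term. Unless set explicitly, the
          term is created on first access from the input file name,
          and only if file_is_root() is true.
Example :
Returns : value of _virtual_root (a scalar)
Args    : on set, new value (a scalar or undef, optional)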
Methods code
sub _initialize {
    my ($self, @args) = @_;
    $self->SUPER::_initialize( @args );
    my ( $indent,$files,$fileisroot,$name,$eng ) =
        $self->_rearrange([qw(INDENT_STRING
                              FILES
                              FILE_IS_ROOT
                              ONTOLOGY_NAME
                              ENGINE)
                          ], @args);
    $self->_done( FALSE );
    $self->_not_first_record( FALSE );
    $self->_term( "" );
    $self->file_is_root($fileisroot) if defined($fileisroot);
    # default to a single space as the indent string
    $indent = ' ' unless defined($indent);
    # interpret backslash escapes (e.g. a literal \t), but only if the
    # string contains no '$' or backtick
    if (($indent =~ /\\/) && ($indent !~ /[\$\`]/)) {
        $indent = "\$indent = \"$indent\"";
        eval $indent;
    }
    $self->indent_string($indent);
    delete $self->{'_ontologies'};
    # create an engine unless one was passed in
    $eng = Bio::Ontology::OBOEngine->new() unless $eng;
    if($eng->isa("Bio::Ontology::OntologyI")) {
        $self->ontology_name($eng->name());
        $eng = $eng->engine() if $eng->can('engine');
    }
    $self->_ont_engine($eng);
    # accept a single file name or an array ref of file names
    $self->{_flat_files} = $files ? ref($files) ? $files : [$files] : [];
    $self->ontology_name($name) if $name;
}
sub ontology_name {
    my $self = shift;
    return $self->{'ontology_name'} = shift if @_;
    return $self->{'ontology_name'};
}
sub parse {
    my $self = shift;
    # create a default term factory unless one was supplied
    $self->term_factory(Bio::Ontology::TermFactory->new(
                            -type => "Bio::Ontology::Term"))
        unless $self->term_factory();
    my $ont = Bio::Ontology::Ontology->new(-name   => $self->ontology_name(),
                                           -engine => $self->_ont_engine());
    # assign the new ontology to the predefined relationship types
    foreach ($self->_part_of_relationship(),
             $self->_is_a_relationship(),
             $self->_related_to_relationship()) {
        $_->ontology($ont);
    }
    # open the first flat file unless a stream is already open
    if(! $self->_fh) {
        $self->_initialize_io(-file => shift(@{$self->_flat_files()}));
    }
    while($self->_fh) {
        $self->_parse_flat_file($ont);
        # move on to the next file, if any
        if(@{$self->_flat_files()}) {
            $self->close();
            # reset the virtual root so it is derived from the next file name
            $self->_virtual_root(undef);
            $self->_initialize_io(-file => shift(@{$self->_flat_files()}));
        } else {
            last;
        }
    }
    $self->_add_ontology($ont);
    return $self->_ont_engine();
}
sub next_ontology {
    my $self = shift;
    # parse lazily on the first call
    $self->parse() unless exists($self->{'_ontologies'});
    return shift(@{$self->{'_ontologies'}}) if exists($self->{'_ontologies'});
    return;
}
sub _flat_files {
    my $self = shift;
    $self->{_flat_files} = [] unless exists($self->{_flat_files});
    return $self->{_flat_files};
}

sub _defs_io {
    my $self = shift;
    return $self->{'_defs_io'} = shift if @_;
    return $self->{'_defs_io'};
}
sub _add_ontology {
    my $self = shift;
    $self->{'_ontologies'} = [] unless exists($self->{'_ontologies'});
    foreach my $ont (@_) {
        $self->throw(ref($ont)." does not implement Bio::Ontology::OntologyI")
            unless ref($ont) && $ont->isa("Bio::Ontology::OntologyI");
        push(@{$self->{'_ontologies'}}, $ont);
    }
}

sub _add_term {
    my ( $self, $term, $ont ) = @_;
    # only set the ontology if the term does not have one yet
    $term->ontology($ont) if $ont && (! $term->ontology);
    $self->_ont_engine()->add_term( $term );
}
sub _part_of_relationship {
    my ( $self, $term ) = @_;
    return $self->_ont_engine()->part_of_relationship();
}

sub _is_a_relationship {
    my ( $self, $term ) = @_;
    return $self->_ont_engine()->is_a_relationship();
}

sub _related_to_relationship {
    my ( $self, $term ) = @_;
    return $self->_ont_engine()->related_to_relationship();
}

sub _add_relationship {
    my ( $self, $parent, $child, $type, $ont ) = @_;
    # note the argument order expected by the engine: child, type, parent
    $self->_ont_engine()->add_relationship( $child, $type, $parent, $ont );
}
sub _has_term {
    my ( $self, $term ) = @_;
    # prefix plain term names with the ontology name
    $term = $self->ontology_name() .'|'. $term
        unless ref($term) || !$self->ontology_name();
    return $self->_ont_engine()->has_term( $term );
}

sub _get_terms {
    my $self = shift;
    my @args = ();
    while(@_) {
        unshift(@args, pop(@_));
        # prefix plain term names with the ontology name
        $args[0] = $self->ontology_name() .'|'. $args[0]
            unless ref($args[0]) || !$self->ontology_name();
    }
    return $self->_ont_engine->get_terms(@args);
}
sub _parse_flat_file {
    my $self = shift;
    my $ont  = shift;
    my @stack       = ();
    my $prev_indent = -1;
    my $parent      = "";
    my $prev_term   = "";
    my $indent_string = $self->indent_string;
    while ( my $line = $self->_readline() ) {
        # skip lines that start, after any indentation, with '|' or '-'
        if ( $line =~ /^[$indent_string]*[\|\-]/ ) {
            next;
        }
        my ($current_term) = $line =~ /^[$indent_string]*(.*)/;
        my $current_indent = $self->_count_indents( $line );
        chomp $current_term;
        # strip trailing indent characters and surrounding double quotes
        $current_term =~ s/[$indent_string]+$//;
        $current_term =~ s/^\"(.*)\"$/$1/;
        # a whitespace-preceded {a; b} group lists synonyms
        my $syn = $current_term =~ s/\s+\{([^}]+)\}// ? $1 : undef;
        if ( ! $self->_has_term( $current_term ) ) {
            my $term = $self->_create_ont_entry($current_term);
            $term->add_synonym(split(/[;,]\s*/,$syn)) if $syn;
            $self->_add_term( $term, $ont );
            # top-level term: link it to the virtual root, if there is one
            if($current_indent == 0) {
                if($self->_virtual_root()) {
                    $self->_add_relationship($self->_virtual_root(),
                                             $term,
                                             $self->_is_a_relationship(),
                                             $ont);
                }
                $prev_indent = $current_indent;
                $prev_term   = $current_term;
                push @stack, $current_term;
                next;
            }
        }
        if ( $current_indent != $prev_indent ) {
            if ( $current_indent == $prev_indent + 1 ) {
                # one level deeper: the previous term becomes the parent
                push( @stack, $prev_term );
            } elsif ( $current_indent < $prev_indent ) {
                # back up: discard one parent per level
                my $n = $prev_indent - $current_indent;
                for ( my $i = 0; $i < $n; ++$i ) {
                    pop( @stack );
                }
            } else {
                $self->throw("format error: indentation level $current_indent "
                             ."is more than one higher than the previous "
                             ."level $prev_indent ('$current_term', "
                             ."file ".$self->file.")" );
            }
        }
        $parent = $stack[-1];
        if($parent ne $current_term) {
            $self->_add_relationship($self->_get_terms($parent),
                                     $self->_get_terms($current_term),
                                     $self->_is_a_relationship(),
                                     $ont);
        }
        $prev_indent = $current_indent;
        $prev_term   = $current_term;
    }
    return $ont;
}
sub _get_first_termid {
    my ( $self, $line ) = @_;
    if ( $line =~ /;\s*([A-Z]{1,8}:\d{7})/ ) {
        return $1;
    }
    else {
        $self->throw( "format error: no term id in line \"$line\"" );
    }
}

sub _count_indents {
    my ( $self, $line ) = @_;
    my $indent = $self->indent_string;
    # number of leading indent strings = indentation level
    if ( $line =~ /^($indent+)/ ) {
        return (length($1)/length($indent));
    }
    else {
        return 0;
    }
}
sub _ont_engine {
    my ( $self, $value ) = @_;
    if ( defined $value ) {
        $self->{ "_ont_engine" } = $value;
    }
    return $self->{ "_ont_engine" };
}

sub _create_ont_entry {
    my ( $self, $name, $termid ) = @_;
    my $term = $self->term_factory->create_object(-name       => $name,
                                                  -identifier => $termid);
    return $term;
}
sub _not_first_record {
    my ( $self, $value ) = @_;
    if ( defined $value ) {
        unless ( $value == FALSE || $value == TRUE ) {
            $self->throw( "Argument to method \"_not_first_record\" must be either ".TRUE." or ".FALSE );
        }
        $self->{ "_not_first_record" } = $value;
    }
    return $self->{ "_not_first_record" };
}

sub _done {
    my ( $self, $value ) = @_;
    if ( defined $value ) {
        unless ( $value == FALSE || $value == TRUE ) {
            $self->throw( "Found [$value] where [" . TRUE
                          ." or " . FALSE . "] expected" );
        }
        $self->{ "_done" } = $value;
    }
    return $self->{ "_done" };
}

sub _term {
    my ( $self, $value ) = @_;
    if ( defined $value ) {
        $self->{ "_term" } = $value;
    }
    return $self->{ "_term" };
}
sub indent_string {
    my $self = shift;
    return $self->{'indent_string'} = shift if @_;
    return $self->{'indent_string'};
}

sub file_is_root {
    my $self = shift;
    return $self->{'file_is_root'} = shift if @_;
    return $self->{'file_is_root'};
}
sub _virtual_root {
    my $self = shift;
    return $self->{'_virtual_root'} = shift if @_;
    # no virtual root unless file_is_root is set and a file name is known
    return unless $self->file_is_root() && $self->file();
    if(! $self->{'_virtual_root'}) {
        # derive the root term name from the file name: strip the
        # extension and replace underscores with spaces
        my ($rt,undef,undef) = fileparse($self->file(), '\..*');
        $rt =~ s/_/ /g;
        $rt = $self->_create_ont_entry($rt);
        $self->_add_term($rt, $self->ontology_name());
        $self->{'_virtual_root'} = $rt;
    }
    return $self->{'_virtual_root'};
}
General documentation
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Please direct usage questions or support issues to the mailing list:
bioperl-l@bioperl.org
rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:
https://redmine.open-bio.org/projects/bioperl/
Christian Zmasek
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _
Title   : new
Usage   : see SYNOPSIS
Function: Creates a new simplehierarchy parser.
Returns : A new simplehierarchy parser object, implementing Bio::OntologyIO.
Args    : -files         => a single ontology flat file holding the
                            term relationships, or an array ref holding
                            the file names
          -file          => if there is only a single flat file, it may
                            also be specified via the -file parameter
          -ontology_name => the name of the ontology, defaults to
                            "Gene Ontology"
          -file_is_root  => Boolean indicating whether a virtual root
                            term is to be added, the name of which will
                            be derived from the file name. Default is false.
                            Enabling this allows one to parse multiple input
                            files into the same ontology and still have
                            them separately rooted.
          -engine        => the Bio::Ontology::OntologyEngineI object
                            to be reused (will be created otherwise); note
                            that every Bio::Ontology::OntologyI will
                            qualify as well since that one inherits from the
                            former.
          -indent_string => the string used to indent hierarchical
                            levels in the file.
                            For a file like this:
                            term0
                              subterm1A
                                subterm2A
                              subterm1B
                              subterm1C
                            indent_string would be "  " (two spaces).
                            Defaults to one space (" ").
          -comment_char  => Allows specification of a regular
                            expression string to indicate a comment line.
                            Currently defaults to "[\|\-]".
                            Note: this is not yet implemented.
See Bio::OntologyIO.
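To illustrate how an indented file like the five-term example above maps to parent/child relationships, here is a simplified, BioPerl-free sketch. It is not the module's _parse_flat_file code: it keeps one stack slot per level instead of tracking the previous term, and the helper name parent_child_pairs is hypothetical. The two-space indent is an assumption matching the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): turn indented lines into
# [parent, child] pairs.
sub parent_child_pairs {
    my ($indent, @lines) = @_;
    die "indent string must not be empty\n" unless length $indent;
    my @stack;   # $stack[$level] is the most recent term at that level
    my @pairs;
    for my $line (@lines) {
        my ($prefix, $term) = $line =~ /^((?:\Q$indent\E)*)(.*)$/;
        next unless defined $term && length $term;
        my $level = length($prefix) / length($indent);
        # a child may be at most one level deeper than the previous term
        die "indentation jumps more than one level at '$term'\n"
            if $level > @stack;
        $#stack = $level - 1;                          # drop deeper levels
        push @pairs, [ $stack[-1], $term ] if @stack;  # parent is stack top
        push @stack, $term;
    }
    return @pairs;
}

my @lines = ("term0", "  subterm1A", "    subterm2A",
             "  subterm1B", "  subterm1C");
print join(" -> ", @$_), "\n" for parent_child_pairs("  ", @lines);
```

For the sample lines this yields four pairs: term0 is the parent of subterm1A, subterm1B and subterm1C, and subterm1A is the parent of subterm2A. In the module itself each such pair is recorded as an is-a relationship.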