Semantic API for CDAO Subgroup
Goal & Objectives
The main goal for this group during the hackathon is to create a basic API for CDAO using a semantic framework. We will focus on phylogenetic trees initially and build a system that can be extended to other types of evolutionary data.
The specific objectives are:
- Create a NeXML to CDAO converter script using XSLT
- Create a test script to validate the CDAO file
- Implement the XSLT script in a data source (Brandon's project) so that it can be accessed through the web
Google Code Repository
- Source Browse link
- SVN checkout command - svn co https://dbhack1.googlecode.com/svn/trunk/cdao-api
- Get your copy of the command-line nexvl.pl; they're going fast.
- Get the Batch Transformations Script:
- Get the NeXML to CDAO XSLT converter:
- Get the
- Get the NeXML to taxon concept converter:
- Get the Java example code for loading NeXML directly into a triple store (does not use the preprocessing script; porting that to Java is an exercise left to the reader):
- Get the Java example code for comparing a model generated from NeXML with an existing model for testing (this was a dumb approach, as the blank node IDs prevent the comparison from working, but the code gives useful examples of how to handle the models and conversions):
There are a number of sample NeXML files here that "mostly validate", except for recent dict element changes.
Chat About OWL
The outcome of the work is that we have two ways of converting NeXML into triples (RDF graphs). One of the conversions binds the triples to the CDAO ontology, i.e. it creates objects from NeXML that are CDAO objects. For example, CDAO has the notion of an Edge and NeXML has the notion of an edge, so instances of CDAO Edges are created to represent the instances of edges in the NeXML file that is parsed.
The other conversion is experimental. It translates the NeXML file in terms of the TDWG TaxonConcept vocabulary. Trees, tree nodes and OTUs are treated as OWL classes subclassing TaxonConcept or each other. This is a very different approach, but it may be useful for comparing trees in NeXML files with existing, synonymised taxonomies. It is slightly easier to make inferences across this subclass hierarchy than across the object graph created by the CDAO conversion.
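To make the CDAO-bound conversion concrete, here is a minimal Python sketch of the idea (the real conversion is done by the XSLT stylesheet, not by this code). It reads a toy, namespace-free NeXML-like fragment and emits one `cdao:Edge` instance per edge element. The `ex:hasSource` and `ex:hasTarget` property names are hypothetical placeholders, not CDAO terms, and the fragment is far simpler than a real NeXML document.

```python
import xml.etree.ElementTree as ET

# Toy NeXML-like fragment; real NeXML documents are namespaced and richer.
NEXML_FRAGMENT = """
<tree id="tree1">
  <node id="n1"/>
  <node id="n2"/>
  <node id="n3"/>
  <edge id="e1" source="n1" target="n2"/>
  <edge id="e2" source="n1" target="n3"/>
</tree>
"""

RDF_TYPE = "rdf:type"
CDAO_EDGE = "cdao:Edge"       # the Edge class mentioned in the text
EX_SOURCE = "ex:hasSource"    # hypothetical property, NOT a CDAO term
EX_TARGET = "ex:hasTarget"    # hypothetical property, NOT a CDAO term


def edges_to_triples(xml_text):
    """Return (subject, predicate, object) triples, one Edge instance per <edge>."""
    root = ET.fromstring(xml_text)
    triples = []
    for edge in root.iter("edge"):
        subject = "#" + edge.get("id")
        triples.append((subject, RDF_TYPE, CDAO_EDGE))
        triples.append((subject, EX_SOURCE, "#" + edge.get("source")))
        triples.append((subject, EX_TARGET, "#" + edge.get("target")))
    return triples


if __name__ == "__main__":
    for triple in edges_to_triples(NEXML_FRAGMENT):
        print(triple)
```

Each edge in the input becomes an individual typed as `cdao:Edge`, which is the sense in which the conversion "binds" the triples to the ontology.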
- To transform using xsltproc, use the following command:
$ xsltproc nexml2cdao.xsl nexml-instance-document.xml > cdao-instance-document.rdf
- Common Problems
the number of characters defined in the
<format> block. Sequences longer than the number of declared characters
will be truncated to the number of characters that have been declared. The
expand_cells.pl script can be run as follows.
$ ./expand_cells.pl < nexml_with_sequences.xml > nexml_with_cells.xml
Alternatively, as a filter:
$ cat nexml_with_sequences.xml | ./expand_cells.pl > nexml_with_cells.xml
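The truncation rule described above can be illustrated with a short Python sketch. This is not expand_cells.pl itself, only a guess at the core step: pairing each symbol of a sequence string with a declared character and dropping whatever exceeds the declared count. The function name and the one-symbol-per-character assumption (reasonable for DNA-like data, not for whitespace-separated tokens) are ours.

```python
def expand_sequence(seq, char_ids):
    """Pair each symbol in seq with a declared character id.

    Assumes one symbol per non-whitespace character. Symbols beyond the
    number of declared characters are dropped, mirroring the truncation
    described in the text.
    """
    symbols = [s for s in seq if not s.isspace()]
    # zip stops at the shorter input, so extra symbols are silently discarded.
    return list(zip(char_ids, symbols))


if __name__ == "__main__":
    declared = ["c1", "c2", "c3"]
    # Five symbols, three declared characters: the last two symbols are lost.
    print(expand_sequence("ACGTA", declared))
```

If a matrix appears to lose data after preprocessing, comparing the sequence length with the number of declared characters is a good first check.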
The do_transforms.pl script expects the following arguments:
--xslt-path. In addition, the following optional arguments may be supplied:
--input-dir specifies the path to a directory where the NeXML documents are stored.
--output-dir specifies the path to a directory where the CDAO documents will be saved.
--xslt-path specifies the path to a directory where the stylesheet is located.
--xslt-processor allows the user to specify an alternate XSLT processor. The default is xsltproc.
--xslt-script allows the user to specify an alternate stylesheet. The default is
The script applies the stylesheet to each of the NeXML files in the input directory and saves them as CDAO files in the output directory with the following naming convention. Suppose the NeXML file is named nexml-data.xml; then the transformed file will be named
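The batch behaviour can be sketched in Python as follows. This is not a port of do_transforms.pl: the `*.xml` filter is an assumption, and because the output naming convention is not reproduced here, the sketch takes a caller-supplied `name_output` function (hypothetical) instead of guessing it. The processor default and the stylesheet name come from the text above.

```python
import subprocess
from pathlib import Path


def plan_transforms(input_dir, output_dir, stylesheet, name_output,
                    processor="xsltproc"):
    """Return (command, output_path) pairs, one per NeXML file in input_dir.

    name_output maps an input file name to an output file name; it is a
    hypothetical hook, since the real naming convention is not shown here.
    """
    plan = []
    for source in sorted(Path(input_dir).glob("*.xml")):  # assumed filter
        target = Path(output_dir) / name_output(source.name)
        plan.append(([processor, str(stylesheet), str(source)], target))
    return plan


def run_plan(plan):
    """Run each command, writing the processor's stdout to the output file."""
    for command, target in plan:
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w") as out:
            subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    # Illustrative naming only; substitute the convention you actually want.
    plan = plan_transforms("nexml", "cdao", "nexml2cdao.xsl",
                           lambda name: name + ".out")
    for command, target in plan:
        print(" ".join(command), ">", target)
```

Separating planning from execution makes it possible to inspect the commands before running the XSLT processor.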
What is the difference between reasoning and querying?
A simple query only matches triple patterns syntactically; it does not compute what follows from asserted relations such as class equivalence.
For example, say that according to some ontology the class Foo is asserted to be equivalent to some other class Bar. When querying for individuals of type Foo in data containing some Foo individuals and some Bar individuals, the result set would only contain the Foo type individuals. However, when using a reasoner to interpret the same query against the same hypothetical data set, the result set would contain both the Foo and the Bar individuals.
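The Foo/Bar example can be reproduced with a toy in-memory triple set in Python. This is only an illustration of the difference: a real reasoner such as Pellet handles far more than class equivalence, and the `ex:` names and helper functions here are made up for the sketch.

```python
RDF_TYPE = "rdf:type"
OWL_EQUIV = "owl:equivalentClass"

# Hypothetical data set: two Foo individuals, one Bar individual,
# and an ontology axiom stating that Foo and Bar are equivalent.
TRIPLES = {
    ("ex:a", RDF_TYPE, "ex:Foo"),
    ("ex:b", RDF_TYPE, "ex:Foo"),
    ("ex:c", RDF_TYPE, "ex:Bar"),
    ("ex:Foo", OWL_EQUIV, "ex:Bar"),
}


def simple_query(graph, cls):
    """Purely syntactic match on (?x rdf:type cls)."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}


def equivalent_classes(graph, cls):
    """Classes reachable from cls via owl:equivalentClass in either direction."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != OWL_EQUIV:
                continue
            for a, b in ((s, o), (o, s)):  # equivalence is symmetric
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def reasoned_query(graph, cls):
    """Same query, but also matching every class equivalent to cls."""
    result = set()
    for c in equivalent_classes(graph, cls):
        result |= simple_query(graph, c)
    return result


if __name__ == "__main__":
    print(sorted(simple_query(TRIPLES, "ex:Foo")))
    print(sorted(reasoned_query(TRIPLES, "ex:Foo")))
```

The simple query returns only the individuals explicitly typed as Foo, while the reasoned query also returns the Bar individual.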
What if the type of character I want to describe is not defined by CDAO?
CDAO contains placeholder classes, such as Standard for standard characters, to which you can attach additional classes that have been defined externally. To accomplish this, define your class and then declare it to be a subclass of the desired parent class in CDAO. The relationships defined for that CDAO class can then be applied automatically to your new class by reasoners that encounter individuals of its type. For example, suppose that you have a standard character Foo that is not defined in CDAO. Create the class Foo. When creating it, assert that Foo is a cdao:Standard character. Finally, annotate your data, marking which characters are of type Foo, to create instances of the class. Reasoners processing these instances will be able to treat them as they would other cdao:Standard characters.
Note: Our xslt stylesheet will automatically generate some of these characters based on the state definitions provided in a matrix's
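The subclassing recipe can be sketched with the same kind of toy triple set in Python. The class `ex:Foo` and the individuals are hypothetical; only `cdao:Standard` comes from the text. The sketch shows the one inference that matters here: an instance of a subclass is also an instance of its parent class.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"

GRAPH = {
    ("ex:Foo", RDFS_SUBCLASS, "cdao:Standard"),  # our externally defined class
    ("ex:char1", RDF_TYPE, "ex:Foo"),            # data annotated as Foo
    ("ex:char2", RDF_TYPE, "cdao:Standard"),     # ordinary CDAO character
}


def superclasses(graph, cls):
    """cls plus every class reachable via rdfs:subClassOf (transitive)."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == RDFS_SUBCLASS and s == current and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(graph, cls):
    """Individuals whose asserted type is cls or any subclass of cls."""
    return {
        s
        for (s, p, o) in graph
        if p == RDF_TYPE and cls in superclasses(graph, o)
    }


if __name__ == "__main__":
    print(sorted(instances_of(GRAPH, "cdao:Standard")))
    print(sorted(instances_of(GRAPH, "ex:Foo")))
```

With the subclass assertion in place, anything written against `cdao:Standard` also picks up the Foo characters, without changing CDAO itself.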
What if I need some other kind of term not described by CDAO?
Please use our term suggestion form to make your suggestion.
- Bio::Phylo API methods for Tree (Note: this page has changed! --Rvos@interchange.ubc.ca 01:23, 11 March 2009 (EDT))
- PhyloWS Bootcamp Notes
- RDF Visualization: RDF Gravity allows one to load and flexibly view RDF graphs.
- OWL Editing: Protege is an integrated development environment for working with ontologies.
- RDF Query Tool: Twinkle is a simple graphical environment for running SPARQL queries.
- OWL Reasoner: Pellet is an open-source OWL reasoner.
- RDF Framework: Jena is an open-source Java framework for working with semantic web data.
- OWL Documentation Generator: OwlDoc is a JavaDoc-style documentation generator for OWL.
Arlin's Notes from Monday
- extracting NeXML semantics in triple stores
Notes by Dave.
- Roger's working on the style sheet to produce RDF from NeXML.
- Can then load it into RDFGravity.
- Get as much in as we need for test cases. Focusing on Trees, may get to character
- Matt Y is working on test documents
- Vivek looking at API for questions we want to be able to answer.
- Brandon and Enrico are working on XSLT transform. Almost done
- Problem was sequence element in NeXML is just a string.
- Preprocess it with Perl to break it into fully expanded thing that XSLT can deal with.
- Matt Y working on NeXML parser in Ruby.
- Vivek making good progress on generating API on top of OWL
- We were talking about RDF bucket.
- Wrote basic loader to load XML
- Mapping NeXML to TDWG taxon something or other.
- If get this done, both TDWG and NeXML can be in same triple store.