Evolutionary Informatics (EvoInfo) Working Group
In the fall of 2006, NESCent funded a working group with a mandate to promote interoperability, based on a proposal by Rutger Vos and Arlin Stoltzfus, submitted with the support of over a dozen scientific experts in phylogenetic software development (see Participants).
News and announcements
The EvoInfo working group recently completed its term of operation with the highly successful Database Interop Hackathon. The group's energy was largely channeled into a new group called EvoIO, which focused on organizing:
- a 2009 Phyloinformatics VoCamp
- an NSF data interoperability network proposal
- and the Hackathons, Interoperability, Phylogenies (HIP) working group.
The growth of bioinformatics and genomics presents a wealth of opportunities for expanded application of evolutionary methods, expanded with respect to both the amount and the variety of analyses. Powerful tools for evolutionary analysis already exist, but integrating evolutionary methodology into biological data analysis depends less on the power of tools than on infrastructure. To address these infrastructural needs, the EvoInfo working group will develop community cohesion on issues of standards and interoperability, and will facilitate (directly and indirectly) the development of interoperable software and data standards.
Activities and outputs
In principle, working groups can generate a variety of different outputs: original analyses; documents for purposes of education, information, and policy-making; standards and specifications; outreach efforts; databases and software.
The evolutionary informatics group is doing a bit of everything, but our main focus is on directly and indirectly contributing to the development of artefacts that promote interoperability. In particular, we have developed an integrated interoperability strategy based on three technologies:
- NexML, an XML-based data exchange standard for phylogenetics and comparative biology
- CDAO, a Comparative Data Analysis Ontology
- PhyloWS, a web services standard
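To give a feel for what an XML-based exchange format offers, here is a minimal Python sketch that reads taxa and tree topology from a hand-written fragment loosely modeled on NexML. The fragment, the helper names q and read_tree, and the exact element layout are illustrative assumptions; the real NexML schema carries more structure (for example, typed trees and edge lengths), and this fragment is not guaranteed to validate against it.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by NexML documents (assumed here for illustration).
NEX = "http://www.nexml.org/2009"

def q(tag):
    """Qualify a tag name with the NexML namespace, ElementTree-style."""
    return f"{{{NEX}}}{tag}"

# Simplified, hand-written fragment loosely modeled on NexML (not schema-checked).
DOC = f"""<nex:nexml xmlns="{NEX}" xmlns:nex="{NEX}" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="human"/>
    <otu id="t2" label="chimp"/>
    <otu id="t3" label="mouse"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1">
      <node id="n1" root="true"/>
      <node id="n2"/>
      <node id="n3" otu="t1"/>
      <node id="n4" otu="t2"/>
      <node id="n5" otu="t3"/>
      <edge id="e1" source="n1" target="n2"/>
      <edge id="e2" source="n2" target="n3"/>
      <edge id="e3" source="n2" target="n4"/>
      <edge id="e4" source="n1" target="n5"/>
    </tree>
  </trees>
</nex:nexml>"""

def read_tree(xml_text):
    """Return (labels, edges): node id -> taxon label (None for internal nodes),
    and a list of (source, target) node-id pairs for the first tree found."""
    root = ET.fromstring(xml_text)
    otu_labels = {otu.get("id"): otu.get("label") for otu in root.iter(q("otu"))}
    tree = next(root.iter(q("tree")))
    labels = {n.get("id"): otu_labels.get(n.get("otu")) for n in tree.iter(q("node"))}
    edges = [(e.get("source"), e.get("target")) for e in tree.iter(q("edge"))]
    return labels, edges

labels, edges = read_tree(DOC)
print(labels)
print(edges)
```

Because taxa and nodes are linked by explicit ids rather than by position or naming conventions, any tool that understands the schema can recover the same topology.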
Most of the work on these technologies has been done by working group members outside of meetings. The working group provides a forum to coordinate activities within and between the NexML, CDAO and PhyloWS enthusiasts. For example, to coordinate NexML and CDAO, this year the working group instigated and coordinated discussions about how to integrate metadata references (using CDAO terms) into NexML using RDFa. Other things the working group has done include:
- documenting Use Cases and terminology in evolutionary analysis
- developing a Transition Model Language for evolutionary models used in statistical inference
- developing validation and interconversion software to support legacy formats
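Interconversion between formats often comes down to re-serializing the same tree. As a sketch, assuming a tree stored as (parent, child) node pairs plus a map from node ids to taxon labels (a layout chosen here for illustration, not taken from the group's software), the following Python writes the legacy Newick format:

```python
from collections import defaultdict

def to_newick(edges, labels):
    """Serialize a rooted tree given as (parent, child) pairs to a Newick string.

    edges: list of (parent_id, child_id) pairs (hypothetical representation).
    labels: node id -> taxon label; internal nodes may be absent.
    """
    children = defaultdict(list)
    child_ids = set()
    for parent, child in edges:
        children[parent].append(child)
        child_ids.add(child)

    # The root is the only node that appears as a parent but never as a child.
    roots = {parent for parent, _ in edges} - child_ids
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    (root,) = roots

    def render(node):
        label = labels.get(node) or ""
        if not children[node]:
            return label  # leaf: just its label
        inner = ",".join(render(c) for c in children[node])
        return f"({inner}){label}"

    return render(root) + ";"

edges = [("n1", "n2"), ("n2", "n3"), ("n2", "n4"), ("n1", "n5")]
labels = {"n3": "human", "n4": "chimp", "n5": "mouse"}
print(to_newick(edges, labels))  # ((human,chimp),mouse);
```

Labels are written unquoted, so names containing spaces, commas, or parentheses would need Newick quoting rules that this sketch omits; branch lengths are left out as well.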
To find out more about these activities, follow the links in the list above. It may also be useful to read the four reports generated by the working group:
Evolution, informatics and comparative biology
The term "evolutionary informatics" emphasizes the informatics aspects of evolutionary analysis, which is a kind of comparative analysis. Most inferences in genomics, for instance, come from comparative analysis: interpreting similarities and differences between different organisms, e.g., inferring the locations of genes and regulatory sites in the human genome by comparing it with the chimp and mouse genomes.
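At its simplest, comparing organisms means counting where aligned sequences agree and where they differ. The Python sketch below computes the proportion of differing sites (the p-distance) between two aligned sequences, skipping gapped columns; the sequences are made-up toy strings, not real genome data, and the function name p_distance is chosen here for illustration.

```python
def p_distance(a, b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where neither sequence has a gap character.
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in sites if x != y)
    return diffs / len(sites)

print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```

A raw count like this describes similarity and difference but says nothing about how the differences arose, which is the gap that evolutionary analysis is meant to fill.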
Evolutionary analysis (or "evolutionary comparative analysis") is comparative analysis undertaken within the framework provided by evolutionary theory. Unlike heuristic machine-learning approaches borrowed from computer science, evolutionary analysis treats the items to be compared as homologs that have evolved along paths of common descent (a tree) according to dynamics that reflect evolutionary genetics. Ideally this framework converts questions about interpreting similarities and differences into theoretically well posed questions about evolutionary transitions in the states of characters along the branches of a tree. Evolutionary comparative analysis has a long and colorful history, rooted in the early efforts of "numerical taxonomists" and "cladists" (working on the problems of organismal classification) to replace personal judgment with rigorous principles.
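The idea of recasting a comparison as transitions along branches can be made concrete with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a single character needs on a given rooted binary tree. This Python sketch is a parsimony method, one of several possible choices; unlike likelihood-based methods, it does not use an explicit model of evolutionary genetics. The tree, taxa, and character states are invented for illustration.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("human", "chimp"), "mouse").
    states: leaf name -> observed character state.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0  # a leaf has exactly its observed state
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree without an extra change on these branches.
        return common, left_cost + right_cost
    # No shared state: one change is required somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1

tree = (("human", "chimp"), ("mouse", "rat"))
states = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}
root_states, changes = fitch(tree, states)
print(sorted(root_states), changes)  # ['A', 'G'] 1
```

The question "how different are these taxa?" becomes "how many transitions, on which branches, best explain the observed states?", which is the kind of well-posed question the paragraph above describes.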
The informatics part of evolutionary informatics emphasizes technologies to address needs that emerge when sets of data to be analyzed become very large, diverse, or dispersed. When an individual user is curating and analyzing a small data set, it may suffice to store the data in a text file and to maintain a lab notebook describing the analysis. When dozens of individuals at different institutions are working together to annotate a genome, or to assemble the tree of life, we experience new needs for standards, automation, traceability, validation, and so on.
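Validation is one of those needs. As a sketch, assuming a tree is exchanged as a list of (parent, child) node pairs (a hypothetical representation chosen for illustration), the Python function below reports structural problems before the data are handed to downstream tools:

```python
def validate_tree(edges):
    """Return a list of problems in a (parent, child) edge list; empty if it forms one rooted tree."""
    problems = []
    parent_of = {}
    children = {}
    nodes = set()
    for parent, child in edges:
        nodes.update((parent, child))
        children.setdefault(parent, []).append(child)
        if child in parent_of:
            problems.append(f"node {child} has more than one parent")
        parent_of[child] = parent

    # Exactly one node should lack a parent.
    roots = nodes - set(parent_of)
    if len(roots) != 1:
        problems.append(f"expected one root, found {len(roots)}")
        return problems

    # Walk from the root; nodes never reached sit on a cycle or a detached piece.
    (root,) = roots
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    unreached = nodes - seen
    if unreached:
        problems.append(f"unreachable nodes: {sorted(unreached)}")
    return problems

print(validate_tree([("n1", "n2"), ("n2", "n3"), ("n2", "n4"), ("n1", "n5")]))  # []
print(validate_tree([("a", "b"), ("c", "d")]))  # ['expected one root, found 2']
```

Checks like these are cheap to automate, which is the point: once data are shared among many contributors, problems should be caught by software rather than by someone reading a text file.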
The main focus of the EvoInfo working group is to promote interoperability. One operational definition of interoperability (see business process interoperability) is that it exists when objectives can be achieved automatically, with human labor or "mind work" used only where absolutely essential (e.g., for tasks that cannot or should not be automated, like fighting fires or answering the phone).