Proposal Name: ReleaseTools
Proposal Type: Improvement
Editor(s): DanCorwin
Input from: AlanRuttenberg
- Executive Summary
- Introduction
- Biological Questions / Use Cases
- Requirements
- Proposed Implementation
- Worked Examples
- Open Issues
- Expected growth and plan for growth
- Relation to external ontologies and plan for use
- OWL considerations
- Backward Compatibility
- Notes
Executive Summary
This proposes a "core" ontology for future BioPAX.org releases, focused on improved top-level concepts; support for best-practice Semantic Web software R&D; established paradigms for scientific attribution; end-user training; validated DB conversions; and interoperability. Both its nature and the R&D methodology used to construct it would be guided by the CorePhysicalOntologyGuide.
Special features include integrated web utilities to assist data suppliers; support for OWL-DL reasoners; and linkage to upper ontologies. Around this imported general core, an open suite of modular extension ontologies could arise and evolve, in which DX workers, external suppliers, data consumers and curators might all publish new models for exchanging triples on pathways. Standards too could become more modular, as BioPAX.org collected end-user feedback.
Introduction
Goals/Motivation
2.0 was released late, with bugs and caveats. The push to develop DX releases is also late by 3+ months, and it has not yet added clear plans to improve BioPAX standards in OWL-DL, R&D processes, documentation quality, or interoperability with other major ontologies. The loss of BioPAX's credibility as a standards publisher, even within our membership, is a growing threat.
To counter such problems and dangers, I propose a new R&D approach to post-2.0 releases: the layering of release modules around a "core" ontology within which software engineering and data exchange, not biochemistry, take center stage. Relative to 2.0, "core" will address:
more formal concept documentation, based on external upper-ontology classes
improved attribution standards, especially for every authored "abstract model"
modularity, so future extensions can arise more quickly as parallel work efforts
validating tools which ease the bulk-import of triples useful to OWL-DL reasoners
new standards (see Exec Summary) for the nature of models and revisions to them
A first goal is a 2.1 "core", released in roughly the same time frame as a DX module. It would be similar to 2.0, but generalized and formalized in its handling of the biochemical details of PARTICIPANTS, so that what remains provides a clear, formal base upon which the community can erect BioPAX 3.0.
Biological Questions / Use Cases
Biologically speaking, "core" 2.1 will address pathway models in general, with minimal restrictions placed on PARTICIPANTS beyond solid documentation and attribution. Extensions can then add these restrictions, if, as and when their authors deem that desirable for particular needs.
What will "core" remove relative to 2.0? - Basically, all "utility classes" and other imaginary or conflated concepts. DX releases can import them if required from BioPAX 2.0. To minimize confusion, every extension to "core" will be expected to use a separate name space, then import all external concepts it requires from available sources.
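The namespace rule for extensions can be illustrated with a minimal sketch, assuming triples are held as plain (subject, predicate, object) tuples. The two example.org namespaces and the class names are hypothetical and are not defined by BioPAX; only the RDF, RDFS and OWL URIs are standard. The check flags any class that an extension declares outside its own namespace, while still letting it refer to imported "core" concepts.

```python
# Hypothetical namespaces for illustration; neither URI is defined by BioPAX.
CORE_NS = "http://example.org/biopax-core#"
EXT_NS = "http://example.org/phospho-ext#"

# Standard W3C vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


def declared_classes(triples):
    """Return the subjects that a module declares as owl:Class."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == OWL_CLASS}


def misplaced_terms(triples, own_ns):
    """Return declared classes that fall outside the module's own namespace."""
    return sorted(c for c in declared_classes(triples) if not c.startswith(own_ns))


# A toy extension: one well-placed class, one reference to an imported
# "core" class, and one declaration that wrongly reuses the core namespace.
extension = [
    (EXT_NS + "PhosphorylatedParticipant", RDF_TYPE, OWL_CLASS),
    (EXT_NS + "PhosphorylatedParticipant", RDFS_SUBCLASS, CORE_NS + "Participant"),
    (CORE_NS + "Shortcut", RDF_TYPE, OWL_CLASS),  # violates the rule
]

print(misplaced_terms(extension, EXT_NS))
```

Referring to CORE_NS + "Participant" as a superclass is allowed here because the extension only uses that term; it does not declare it.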
What will "core" add relative to 2.0? - Whatever makes sense for its goals as a base, and can be defined in a reasonable number of months. New models will tap EXTERNAL engineering process, upper-ontology, and attribution paradigms, seeking a solid base for disciplined releases that support or exceed the original goals for 3.0.
Primary use cases for "core" 2.1 are restoring confidence in BioPAX.org's ability to deliver on the software goals that it publicly set for itself; re-establishing ontological quality as the top working group value; speeding validated data conversions; and experimentally discovering how 3.0 goals can best be advanced by using all available tools, paradigms and standards for industrial information exchange and professional software release processes.
Requirements
User Requirements
See "BioPAX-discuss" thread on "Some Requirements", starting May 25, 2006
Continued expansion of this email thread and related spinoffs is encouraged
(Others will be added after on-going analysis by "core" participants)
Practical Software Requirements
Commonly used upper ontology classes and properties will be integrated
Design patterns from other metadata-exchange efforts will be supported
The standards imposed on "core" ontology models will include OWL-DL
(Others will be added after on-going analysis by "core" participants)
Proposed Implementation
Core 2.1 will itself be assembled from separate, integrated modules
Core concepts will be documented using Self-Annotating_Identifers
Model-attribution standards will let Dublin Core metadata be located
Models will conform to principles in the CorePhysicalOntologyGuide
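As a rough illustration of the attribution item above, the sketch below attaches Dublin Core element properties to a model URI and checks that the expected fields can be located. The model URI, the literal values, and the choice of "creator" and "date" as required fields are assumptions made for the example; the proposal does not say which Dublin Core elements would be mandatory.

```python
# Dublin Core Metadata Element Set, version 1.1 namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical model URI and metadata values, for illustration only.
MODEL = "http://example.org/models/glycolysis-v1"

triples = [
    (MODEL, DC + "creator", "A. Curator"),
    (MODEL, DC + "date", "2006-06-01"),
    (MODEL, DC + "source", "http://example.org/db/export-42"),
]


def attribution(triples, model_uri):
    """Collect the Dublin Core properties stated about one model."""
    record = {}
    for s, p, o in triples:
        if s == model_uri and p.startswith(DC):
            # Key by the local element name, e.g. "creator".
            record.setdefault(p[len(DC):], []).append(o)
    return record


def missing_fields(record, required=("creator", "date")):
    """List required elements (an assumed policy) absent from the record."""
    return [f for f in required if f not in record]


record = attribution(triples, MODEL)
print(record)
print(missing_fields(record))
```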
Worked Examples
Explanations of these are posted in the wiki - see OntologyDesignPatterns.
The basic patterns for triple exchange-formats include associations:
Ways to restrict extensions are exemplified for phosphorylation
A web utility agent for DB conversion is exemplified for interactions
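The wiki examples themselves are not reproduced here. As a stand-in, the following is a minimal sketch of what a validated DB conversion for interactions might look like, assuming the source data arrives as CSV rows with "id", "left" and "right" columns. The column names, the example.org namespace, and the "participant" property are all hypothetical. Incomplete rows are rejected rather than converted, so only well-formed triples are emitted.

```python
import csv
import io

# Hypothetical namespace and property name; not part of any BioPAX release.
EX = "http://example.org/interactions#"

# Toy export: the second row lacks a right-hand participant.
SAMPLE = "id,left,right\nI1,ProteinA,ProteinB\nI2,ProteinB,\n"


def convert(text):
    """Turn CSV interaction rows into triples, setting incomplete rows aside."""
    triples, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if not (row["id"] and row["left"] and row["right"]):
            rejected.append(row["id"])
            continue
        subject = EX + row["id"]
        triples.append((subject, EX + "participant", EX + row["left"]))
        triples.append((subject, EX + "participant", EX + row["right"]))
    return triples, rejected


triples, rejected = convert(SAMPLE)
for t in triples:
    print(t)
print("rejected:", rejected)
```

Returning the rejected row identifiers alongside the triples lets a data supplier see what failed validation instead of silently losing records.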
Open Issues
Timing and content of initial "core" releases (TBDL by the SW group)
Acceptable exchange file formats (OWL, RDF, XTM, CSV, etc.)
Expected growth and plan for growth
Biological extensions from many parties can be layered on top of "core"
Each new layer should take an incremental step toward a 3.0 class tree
BioPAX.org will solicit submissions and feedback from expert sources
Periodic 2.x standard "core" upgrades can arise under group governance
Relation to external ontologies and plan for use
Interoperability use cases benefit from shared upper-ontology concepts
User-built ontologies can also be widely built as extensions to "core"
Exchange files based on open combinations of both are encouraged
Relative to 2.0, this should "open up" BioPAX use cases considerably
It should considerably boost compatibility, with GO most especially
OWL considerations
OWL tools continue to operate only in early release stages
"Core" R&D will offer suggested ways to address this problem
Design Patterns (see Worked Examples) are a central method
Backward Compatibility
2.0-compatible "core" extensions should become possible in Q3
If usage guides were followed, few or no 2.0 instances should need changes
Proposed DX releases suggest very different (incompatible) standards
Notes
Please use "discuss" email, not wiki pages, to debate the above so that dates, authors, and opinions go into a more clearly persistent group record.
Discussion below can summarize conclusions.
Discuss this proposal (see note 1).