BioPAX Workshop October 15-16, 2007 (Monday, Tuesday), SRI, Menlo Park, CA
This meeting was held mainly to finalize the representation of gene regulation and other aspects of BioPAX Level 3. Invitations went only to active database groups, mainly those interested in gene regulation, states and generics, and to Bay Area locals interested in BioPAX. The full meeting agenda, copies of most presentations and notes are available on this page.
Agenda
Day 1 - Monday, Oct. 15
8:45am - light breakfast
9am - Set up, intro, logistics (Gary, Paul)
Agenda overview
Meals - vegetarian option
Review of BioPAX implementations and success stories (Gary)
Acknowledgment of NIH BioPAX workshop grant.
BioPAX Level 3
9:20am - BioPAX Level 3 release candidate overview (Emek, Gary)
States, generics, genetic interactions and partial coverage of gene regulation
Current BioPAX Level 3 available at
http://www.biopax.org/release/biopax-level3.owl
10:30am - Coffee break
11am - Gene regulation session. Chair: Gary Bader.
Goal: reach consensus on gene regulation representation in BioPAX Level 3
The current pre-release version of BioPAX Level 3 has a bare bones gene regulation implementation, which is the result of previous discussions on the topic. Ideally we could spend enough time to flesh this out further and adopt something suitable for release in Level 3 in the short term.
All gene regulation stakeholders present their data model and show how it could map to BioPAX Level 3 - 20 minutes each + questions
BioCyc (Peter Karp or Suzanne Paley)
INOH (Ken Fukuda)
PANTHER (Paul Thomas)
RegulonDB (Irma Martinez-Flores and Monica Penaloza-Spinola) - Incorporating_regulation_and_DNA-parts_in_BioPax_2.ppt
TRANSFAC (Burk Braun) - BurkBioPax1.ppt
1pm - Box lunch catered
2pm - Gene regulation discussion
Working towards consensus on gene regulation representation in BioPAX Level 3.
Discussion of open issues from the morning session. E.g.
Definition of a gene (does it include the promoter?). Should we forget about gene definition and just model using DNA regions and 'transcriptional units', leaving the concept of gene for genetic interactions?
General modeling of both prokaryotic and eukaryotic gene expression?
How detailed do we want to get in terms of modeling different types of DNA regions involved in transcription? Can we just use the sequence feature ontology (SO)?
Coverage of microRNA based translational regulation?
Coverage of splicing processes in gene expression, including generation of many small RNAs from larger transcript?
Summary session - 15 minutes
Ensure all points of discussion are captured.
3:30pm - Coffee break
4:00pm - further discussion of gene regulation
6pm - End of Day 1
7pm - Dinner
Day 2 - Tuesday, Oct. 16
8:45am - light breakfast + coffee
Semantic Integration and data exchange
9:00am - Semantic integration discussion. Chair: Ken Fukuda.
Different databases model the same biological processes in different ways. BioPAX is a common data exchange format, but people can still use it in different ways. The community needs to share experiences and best practices to help use BioPAX in standard ways and recommend community best practices.
MIMIx as an example of a community best practice - Gary Bader - 5 minutes - Highlight of MIMIx paper
PathwayCommons research resource - Gary Bader - 5 minutes - 2007SRI-PathwayCommons.ppt
[merge and diff] Operations on pathways from different resources - Nigam Shah - 5 minutes - BioPAX@SRI.pdf
Biowarehouse - Peter Karp - 5 minutes
Annotation ontologies - Ken Fukuda - 5 minutes
Possible discussion items:
validation - What validation rules can be checked by a validator tool to help data providers verify they are making semantically correct BioPAX available that also follows best practices? Brainstorm discussion to collect rules.
merge tools + pilot projects - what experience do people have with merging BioPAX data from different sources?
challenges - what are the main challenges that people see with semantic integration of pathways from different sources.
9:45am - Data exchange discussion. Chair: Paul Thomas.
The primary goal of the BioPAX workgroup is to create a standard data exchange format for pathway information: from databases to users, from databases to other databases, and from users to other users.
Carl Schaefer - importing/exporting BioPAX to/from PID - 10 minutes - schaefer_16_Oct_2007.pdf
Guanming Wu - batch-import tool for BioPAX, Panther - 5 minutes - ReactomeBatchImport.ppt
Example discussion topics:
What are people's experiences, problems with and plans for real data exchange?
What recent data has been converted to BioPAX?
What software is required for effective data exchange?
What exchange protocols are needed?
Software
10:30am - break
10:45am - Pathway databases, data types and data sets - new developments. Chair: Peter Karp
What do groups need in terms of format, software?
PharmGKB - 10 minutes - Michelle Carrillo
WikiPathways - 10 minutes - Alex Pico - Walk through of http://wikipathways.org
Developments in BioCyc and Pathway Tools - 10 minutes - Peter Karp
11:30am - Visualization. Chair: Emek Demir.
SBGN - Huaiyu Mi - 10 minutes
11:50am - Curation/collection/author entry tools. Chair: Ken Fukuda
What tools are available to help authors and curators enter their pathway data e.g. as BioPAX format for submission to pathway databases?
PANTHER curator tools - Paul Thomas - 15 minutes
iHOP/Cashew prize - Chris Sander - 5 minutes
Reactome curator and author tool - Guanming Wu - 10 minutes
INOH curation - Ken Fukuda - 5 minutes
Curator tools - session to share experiences with curator tools, think about future of curator tools
Data acquisition/text mining
Future directions
12:30pm - Future BioPAX roadmap. Chair: Chris Sander.
What should be added in future BioPAX levels (brainstorm and prioritize)
Relationship between pathways and "models" of biological processes - Nigam Shah - 5 minutes
A proposal for modeling cell objects by layers - Irma Martinez-Flores, Monica Penaloza-Spinola - 5 minutes
1:00pm - lunch/discussion - end of main workshop
Tuesday afternoon technical workshop
2:00pm - Progress on PaxTools. Chair: Emek Demir
Applications in mind while developing it, issues and what people would like from it.
Future features
This will be an open workshop where groups and experienced BioPAX developers can interact and work on specific BioPAX relevant projects. Examples of projects to work on include:
Validation rules - implementation session
Converters - go over actual mappings between new databases and BioPAX, plan how to support level 3 with individual database groups who already support level 1 or 2
Namespace discussion - we should have more consistency in the BioPAX namespace to better support OWL query tools. Andrea Splendiani will present a proposal. NoteBiopaxMeeting.pdf
Demo session for BioPAX relevant software - participants sign up for short demos
Cytoscape import of BioPAX and visual representation
6pm - end of workshop
Participants
| Participant | Organization |
|---|---|
| Carl Schaefer | NCI/Nature PID, US NCI |
| Shiva Krupa | NCI/Nature PID, Nature, Boston |
| Emek Demir | Pathway Commons, MSKCC |
| Gary Bader | Pathway Commons, Cytoscape, U. Toronto |
| Chris Sander | Pathway Commons, Cytoscape, MSKCC |
| Paul Thomas | PANTHER, SRI |
| Huaiyu Mi | PANTHER, SRI |
| Nan Guo | PANTHER, SRI |
| Peter Karp | BioCyc, SRI |
| Suzanne Paley | BioCyc, SRI |
| Guanming Wu | Reactome, CSHL |
| Nigam Shah | NCBO, Stanford |
| Li Gong | PharmGKB, Stanford |
| Michelle Carrillo | PharmGKB, Stanford |
| Ryan Whaley | PharmGKB, Stanford |
| Monica Penaloza-Spinola | RegulonDB, UNAM |
| Irma Martinez-Flores | RegulonDB, UNAM |
| Julio Collado-Vides | RegulonDB, UNAM |
| Burk Braun | TRANSFAC, Biobase |
| Ken Fukuda | INOH, CBRC, AIST |
| Alex Pico | GenMAPP, Cytoscape, UCSF |
| Andrea Splendiani | BOOTStrep project, Université de Rennes |
Logistical information about the meeting is at
http://www.biopax.org/sri2007/
Photos
Meeting report/notes
Day 1
Morning
Emek presented the states and generic proposal. This was well received and the following issues were raised:
Currently referenceEntity is a utility class. Should it be moved to be an entity? (Emek)
Why have a referenceComplex? There is no database of complexes, so why have it? (Guanming + Paul) Answer: This allows both the complex and subunits to have state variables
How do you deal with stoichiometry of physical entities, since stoichiometry is currently only in the referenceComplex? (Peter)
Rename nonCovalentFeature to something similar to “non covalent binding site” (Paul)
Not-modified-at: default is “don’t know” (Nigam, Peter) – ok, but needs to be clearer in documentation
Gary presented the gene regulation proposal, which required much discussion during the first day of the meeting. General questions posed prior to the meeting were:
3 vs. 2 class system - Template based reaction implementation?
How detailed do we want to get in terms of modeling different types of DNA regions involved in transcription? Can we just use the sequence feature ontology (SO)?
Use of a DNA binding region
Gene vs. DNA – use of gene in gene regulation?
Position of Gene class?
General modeling of both prokaryotic and eukaryotic gene expression?
Coverage of microRNA based translational regulation?
Coverage of splicing processes in gene expression, including generation of many small RNAs from larger transcript?
Gary presented the genetic interaction proposal. Issues raised were:
Be ready for high-throughput (HTP) experiments: include quantitative information about interactions, allow values for phenotypic change, and allow an expectation model, e.g. Bliss additivity (Chris). However, recording observations crosses the line into recording experiments (Carl), so only do this as needed.
Are SNP-SNP interactions covered? (Alex, PharmGKB - Michelle). Answer: They could be, since dbSNP references can be used and should be ok for most tasks
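As a minimal sketch of the kind of expectation model mentioned above, the following assumes the Bliss independence form, in which two perturbations with fractional effects a and b (each between 0 and 1) are expected to combine to a + b - ab. The function names, and the idea of scoring an interaction as observed minus expected, are illustrative assumptions and not part of the BioPAX proposal.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected combined fractional effect under Bliss independence (assumed form)."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def interaction_score(observed: float, effect_a: float, effect_b: float) -> float:
    """Deviation of the observed double-perturbation effect from the expectation.

    Positive values mean a stronger effect than expected, negative values a weaker one.
    """
    return observed - bliss_expected(effect_a, effect_b)
```

A quantitative genetic interaction record could then carry the two single effects, the observed combined effect and the name of the expectation model used, leaving the score to be recomputed by consumers.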
Other new features of BioPAX Level 3
The openControlledVocabulary class is now subclassed to specific types, following an original suggestion by Imre Vastrik. This is good, but specific CVs should be required unless none are available.
pathwayStep has been subclassed to biochemicalPathwayStep to differentiate the two types of pathway steps. Pathways are ordered through the PATHWAY-ORDER property of pathway, which references pathwaySteps. Question: how should reversibility be defined in pathwayStep? (Suzanne)
physicalInteraction was changed to be a sibling of control and conversion interaction classes. There was a request from Ken Fukuda to rename physicalInteraction to molecularInteraction.
Afternoon - gene regulation
The database groups most interested in gene regulation presented their gene regulation data models and compared these to the BioPAX proposal.
BioCyc
Broaden representation to all prokaryotic gene regulation (EcoCyc grant to do this)
New regulatory overview view in EcoCyc
No transcription/translation for economy reasons
No gene, use of DNA fragment
INOH
Gene is a DNA fragment
Separate representation of transcription and translation.
All transcription is activating; repression is captured by the ‘inhibition’ eventRelationship, and the same applies to feedback.
PANTHER pathway
Gene regulation in CellDesigner is very high level: geneA activates or inhibits geneB.
There is no real gene regulation model in CellDesigner.
RegulonDB
TranscriptionUnit – a segment of DNA with one or more genes – promoter, genes, terminator
binding site – all objects have an ID
activation or repression – only 2 effects
transcription factor, effector (metabolite), conformation, regulatory interaction, regulon
Issues around gene regulation were thoroughly discussed and a new 2-class design for gene regulation was developed.
The result of the discussion was to create the following two classes:
templateReaction
template: DNA, RNA
product: DNA, RNA, protein
regulatory-element: e.g. promoter, RBS
templateReactionRegulation
Controller: complex, protein, small molecule
Controlled: templateReaction
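The two-class design above can be sketched as a small data model. This is only an illustration under stated assumptions: the Python class and field names, the `Entity` helper, the "smallMolecule" kind label and the example entity names are hypothetical, and only the allowed kinds for each slot are taken from the notes.

```python
from dataclasses import dataclass, field
from typing import List

# Entity kinds permitted in each slot, as listed in the meeting notes.
TEMPLATE_KINDS = {"dna", "rna"}
PRODUCT_KINDS = {"dna", "rna", "protein"}
CONTROLLER_KINDS = {"complex", "protein", "smallMolecule"}


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a BioPAX physical entity."""
    name: str
    kind: str  # e.g. "dna", "rna", "protein", "complex", "smallMolecule"


@dataclass
class TemplateReaction:
    template: Entity
    products: List[Entity]
    # Free-text labels such as "promoter" or "RBS" (a simplification).
    regulatory_elements: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.template.kind not in TEMPLATE_KINDS:
            raise ValueError(f"template must be DNA or RNA, got {self.template.kind}")
        for product in self.products:
            if product.kind not in PRODUCT_KINDS:
                raise ValueError(f"invalid product kind: {product.kind}")


@dataclass
class TemplateReactionRegulation:
    controller: Entity
    controlled: TemplateReaction

    def __post_init__(self):
        if self.controller.kind not in CONTROLLER_KINDS:
            raise ValueError(f"invalid controller kind: {self.controller.kind}")
        if not isinstance(self.controlled, TemplateReaction):
            raise TypeError("controlled must be a TemplateReaction")


# Usage: transcription of a hypothetical gene, regulated by a hypothetical factor.
transcription = TemplateReaction(
    template=Entity("geneX DNA", "dna"),
    products=[Entity("geneX mRNA", "rna")],
    regulatory_elements=["promoter"],
)
regulation = TemplateReactionRegulation(Entity("TF1", "protein"), transcription)
```

Translation would reuse the same TemplateReaction class with an RNA template and a protein product, which is what lets two classes cover both steps.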
Open issue: we need to make it easier to name types of DNA molecules. One proposal from RegulonDB group was to create a DNA fragment subclass of DNA. Another option would be to use the DNA class for this, but make it easier to specify the type of DNA.
While gene regulation was discussed, Protein Degradation was also discussed and a design for addition of this concept to BioPAX was developed.
Day 2
Semantic integration and data exchange discussion
Best practices will need to be defined and followed to enable integration and data exchange. We recommend:
Use of UniProt or RefSeq IDs for all proteins, where available.
Use of standard DB names (e.g. UniProt, not uniprot), following the PSI CV database names.
Standard use of GO IDs and other IDs when referencing other ontologies and databases.
Use URIs as RDF IDs if available. We should create a wiki page listing available RDF ID sources.
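A validator could check recommendations like these automatically. The sketch below covers only the database-name rule; the table of preferred spellings is a hypothetical placeholder (a real tool would load the agreed controlled vocabulary), and the function names are assumptions made for illustration.

```python
# Hypothetical table of preferred spellings, keyed by lowercase name.
# A real validator would load these from the agreed controlled vocabulary.
PREFERRED_DB_NAMES = {"uniprot": "UniProt", "refseq": "RefSeq", "go": "GO"}


def check_db_name(db):
    """Return (is_ok, preferred) for a database name used in a cross-reference."""
    preferred = PREFERRED_DB_NAMES.get(db.strip().lower())
    if preferred is None:
        return False, None  # unknown database: no correction can be suggested
    return db == preferred, preferred


def check_xrefs(xrefs):
    """Collect human-readable warnings for a list of (db, id) pairs."""
    warnings = []
    for db, identifier in xrefs:
        ok, preferred = check_db_name(db)
        if preferred is None:
            warnings.append(f"{db}:{identifier}: unrecognized database name")
        elif not ok:
            warnings.append(f"{db}:{identifier}: use '{preferred}' instead of '{db}'")
    return warnings
```

For example, `check_xrefs([("uniprot", "P12345")])` reports that the spelling should be 'UniProt', while a correctly spelled entry produces no warning.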
BioPAX future work
Support generalizations over interactions e.g. EC reactions
Add protein cleavage – may require a protein fragment as an object
Standardize use of small molecules: BioCyc compounds (xlinks to CAS, KEGG)
Support for pathway layout
Pathway abstraction – can accept input and output. Model pathway as a general process with input and output.
Future, when needed: Cell-cell communication – other issues with pathways in multicellular organisms
More best practices
Increased connection with SBGN graphical representation community.
Possible ideas for next F2F meeting in 2008
Curator get together, specific task: cross-reference compounds
Software tools: Pathway Logic (Talcott, Lincoln), FBA (distribute models in BioPAX); a hackathon to interface with modeling communities and their different modeling representations.
Motivated by biological questions – scientific discoveries (session for next year’s meeting – specific uses of tools to illustrate)
Standard web services – similar to the PSI-MI standard web service initiative.
Organized as SIGs – e.g. tool development, web services API, best practices (ways of encoding) – could have parallel sessions
Plan for finalizing BioPAX Level 3
Goal: next release candidate, including documentation, worked examples and PaxTools support, by end of 2007
Gene regulation issues will be resolved by follow up discussion and creation of worked examples.
Emek will update the states and generics proposal with feedback from this meeting.
Genetic interactions will be discussed further with Chris Sander to capture additional requirements and Gary will update the ontology.
If necessary, a full day work session or 2 half day work sessions will be scheduled for early spring to finalize.
Post-meeting comments
Feel free to constructively comment on this meeting in this section.