How do we represent the relationships found in PSI-MI format files and in other databases that use the proteomics style of representation? Answer: very carefully :).
Whenever we can, we use the mechanisms we already have in BioPAX 1.
For molecular binding interactions, we use complexAssembly/binding (see ../Binding representation, ../Binding site representation).
But there are still cases that don't fit. I've identified at least three:
In coIP experiments we may know nothing more than that the proteins in question were in the same clump.
Databases may not annotate enough information to determine what the interaction is.
There may be relations that are outside what we can represent in BioPAX level 2, e.g. cleavage.
The current proposal is that the majority of interactions in PSI-MI formatted files be represented as instances of physicalInteraction, with an INTERACTION-TYPE slot holding a controlled vocabulary term from PSI-MI. I think this is a bad idea for two reasons:
It's far too easy to translate the whole file into this format without thinking about where things really belong.
It removes the possibility of representing generics by saying we have a generic whenever we make an instance of a class that has no subclasses.
Instead, I create two classes to hold the interactions that can't be represented by any other class.
association:
This class represents a type of interaction commonly found in the molecular interaction network style of representation. Typically, one uses this class or a subclass (e.g. aggregation for co-IP) to define an association of proteins and other physical entities when the details of the association are not known, or when more information about the association may be available but its source chooses not to represent it, as when a literature database curator chooses to represent only certain aspects of an experiment.
Where more information is known, use a more specific class. For example, if the proteins are known to interact directly, or if the record includes a binding site, create a complex assembly. Likewise, create a complex assembly if the experimental method is crystallography. If the record defines an enzymatic reaction, use the appropriate subclass of catalysis controlling a stateChange interaction. The participants in an association should be listed in the PARTICIPANTS slot. Note that this is the only case in which the PARTICIPANTS slot should be populated directly.
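The choice between association, aggregation and the more specific classes can be read as a small decision procedure. Below is a minimal Python sketch of those rules, assuming a hypothetical `InteractionRecord` that summarizes what a source record states; its field names are illustrative only and are not PSI-MI or BioPAX properties. Checking the enzymatic case first is also an assumption, since the rules above don't say how to rank overlapping evidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InteractionRecord:
    """HYPOTHETICAL summary of what a source record tells us (illustrative fields)."""
    direct_binding: bool = False   # proteins are known to interact directly
    binding_site: bool = False     # the record includes a binding site
    enzymatic: bool = False        # the record defines an enzymatic reaction
    method: Optional[str] = None   # experimental method, e.g. "coimmunoprecipitation"


def choose_class(rec: InteractionRecord) -> str:
    """Return the most specific class the record supports, per the rules above."""
    if rec.enzymatic:
        # a subclass of catalysis controlling a stateChange interaction
        return "catalysis controlling stateChange"
    if rec.direct_binding or rec.binding_site or rec.method == "crystallography":
        return "complexAssembly"
    if rec.method == "coimmunoprecipitation":
        return "aggregation"  # same clump, direct binding not established
    return "association"      # nothing more specific is known


print(choose_class(InteractionRecord(method="coimmunoprecipitation")))  # aggregation
print(choose_class(InteractionRecord(method="coimmunoprecipitation", binding_site=True)))  # complexAssembly
```

The second call shows the co-IP-as-binding-assay case: once a binding site is known, the record is no longer just an aggregation.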
aggregation:
Use this class to represent an association determined by, e.g., a coimmunoprecipitation, where you know that the proteins involved stuck to the same clump of proteins but don't know for certain that they bind directly to each other, and that is all you know.
If coimmunoprecipitation is used as a binding assay, for example, to determine a binding site by mutation analysis, or where you can otherwise infer that the proteins directly interact, use a binding or complexAssembly to represent the interaction.
To allow for the possibility that these associations could turn out to be bindings or other types of interactions, this class is not made disjoint with the other interaction classes. This allows one to state later that an association instance is owl:sameAs a binding instance.
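Why the missing disjointness matters can be sketched in Python by treating an individual's types as a set and modelling owl:sameAs as merging those sets. The `DISJOINT` axiom below (binding vs. transport) is a hypothetical illustration, not a statement about the BioPAX ontology, and the check is far cruder than what an OWL reasoner does. What it shows is that, because association and aggregation appear in no disjoint pair, an aggregation instance can later be identified with a binding instance without producing an inconsistency.

```python
from typing import Dict, FrozenSet, Set

# HYPOTHETICAL disjointness axioms, for illustration only (not taken from BioPAX).
# association and aggregation are deliberately absent from every pair.
DISJOINT: Set[FrozenSet[str]] = {frozenset({"binding", "transport"})}


def merge_same_as(types: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Model `a owl:sameAs b`: both names end up with the union of their types.

    Raises ValueError if the union contains a pair of classes declared disjoint.
    """
    merged = types[a] | types[b]
    for pair in DISJOINT:
        if pair <= merged:  # both disjoint classes would apply to one individual
            raise ValueError(f"inconsistent: {sorted(pair)} are disjoint")
    types[a] = set(merged)
    types[b] = set(merged)
    return merged


# An aggregation recorded from a co-IP, later found to be a binding.
kb = {"coip_1": {"association", "aggregation"}, "bind_7": {"binding"}}
print(sorted(merge_same_as(kb, "coip_1", "bind_7")))  # ['aggregation', 'association', 'binding']
```

Had association been declared disjoint with binding, the same merge would raise an error instead of refining what is known about the interaction.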
Questions:
bader What's the difference between association and aggregation? It seems they both can handle coIP experimental results.
Alan Ruttenberg Aggregation is a subclass of association. You know more about an aggregation than about an association, namely that the proteins are stuck together (possibly via intermediaries). If you don't know at least that much, then it is an association (though I expect it would be useful to further subclass association when other kinds of information are known; e.g. colocation would be another candidate subclass). I've edited the description of association to clarify that co-IP should be an aggregation instance.
bader This subclassing models experiments in too fine-grained detail and leads us down the slippery slope towards modeling all experimental details, a task that is basically impossible to complete. This is one reason why it is important to separate the details of the experiment from the interpretation/model (another is to maintain generality). We generally want to capture the interpretation of the experiment in the main BioPAX class structure. The experimental details backing up the interpretation may exist in a database and may be added via the 'EVIDENCE' property. The proteomics representation style (my name for how a number of DBs that are part of PSI-MI represent their data) has this clear separation, even though each record is generally supported by only one experiment (which may be a point of confusion). As you get into more specialized classes in BioPAX, you need to combine more experiments to reach a conclusion such as a complex assembly or a transport. (Another point of confusion is that metabolic databases don't store the evidence. This doesn't mean the experiments weren't done; rather, they were often done to completion, so biologists are satisfied with the resulting model.) Thinking in this way, the class structure progresses naturally from less detailed at the root to more detailed at the leaves (in terms of interpretation).
bader, Oct. 12, 2005: This page describes the start of an interesting research project, but it is out of scope for our PSI-MI Conversion.