The GO DAG file format is used by
It is a file that is quick to parse, contains the structure and IDs from the Gene Ontology, and can be
automatically generated from the
Gene Ontology's OBO format.
A GO DAG file is packaged with the GOMO databases available on the MEME website.
The file structure can be divided into three major parts: the header comments, the directed acyclic graph (DAG), and the graph labels.
The purpose of the header comments is to document the details of the OBO file that was used to generate the GO DAG file. Each line starts with a # symbol, and the rest of the line can contain any content.
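As a minimal sketch of how a reader might separate the header comments from the rest of the file, the following assumes only what is stated above: header lines start with a # symbol and come first. The function name `read_header_comments` and the sample content are hypothetical and are not taken from any real GO DAG file.

```python
import io


def read_header_comments(stream):
    """Collect the leading '#' lines of a GO DAG file.

    Returns the comment texts (without the '#') and the first line that
    is not a comment, or None if the stream ends first.
    """
    comments = []
    for line in stream:
        if line.startswith("#"):
            comments.append(line[1:].strip())
        else:
            return comments, line
    return comments, None


# Made-up sample: two header comments followed by a node count line.
sample = io.StringIO("# source: example.obo\n# date: unknown\n4\n")
comments, next_line = read_header_comments(sample)
print(comments)
print(next_line.strip())
```

The first non-comment line is returned rather than discarded because, according to the description of the next section, it carries the node count.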
The purpose of the directed acyclic graph portion is to store the hierarchical structure of the Gene Ontology in a way that is quick to load into memory while still being compact in file size. The first line is the number of nodes in the DAG. The lines that follow come in groups of five, each group defining the attributes of one node. The first line of a group holds the node's group on its own. The second line holds the name, preceded by its length so that memory can be preallocated for it. The third line summarizes the node's position in the DAG: the total number of nodes above it, followed by the total number of nodes below it. The fourth and fifth lines define the edges from the node to its parents and the edges from the node to its children, respectively. Each line of edges starts with the edge count for that line, which can be zero. Each edge is the index of the linked node, a number in the range zero to the node count minus one. Values on the same line are tab separated. The order in which the nodes are listed has no meaning other than giving each node a position that other nodes can link to.
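The five-line node layout can be illustrated with a short parsing sketch. It is written from the prose description only, so several details are assumptions: that the name length and the name are separated by a tab, that the group line holds nothing but the group, and that the declared length can be ignored by a reader that does not preallocate. The names `Node`, `parse_edges` and `parse_dag` are hypothetical, and the two-node sample is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    group: str
    name: str
    nodes_above: int
    nodes_below: int
    parents: List[int]
    children: List[int]


def parse_edges(line, node_count):
    """Parse one edge line: an edge count followed by that many node indices."""
    fields = line.rstrip("\n").split("\t")
    count = int(fields[0])
    edges = [int(f) for f in fields[1:1 + count]]
    if len(edges) != count:
        raise ValueError("edge count does not match the number of edges")
    for edge in edges:
        if not 0 <= edge < node_count:
            raise ValueError("edge index %d is out of range" % edge)
    return edges


def parse_dag(lines):
    """Parse the DAG portion, starting at the node count line."""
    it = iter(lines)
    node_count = int(next(it).strip())
    nodes = []
    for _ in range(node_count):
        group = next(it).rstrip("\n")
        # Assumed layout: "<length>\t<name>"; the length is only a
        # preallocation hint, so it is read and discarded here.
        _length, name = next(it).rstrip("\n").split("\t", 1)
        above, below = (int(x) for x in next(it).split("\t"))
        parents = parse_edges(next(it), node_count)
        children = parse_edges(next(it), node_count)
        nodes.append(Node(group, name, above, below, parents, children))
    return nodes


# Made-up sample: node 0 ("A") is the parent of node 1 ("B").
sample = [
    "2\n",
    "x\n", "1\tA\n", "0\t1\n", "0\n", "1\t1\n",
    "x\n", "1\tB\n", "1\t0\n", "1\t0\n", "0\n",
]
nodes = parse_dag(sample)
print(nodes[0].name, nodes[0].children)
print(nodes[1].name, nodes[1].parents)
```

Because every edge refers to another node by index, the nodes are kept in a list in file order, and an edge value can be used directly as a list index.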
The purpose of the graph labels is to allow lookup of a graph node. Each line in the graph labels section has a symbol indicating whether the label is the primary (>) or an alternate (+) label, followed by a tab, followed by the label, followed by a tab, followed by the position of the associated node (its index plus 1), or zero if the label is obsolete. The labels are ordered alphabetically.
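A label lookup can be sketched as follows. This is an illustration under assumptions, not the parser used by the tools: it loads the labels into a dictionary instead of exploiting the alphabetical ordering (which would also permit a binary search), and it maps obsolete labels to None. The function name `parse_labels` and the sample labels are hypothetical, including the choice of symbol on the obsolete line, which the description above does not specify.

```python
def parse_labels(lines):
    """Map each label to a zero-based node index, or None if obsolete."""
    lookup = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        symbol, label, position = line.split("\t")
        if symbol not in (">", "+"):
            raise ValueError("unexpected label symbol: %r" % symbol)
        pos = int(position)
        # The file stores index plus 1 so that zero can mark obsolete labels.
        lookup[label] = None if pos == 0 else pos - 1
    return lookup


# Made-up sample: a primary label, an alternate label for the same node,
# and an obsolete label.
sample = [
    ">\tx:0001\t1\n",
    "+\tx:0002\t1\n",
    ">\tx:0003\t0\n",
]
lookup = parse_labels(sample)
print(lookup["x:0001"], lookup["x:0002"], lookup["x:0003"])
```

Subtracting one from the stored position gives a value that can index straight into the list of nodes read from the graph portion.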
Suppose there are the nodes A, B, C and D with the alternate names G, E, H and F, and an obsolete name Z, all from the group 'example', which I will shorten to x.
| Node Name | Alternate Names | Parent Nodes | Child Nodes | Group | Nodes Above | Nodes Below |
One possible output for this example is as follows. Note that the example simulates tabs, as tabs do not display properly in HTML.
If for some reason you cannot obtain the GO DAG file from the MEME website, the tool obo2dag is provided in the scripts directory for creating GO DAG files. It is an executable jar file with the source packaged in the jar. As the OBO file format is still under active development, we made use of a parser included with the OBO-Edit program, which means our program depends on libraries from OBO-Edit. As this program is not likely to be needed by an end user, we have not sought permission to include OBO-Edit's parser, so you will have to source the libraries yourself.
The tool obo2dag is dependent on the OBO-Edit classes:
These classes have their own dependencies, and it was discovered through a process of trial and error that the libraries needed from the OBO-Edit distribution are:
The current distributions of OBO-Edit seem to include only automated installers (no tar version), so I recommend downloading the RPM version and using a program like 7-zip to extract the files you need. I found that the jar files were in the RPM at the location "/./opt/OBO-Edit2/runtime/". Once you have the jar files, simply run obo2dag in the same folder.
As stated previously, obo2dag is an executable jar file, so if your system is set up correctly you can
run it like you would any program and it will bring up the graphical user interface. To run the GUI from the
command line, type:
java -jar obo2dag.jar
If you need to run obo2dag in non-GUI mode, you must specify the class
and pass it the path to the OBO file and the path to the output GO DAG file. The command is:
java -cp obo2dag.jar gomo.DAGParser <GO OBO File> <Output File>