Gene Selection Procedure

a. Gene predictions from sequence analysis

Brief description of the gene selection process (by Tom Walk and Sue Rhee).

Of the 1577 enzymatic reactions in AraCyc (as of February 2007), 418 are not associated with any enzyme. The following series of steps was used to link Arabidopsis gene products with these reactions:
1) Attributes of reactions lacking enzymes have been assembled. This information includes Enzyme Commission (EC) number, reactants, products and pathway context from AraCyc and MetaCyc.
2) From the above information, a set of 228 EC numbers was generated for searching the databases UniProt, GenBank, and BRENDA for sequences of proteins catalyzing these reactions in Arabidopsis or other organisms. These searches yielded 36,954 sequences for 198 EC-numbered reactions.
3) The sequence analysis tools BLAST and HMMER were used to search for Arabidopsis sequences for the 180 EC numbers with at least five associated protein sequences in other organisms. This search yielded 1076 genes with products matched to 150 EC-numbered reactions. Genes already annotated in AraCyc and those with experimentally verified evidence were filtered out, leaving 324 genes covering 79 EC-numbered reactions.
4) Potential target genes were then ranked by reviewing the expression pattern of the target gene and of its paralogs in leaves, the strength of the match based on the expectation values returned by BLAST and HMMER, and current Gene Ontology annotations.
5) Only genes available in homozygous mutant lines from Salk were considered in the final analysis.
6) These genes were checked for pathway links and for the relation of their target reactions to identifiable metabolites. This led to the selection of the 8 genes listed in the table.
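
The filtering in steps 3 and 5 and the ranking by expectation value in step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: the `Hit` record, its field names, the gene identifiers, the reference counts, and the `max_evalue` cutoff are all hypothetical (the text above gives no expectation-value threshold), and the EC numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str              # candidate gene identifier (made-up values below)
    ec: str                # EC number of the target reaction
    evalue: float          # best expectation value from BLAST or HMMER
    in_aracyc: bool        # already annotated in AraCyc
    experimental: bool     # has experimentally verified evidence
    salk_homozygous: bool  # homozygous mutant line available from Salk


MIN_REFERENCE_SEQUENCES = 5  # "at least five associated protein sequences" (step 3)


def select_candidates(hits, reference_counts, max_evalue=1e-10):
    """Apply the filters of steps 3 and 5, then rank by expectation value.

    max_evalue is an assumed cutoff used only for illustration.
    """
    kept = [
        h for h in hits
        if reference_counts.get(h.ec, 0) >= MIN_REFERENCE_SEQUENCES
        and h.evalue <= max_evalue
        and not h.in_aracyc
        and not h.experimental
        and h.salk_homozygous
    ]
    # Smaller expectation values indicate stronger matches, so they rank first.
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical input: number of reference sequences found per EC number.
reference_counts = {"1.1.1.1": 12, "2.7.1.1": 3}

hits = [
    Hit("geneA", "1.1.1.1", 1e-40, False, False, True),
    Hit("geneB", "1.1.1.1", 1e-60, False, False, True),
    Hit("geneC", "1.1.1.1", 1e-80, True, False, True),    # already in AraCyc
    Hit("geneD", "2.7.1.1", 1e-50, False, False, True),   # too few reference sequences
    Hit("geneE", "1.1.1.1", 1e-30, False, False, False),  # no homozygous line
]

ranked = select_candidates(hits, reference_counts)
print([h.gene for h in ranked])
```

In this sketch only geneA and geneB survive the filters, with geneB ranked first because of its smaller expectation value. The expression and Gene Ontology review in step 4 and the pathway check in step 6 involve manual judgment and are not modeled here.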

b. Gene predictions from association network (in collaboration with Insuk Lee and Ed Marcotte)

The association network uses a number of genome features to predict the functions of proteins that have not been experimentally characterized. These features include sequence homology, coexpression with genes of known function, information mined from the literature, and similarity of phylogeny with genes of known function. See the Marcotte lab website for more details.

The following is a brief description of the criteria used to generate the linked table. After gene predictions are made for validated pathways, filtering and scoring methods similar to those used in the sequence analysis are applied. Availability of homozygous mutant lines is checked. MCL cluster size and BLAST analysis of the Arabidopsis proteome are taken as indicators of gene family size. In addition, expression in leaf tissue is checked in a set of Affymetrix microarrays. Finally, Gene Ontology terms and evidence codes are checked for previously known functions and the level of supporting evidence. Only genes that are expressed in leaves of 2-week-old plants, show little or no homologue expression, and have no existing experimental evidence for molecular function are considered. Furthermore, because one objective is to find enzymes for reactions currently lacking any known catalyst, genes are removed from consideration if they are homologous to an existing pathway gene.
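
The eligibility criteria for network-predicted genes can be written as a single predicate. The sketch below is illustrative only: the dictionary keys, the candidate names, and the `max_homolog_expression` threshold are hypothetical, and treating the Gene Ontology codes EXP, IDA, IPI, IMP, IGI and IEP as "experimental evidence" is an assumption about how the evidence check was applied.

```python
# Gene Ontology evidence codes that denote experimental support
# (assumed here to be the set that disqualifies a candidate).
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def is_eligible(candidate, max_homolog_expression=0.1):
    """Return True if a network-predicted gene meets all stated criteria.

    max_homolog_expression is a made-up threshold standing in for
    "little or no homologue expression".
    """
    has_experimental_function = any(
        code in EXPERIMENTAL_CODES for code in candidate["mf_evidence_codes"]
    )
    return (
        candidate["leaf_expressed"]
        and candidate["homolog_expression"] <= max_homolog_expression
        and not has_experimental_function
        and not candidate["pathway_gene_homolog"]
        and candidate["homozygous_line"]
    )


# Hypothetical candidates; none of these values come from the actual table.
candidates = [
    {"gene": "candidate1", "leaf_expressed": True, "homolog_expression": 0.0,
     "mf_evidence_codes": ["IEA"], "pathway_gene_homolog": False,
     "homozygous_line": True},
    {"gene": "candidate2", "leaf_expressed": True, "homolog_expression": 0.0,
     "mf_evidence_codes": ["IDA"], "pathway_gene_homolog": False,
     "homozygous_line": True},   # molecular function already shown experimentally
    {"gene": "candidate3", "leaf_expressed": True, "homolog_expression": 0.0,
     "mf_evidence_codes": [], "pathway_gene_homolog": True,
     "homozygous_line": True},   # homologous to an existing pathway gene
]

eligible = [c["gene"] for c in candidates if is_eligible(c)]
print(eligible)
```

Keeping each criterion as a separate boolean field makes it easy to report which condition removed a given gene, which is useful when the thresholds are later adjusted.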