Joshua M. Stuart 1,†, Eran Segal 2,†, Daphne Koller 2,*, Stuart K. Kim 3,*
1 Stanford Medical Informatics, 251 Campus Drive, MSOB X-215, Stanford, CA 94305-5329, USA.
2 Computer Science Department, Gates Building 1A, Stanford University, Stanford, CA 94305-9010, USA.
3 Departments of Developmental Biology and Genetics, Stanford University School of Medicine, Stanford, CA 94305-5329, USA.
† These authors contributed equally to this work.
* To whom correspondence should be addressed. E-mail: koller@cs.stanford.edu, kim@cmgm.stanford.edu
To elucidate gene function on a global scale, we identified pairs of genes that are coexpressed over 3182 DNA microarrays from humans, flies, worms, and yeast. We found 22,163 such coexpression relationships, each of which has been conserved across evolution. This conservation implies that the coexpression of these gene pairs confers a selective advantage and therefore that these genes are functionally related. Many of these relationships provide strong evidence for the involvement of new genes in core biological functions such as the cell cycle, secretion, and protein expression. We experimentally confirmed the predictions implied by some of these links and identified cell proliferation functions for several genes. We assembled these links into a gene-coexpression network consisting of 12 large components; several components were animal-specific, and the network revealed interrelationships between newly evolved and ancient modules.
The genome sequences of humans and several model organisms have established a nearly complete list of the genes required to enact cellular, developmental, and behavioral processes in these organisms (1–4). The next major challenges are to elucidate the functions of the large fraction of genes in the genome whose functions are currently unknown and to discover how the genes interact to perform specific biological processes. DNA microarrays provide us with a first step toward the goal of uncovering gene function on a global scale. Because genes that encode proteins that participate in the same pathway or are part of the same protein complex are often coregulated, clusters of genes with related functions often exhibit expression patterns that are correlated under a large number of diverse conditions in DNA microarray experiments (5–8).
However, coregulation does not necessarily imply that genes are functionally related. For example, cis-regulatory DNA motifs are expected to occur by chance in the genome and might lead to serendipitous transcriptional regulation of nearby genes. In experiments limited to a single species, it would be difficult or even impossible to distinguish such accidentally regulated genes from those that are physiologically important. Evolutionary conservation, by contrast, is a powerful criterion for identifying the functionally important genes within a set of coregulated genes. Coregulation of a pair of genes over large evolutionary distances implies that the coregulation confers a selective advantage, most likely because the genes are functionally related. Because even small and subtle changes in fitness can confer a selective advantage during evolution, testing for related gene function by evolutionary conservation in the wild is more sensitive than scoring the phenotypes of strong loss-of-function mutants in the laboratory.
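To make the chance-occurrence argument concrete, the Python sketch below computes the expected number of exact matches to one specific motif of length k in a random genome. It assumes independent positions with equal base frequencies (1/4 each), which real genomes do not satisfy, so it gives only an order-of-magnitude estimate; the function name and the 12-Mb genome size are illustrative choices, not values from the paper.

```python
def expected_motif_hits(genome_length, motif_length, both_strands=True):
    """Expected exact matches to one fixed motif in a random sequence.

    Assumes each base is independently A/C/G/T with probability 1/4 (a
    simplification). Counting both strands double-counts a motif that is
    its own reverse complement.
    """
    positions = genome_length - motif_length + 1  # start sites on one strand
    p_match = 0.25 ** motif_length                # chance that a given site matches
    strands = 2 if both_strands else 1
    return strands * positions * p_match


# A 6-bp motif in a hypothetical 12-Mb genome (roughly yeast-sized).
print(round(expected_motif_hits(12_000_000, 6)))  # 5859
```

Thousands of chance matches for a short motif illustrate how some coregulation could arise in a single species without any shared function.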
The recent availability of large sets of DNA microarray data for humans, flies, worms, and yeast makes it possible to measure evolutionarily conserved coexpression on a genomewide scale (9–11). We developed a computational method to analyze 3182 DNA microarrays from humans, flies, worms, and yeast (most of which were previously published) to identify gene interactions that are evolutionarily conserved.
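The paragraph above does not spell out the scoring procedure (the authors' methods are in their supporting material (14)). Purely as an illustration, the Python sketch below flags a metagene pair as conserved when its Pearson correlation ranks among the top `top_fraction` of pairs in at least `min_species` organisms. The input layout (one expression profile per metagene per species), the thresholds, and the function names are hypothetical, and this rank-threshold rule is a simplification, not the authors' statistic.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (fails on constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pair_ranks(profiles):
    """Normalized rank of every metagene pair within one species (0.0 = most correlated)."""
    pairs = list(combinations(sorted(profiles), 2))
    pairs.sort(key=lambda p: pearson(profiles[p[0]], profiles[p[1]]), reverse=True)
    n = len(pairs)
    return {pair: i / n for i, pair in enumerate(pairs)}


def conserved_pairs(species_profiles, top_fraction=0.05, min_species=2):
    """Pairs ranked in the top `top_fraction` in at least `min_species` species.

    Hypothetical input: {species: {metagene_id: [expression values]}}. A metagene
    missing from a species simply contributes no ranks for that species.
    """
    hits = {}
    for profiles in species_profiles.values():
        for pair, rank in pair_ranks(profiles).items():
            if rank < top_fraction:
                hits[pair] = hits.get(pair, 0) + 1
    return {pair for pair, count in hits.items() if count >= min_species}


example = {
    "worm": {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]},
    "yeast": {"A": [1, 3, 2, 5], "B": [2, 6, 4, 10], "C": [5, 1, 4, 2]},
}
print(conserved_pairs(example, top_fraction=0.3))  # {('A', 'B')}
```

Requiring agreement across species, rather than a high correlation in any single one, is what encodes the conservation criterion argued for in the preceding paragraph.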
| Component | Size a | Biological function b | Genes in component c | Enrichment d | P value d |
|---|---|---|---|---|---|
| 1 | 353 | Cellular cortex | 16/57 | 2.7 | 10^-6.1 |
|  |  | Signaling | 44/321 | 1.3 | 10^-5.8 |
|  |  | Animal-specific | 195/1441 | 1.3 | 10^-7.2 |
| 2 | 349 | Ribosome biogenesis | 102/125 | 8.0 | 10^-83 |
| 3 | 320 | Energy generation | 77/147 | 5.6 | 10^-42 |
| 4 | 271 | Proteasome | 31/32 | 12.0 | 10^-32 |
| 5 | 241 | Cell cycle | 110/202 | 7.7 | 10^-85 |
| 6 | 201 | General transcription | 47/142 | 5.6 | 10^-24 |
| 7 | 167 | Animal-specific | 124/1441 | 1.8 | 10^-17 |
| 8 | 156 | Translation initiation, elongation, and termination | 20/110 | 4.0 | 10^-7.3 |
|  |  | Aminoacyl transfer RNA biosynthesis | 14/31 | 9.9 | 10^-11 |
| 9 | 139 | Ribosomal protein subunits | 74/78 | 23.0 | 10^-107 |
| 10 | 92 | Secretion | 37/85 | 16.0 | 10^-38 |
| 11 | 65 | Neuronal | 17/42 | 21.0 | 10^-19 |
|  |  | Animal-specific | 58/1441 | 2.1 | 10^-15 |
| 12 | 57 | Lipid metabolism | 6/16 | 22.0 | 10^-7 |
|  |  | Peroxisome | 14/32 | 26.0 | 10^-17 |
a The total number of metagenes in the component.
b Biological functions were based on edited terms from Gene Ontology (15) and the KEGG database (22).
c The number of metagenes in the biological function group and in the component divided by the total number of metagenes in the biological function group that were also in the network.
d Enrichment is the ratio between the number of observed metagenes in a category and the number expected by chance. The P value was computed as the probability of obtaining the observed number of overlaps by chance under a hypergeometric distribution.
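The enrichment and P value in footnote d can be sketched in Python as follows. We assume the component is treated as a random draw of `component_size` metagenes from the `network_size` metagenes in the network, of which `group_size` belong to the function group, and we assume the usual one-sided upper tail (observing at least `overlap` metagenes). The argument names and the toy numbers are hypothetical; the total network size is not given in this section.

```python
from math import comb


def enrichment(overlap, component_size, group_size, network_size):
    """Observed overlap divided by the overlap expected by chance."""
    expected = component_size * group_size / network_size
    return overlap / expected


def hypergeom_upper_tail(overlap, component_size, group_size, network_size):
    """P(X >= overlap) for X ~ Hypergeometric(network_size, group_size, component_size)."""
    total = comb(network_size, component_size)
    upper = min(component_size, group_size)
    hits = sum(
        comb(group_size, i) * comb(network_size - group_size, component_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total  # exact integer counts, one final division


# Toy example: 3 of 3 component members fall in a 4-member group, network of 10.
print(enrichment(3, 3, 4, 10), hypergeom_upper_tail(3, 3, 4, 10))
```

Exact integer binomial coefficients avoid the rounding error that summing many tiny floating-point probabilities would introduce for P values as small as 10^-107; a library routine such as a hypergeometric survival function would serve the same purpose.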
If a gene is linked in the network to many genes that participate in the same biological process, it is reasonable to hypothesize that it also participates in that process. We experimentally validated some of the gene functions predicted by the multiple-species network. We selected five metagenes that showed conserved coexpression with genes known to be involved in cell proliferation and the cell cycle but that were not previously known to be involved in these processes. Specifically, we chose MEG1503 (which encodes an snRNP protein involved in splicing), MEG342 (which encodes a nucleoporin-interacting component), and three other metagenes (MEG4513, MEG1192, and MEG1146) that encode proteins of unknown function (table S1). All five of these metagenes showed a significant number of links in the coexpression networks to known cell proliferation genes (table S3).
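The "guilt by association" reasoning above can be sketched in Python as a simple neighbor count: each unannotated node is scored by how many of its network neighbors carry a given annotation. The function name, the toy edge list, and the node names are hypothetical, and this raw count is not the significance test the authors applied.

```python
from collections import defaultdict


def rank_candidates(edges, annotated, min_links=1):
    """Rank unannotated nodes by their number of annotated network neighbors.

    Returns (node, annotated_neighbors, fraction_of_neighbors_annotated) tuples,
    most-linked first; ties are broken by the fraction, then by node name.
    """
    neighbors = defaultdict(set)
    for a, b in edges:  # undirected coexpression links
        neighbors[a].add(b)
        neighbors[b].add(a)
    scored = []
    for node, nbrs in neighbors.items():
        if node in annotated:
            continue
        k = len(nbrs & annotated)
        if k >= min_links:
            scored.append((node, k, k / len(nbrs)))
    scored.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return scored


# Hypothetical toy network: P1 and P2 stand for known proliferation genes.
edges = [("X", "P1"), ("X", "P2"), ("X", "Q"), ("Y", "P1"), ("Y", "Q"), ("Z", "Q")]
for node, k, frac in rank_candidates(edges, {"P1", "P2"}):
    print(node, k, round(frac, 2))
# X 2 0.67
# Y 1 0.5
```

A raw count favors highly connected nodes; testing each candidate's annotated-neighbor count against a hypergeometric null, as in the table footnotes, would correct for degree.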
We first examined the expression levels of these genes in dividing pancreatic cancer cells and in nondividing normal cells, using recently published data from Iacobuzio-Donahue et al. (21). (These data were not used to construct the gene-coexpression network.) Figure 4A shows that all five (meta)genes are overexpressed in human pancreatic cancers relative to normal tissue, to the same extent as (meta)genes known to be involved in cell proliferation.
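One simple way to quantify "overexpressed in cancers relative to normal tissue" for a single metagene is sketched below in Python: the difference in mean log2 ratio between the cancer and normal columns, plus a Welch t statistic. The column layout (5 normal specimens followed by 8 cancer specimens) follows the Fig. 4A caption; the function name and the example profile are hypothetical, and this is an illustration rather than the analysis reported in the paper.

```python
from math import sqrt
from statistics import mean, variance


def overexpression(row, n_normal=5):
    """Return (cancer mean - normal mean, Welch t statistic) for one metagene.

    `row` holds log2 expression ratios; the first `n_normal` entries are taken
    to be normal specimens and the rest cancer specimens. Each group needs at
    least two values, and the two groups must not both be constant.
    """
    normal, cancer = row[:n_normal], row[n_normal:]
    diff = mean(cancer) - mean(normal)
    se = sqrt(variance(cancer) / len(cancer) + variance(normal) / len(normal))
    return diff, diff / se


# Hypothetical profile: 5 normal columns, then 8 cancer columns.
row = [0.0, 0.0, 0.0, 0.0, 1.0] + [2.0] * 8
diff, t = overexpression(row)
print(round(diff, 2), round(t, 2))  # 1.8 9.0
```

Welch's form is used because it does not assume equal variances in the two groups, which is reasonable for samples as different as cell lines and tissue specimens.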
Fig. 4. (A) MEG1503, MEG342, MEG4513, MEG1192, and MEG1146 are overexpressed in pancreatic cancers.
We plotted the metagenes with the GeneXPress program (http://genexpress.stanford.edu) using data from (21). The first five columns correspond to expression data obtained from normal pancreas specimens (pSF2779N, pSF442N, pSF4N, pSF5NT, and pSF768NT), and the remaining eight columns correspond to expression data obtained from pancreatic cancer specimens [a pancreatic cancer cell line (HS766T), five Hopkins/Goggins pancreatic cancer cell cultures (PL2, PL22, PL21, PL1, and PL8), a poorly differentiated pancreas carcinoma (pSF439T), and a pancreas foamy cell adenocarcinoma specimen (pSF1T)]. Each row corresponds to the expression profile of a single metagene across the 13 pancreatic samples. Bold indicates metagenes with unknown functions that are implicated in cell proliferation by the network. Neighbors of each implicated metagene that were previously known to be involved in cell proliferation or cell cycle are also shown. Scale shows log2 expression ratio.
1. A. Goffeau et al., Science 274, 546 (1996).
2. E. W. Myers et al., Science 287, 2196 (2000).
3. E. S. Lander et al., Nature 409, 860 (2001).
4. J. C. Venter et al., Science 291, 1304 (2001).
5. M. B. Eisen, P. T. Spellman, P. O. Brown, D. Botstein, Proc. Natl. Acad. Sci. U.S.A. 95, 14863 (1998).
6. T. R. Hughes et al., Cell 102, 109 (2000).
7. S. K. Kim et al., Science 293, 2087 (2001).
8. E. Segal et al., Nature Genet. (2003).
9. O. Alter, P. O. Brown, D. Botstein, Proc. Natl. Acad. Sci. U.S.A. 100, 3351 (2003).
10. V. van Noort, B. Snel, M. A. Huynen, Trends Genet. 19, 238 (2003).
11. S. A. Teichmann, M. M. Babu, Trends Biotechnol. 20, 407 (2002).
12. R. L. Tatusov, M. Y. Galperin, D. A. Natale, E. V. Koonin, Nucleic Acids Res. 28, 33 (2000).
13. Y. Lee et al., Genome Res. 12, 493 (2002).
14. Materials and methods are available as supporting material on Science Online.
15. M. Ashburner et al., Nature Genet. 25, 25 (2000).
...
21. C. A. Iacobuzio-Donahue et al., Am. J. Pathol. 162, 1151 (2003).
22. H. Ogata et al., Nucleic Acids Res. 27, 29 (1999).
...