Published in: Science, vol. 302, no. 5643, pp. 249-255 (October 10, 2003).
Originally published in Science Express as 10.1126/science.1087447 on August 21, 2003
http://www.sciencemag.org/cgi/content/full/302/5643/249


"A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules".

Joshua M. Stuart 1, e, Eran Segal 2, e, Daphne Koller 2, *, Stuart K. Kim 3, *

1 Stanford Medical Informatics, 251 Campus Drive, MSOB X-215, Stanford, CA 94305-5329, USA.
2 Computer Science Department, Gates Building 1A, Stanford University, Stanford, CA 94305-9010, USA.
3 Departments of Developmental Biology and Genetics, Stanford University School of Medicine, Stanford, CA 94305-5329, USA.
e These authors contributed equally to this work.
* To whom correspondence should be addressed.
E-mail:   koller@cs.stanford.edu
              kim@cmgm.stanford.edu



Abstract:
Introduction:
Table 1. Network Components:
Expression within Neoplasms:
References:
Additional References for Active RNA:
Other Links:
Information:

Abstract:

To elucidate gene function on a global scale, we identified pairs of genes that are coexpressed over 3182 DNA microarrays from humans, flies, worms, and yeast. We found 22,163 such coexpression relationships, each of which has been conserved across evolution. This conservation implies that the coexpression of these gene pairs confers a selective advantage and therefore that these genes are functionally related. Many of these relationships provide strong evidence for the involvement of new genes in core biological functions such as the cell cycle, secretion, and protein expression. We experimentally confirmed the predictions implied by some of these links and identified cell proliferation functions for several genes. We assembled these links into a gene coexpression network consisting of 12 large components, and found several that were animal-specific as well as interrelationships between newly evolved and ancient modules.



Introduction:

The genome sequences of humans and several model organisms have established a nearly complete list of the genes required to enact cellular, developmental, and behavioral processes in these organisms (1–4). The next major challenges are to elucidate the functions of the large fraction of genes in the genome whose functions are currently unknown and to discover how the genes interact to perform specific biological processes. DNA microarrays provide us with a first step toward the goal of uncovering gene function on a global scale. Because genes that encode proteins that participate in the same pathway or are part of the same protein complex are often coregulated, clusters of genes with related functions often exhibit expression patterns that are correlated under a large number of diverse conditions in DNA microarray experiments (5–8).

However, coregulation does not necessarily imply that genes are functionally related. For example, cis-regulatory DNA motifs are predicted to occur by chance in the genome and might lead to serendipitous transcriptional regulation of nearby genes. In experiments limited to a single species, it would be difficult or even impossible to distinguish accidentally regulated genes from those that are physiologically important. However, evolutionary conservation is a powerful criterion to identify genes that are functionally important from a set of coregulated genes. Coregulation of a pair of genes over large evolutionary distances implies that the coregulation confers a selective advantage, most likely because the genes are functionally related. Because small and subtle changes in fitness can confer selective advantage during evolution, the test for related gene function using evolutionary conservation in the wild is more sensitive than scoring the phenotype resulting from strong loss-of-function mutants in the laboratory.

The recent availability of large sets of DNA microarray data for humans, flies, worms, and yeast makes it possible to measure evolutionarily conserved coexpression on a genomewide scale (9–11). We developed a computational method to analyze 3182 DNA microarrays from humans, flies, worms, and yeast (most of which were previously published) to identify gene interactions that are evolutionarily conserved.
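
The idea of conserved coexpression can be illustrated with a minimal sketch. It is not the authors' method, which is described in their supporting material (14) and not reproduced in this excerpt. It assumes that orthologous genes share one identifier across species, uses plain Pearson correlation as the coexpression measure, and applies hypothetical thresholds (`min_r`, `min_species`) chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def conserved_coexpression(profiles_by_species, gene_a, gene_b,
                           min_r=0.6, min_species=2):
    """Return True if the pair is correlated in at least `min_species` organisms.

    `profiles_by_species` maps species -> {gene: [expression values]}.
    `min_r` and `min_species` are illustrative thresholds, not values from the paper.
    """
    supporting = 0
    for profiles in profiles_by_species.values():
        if gene_a in profiles and gene_b in profiles:
            if pearson(profiles[gene_a], profiles[gene_b]) >= min_r:
                supporting += 1
    return supporting >= min_species


# Toy data (hypothetical): g1 and g2 track each other in both species, g3 does not.
toy = {
    "yeast": {"g1": [1.0, 2.0, 3.0, 4.0],
              "g2": [1.1, 2.1, 2.9, 4.2],
              "g3": [4.0, 1.0, 3.0, 2.0]},
    "worm": {"g1": [0.5, 0.1, 0.9, 0.3],
             "g2": [0.6, 0.2, 1.0, 0.2],
             "g3": [0.3, 0.9, 0.1, 0.5]},
}
pair_12 = conserved_coexpression(toy, "g1", "g2")
pair_13 = conserved_coexpression(toy, "g1", "g3")
```

Requiring support in more than one species is what separates this from single-species clustering: a pair that is correlated by chance in one organism is unlikely to be correlated again in a distant one.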



...
Table 1. Network components.

Component  Size a  Biological function b                                Genes in component c  Enrichment  P value d
1          353     Cellular cortex                                      16/57                 2.7         10^-6.1
.          .       Signaling                                            44/321                1.3         10^-5.8
.          .       Animal-specific                                      195/1441              1.3         10^-7.2
2          349     Ribosome biogenesis                                  102/125               8.0         10^-83
3          320     Energy generation                                    77/147                5.6         10^-42
4          271     Proteasome                                           31/32                 12.0        10^-32
5          241     Cell cycle                                           110/202               7.7         10^-85
6          201     General transcription                                47/142                5.6         10^-24
7          167     Animal-specific                                      124/1441              1.8         10^-17
8          156     Translation initiation, elongation, and termination  20/110                4.0         10^-7.3
.          .       Aminoacyl transfer RNA biosynthesis                  14/31                 9.9         10^-11
9          139     Ribosomal protein subunits                           74/78                 23.0        10^-107
10         92      Secretion                                            37/85                 16.0        10^-38
11         65      Neuronal                                             17/42                 21.0        10^-19
.          .       Animal-specific                                      58/1441               2.1         10^-15
12         57      Lipid metabolism                                     6/16                  22.0        10^-7
.          .       Peroxisome                                           14/32                 26.0        10^-17

a The total number of metagenes in the component.

b Biological functions were based on edited terms from Gene Ontology (15) and the KEGG database (22).

c The number of metagenes in the biological function group and in the component divided by the total number of metagenes in the biological function group that were also in the network.

d Enrichment is the ratio between the number of observed metagenes in a category and the number expected by chance. The P value was computed as the probability of obtaining the observed number of overlaps by chance under a hypergeometric distribution.
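
The two statistics in footnote d can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the network size `N_NETWORK = 3416` is not given in this excerpt and is assumed here because it reproduces the tabulated enrichment of 2.7 for the cellular cortex row; the P value is taken as the upper hypergeometric tail, and it need not match the published P values if those were computed against a different background population.

```python
from math import comb


def enrichment(k, K, n, N):
    """Observed overlap k divided by the overlap expected by chance, K * n / N.

    k: metagenes in both the function group and the component
    K: metagenes in the function group (within the network)
    n: metagenes in the component
    N: metagenes in the whole network
    """
    return k / (K * n / N)


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N, K of them marked."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Assumed network size; NOT stated in this excerpt (see lead-in).
N_NETWORK = 3416

# Cellular cortex row of Table 1: 16 of 57 group metagenes fall in component 1 (size 353).
ratio = enrichment(16, 57, 353, N_NETWORK)
p_value = hypergeom_tail(16, 57, 353, N_NETWORK)
print(f"enrichment = {ratio:.1f}, upper-tail P = {p_value:.2e}")
```

Exact integer binomial coefficients are used so that the tail sum does not lose precision when the individual terms are extremely small.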



...

If a gene is linked in the network to many genes that participate in the same biological process, it is reasonable to hypothesize that it also participates in that process. We experimentally validated some of the gene functions that were predicted by the multiple-species network. We selected five metagenes that showed conserved coexpression with genes known to be involved in cell proliferation and the cell cycle but that were not previously known to be involved in these processes. Specifically, we chose MEG1503 (which encodes an snRNP protein involved in splicing), MEG342 (which encodes a nucleoporin-interacting component), and three other metagenes (MEG4513, MEG1192, and MEG1146) that encode previously unknown proteins of unknown function (table S1). All five of these metagenes showed a significant number of links in the coexpression networks to known cell proliferation genes (table S3).
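
The guilt-by-association reasoning in this paragraph can be sketched as a simple neighbor count. The network, gene identifiers, and annotation set below are hypothetical toy values, and the paper's own significance criterion for "a significant number of links" (table S3) is not reproduced here.

```python
def annotated_neighbor_fraction(network, gene, annotated):
    """Count how many of a gene's network neighbors carry a given annotation.

    network:   dict mapping each gene to the set of genes it is linked to
    annotated: set of genes known to take part in the process of interest
    Returns (number of annotated neighbors, fraction of neighbors annotated).
    """
    neighbors = network.get(gene, set())
    if not neighbors:
        return 0, 0.0
    hits = len(neighbors & annotated)
    return hits, hits / len(neighbors)


# Hypothetical toy network: "query" is an uncharacterized metagene.
toy_network = {"query": {"n1", "n2", "n3", "n4"}}
known_proliferation = {"n1", "n2", "n3", "other"}

hits, fraction = annotated_neighbor_fraction(toy_network, "query", known_proliferation)
```

A high fraction only suggests a hypothesis; as in the text, it still has to be checked experimentally or against independent data.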

We first tested the expression levels of these genes in dividing pancreatic cancer cells and in nondividing normal cells, using recently published data from Iacobuzio-Donahue et al. (21). (These data were not used to construct the gene-coexpression network.) Figure 4A shows that all five (meta)genes are overexpressed in human pancreatic cancers relative to normal tissue, to the same extent as (meta)genes known to be involved in cell proliferation.

Fig. 4. (A) MEG1503, MEG342, MEG4513, MEG1192, and MEG1146 are overexpressed in pancreatic cancers.

We plotted the metagenes with the GeneXPress program (http://genexpress.stanford.edu) using data from (21). The first five columns correspond to expression data obtained from normal pancreas specimens (pSF2779N, pSF442N, pSF4N, pSF5NT, and pSF768NT), and the remaining eight columns correspond to expression data obtained from pancreatic cancer specimens [a pancreatic cancer cell line (HS766T), five Hopkins/Goggins pancreatic cancer cell cultures (PL2, PL22, PL21, PL1, and PL8), a poorly differentiated pancreas carcinoma (pSF439T), and a pancreas foamy cell adenocarcinoma specimen (pSF1T)]. Each row corresponds to the expression profile of a single metagene across the 13 pancreatic samples. Bold indicates metagenes with unknown functions that are implicated in cell proliferation by the network. Neighbors of each implicated metagene that were previously known to be involved in cell proliferation or cell cycle are also shown. Scale shows log2 expression ratio.
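
Given the column layout described in the caption (five normal specimens followed by eight cancer specimens), overexpression of one metagene could be summarized as below. This is only a sketch: the row values are hypothetical, and the difference of mean log2 ratios is an assumed summary statistic, not one stated in the paper.

```python
from statistics import mean

N_NORMAL = 5  # first five columns: normal pancreas specimens
N_CANCER = 8  # remaining eight columns: pancreatic cancer specimens


def cancer_vs_normal_shift(row):
    """Mean log2 ratio in cancer columns minus mean log2 ratio in normal columns."""
    if len(row) != N_NORMAL + N_CANCER:
        raise ValueError("expected one value per pancreatic sample (13 in total)")
    normal, cancer = row[:N_NORMAL], row[N_NORMAL:]
    return mean(cancer) - mean(normal)


# Hypothetical log2 expression ratios for a single metagene across the 13 samples.
toy_row = [-0.5, -0.25, 0.0, -0.25, -0.5,
           1.0, 1.5, 0.5, 1.0, 2.0, 1.5, 1.0, 1.5]

shift = cancer_vs_normal_shift(toy_row)  # difference on the log2 scale
fold_change = 2 ** shift                 # the same difference as a linear fold change
```

Because the values are on a log2 scale, a shift of 1 corresponds to a twofold difference in expression.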



...
References:

1. A. Goffeau et al., Science 274, 546 (1996).

2. E. W. Myers et al., Science 287, 2196 (2000).

3. E. S. Lander et al., Nature 409, 860 (2001).

4. J. C. Venter et al., Science 291, 1304 (2001).

5. M. B. Eisen, P. T. Spellman, P. O. Brown, D. Botstein, Proc. Natl. Acad. Sci. U.S.A. 95, 14863 (1998).

6. T. R. Hughes et al., Cell 102, 109 (2000).

7. S. K. Kim et al., Science 293, 2087 (2001).

8. E. Segal et al., Nature Genet. (2003).

9. O. Alter, P. O. Brown, D. Botstein, Proc. Natl. Acad. Sci. U.S.A. 100, 3351 (2003).

10. V. van Noort, B. Snel, M. A. Huynen, Trends Genet. 19, 238 (2003).

11. S. A. Teichmann, M. M. Babu, Trends Biotechnol. 20, 407 (2002).

12. R. L. Tatusov, M. Y. Galperin, D. A. Natale, E. V. Koonin, Nucleic Acids Res. 28, 33 (2000).

13. Y. Lee et al., Genome Res. 12, 493 (2002).

14. Materials and methods are available as supporting material on Science Online.

15. M. Ashburner et al., Nature Genet. 25, 25 (2000).
...

21. C. A. Iacobuzio-Donahue et al., Am. J. Path. 162, 1151 (2003).

22. H. Ogata et al., Nucleic Acids Res. 27, 29 (1999).
...



Additional References for Active RNA:

1. Frenster JH, and Hovsepian JA, "RNA Feedback Mechanisms during Eukaryotic Gene Regulation".

2. Frenster JH, "Nuclear RNA Species Activate DNA Transcription within Chromatin".

3. Gottesfeld JM, and Barbas CF III, "RNA as a Transcriptional Activator".

4. Hovsepian JA, and Frenster JH, "RNA-Induced Melting of DNA during Selective Gene Transcription".

5. Saha S, Ansari AZ, Jarell KA, and Ptashne M, "RNA Sequences that Work as Transcriptional Activating Regions".

6. DeCarvalho S, "Effect of RNA from Normal Human Marrow on Leukaemic Marrow In-Vivo".

7. Frenster JH, "Ultrastructural Probes of Active DNA Sites, and the RNA Activators of DNA".
 




For Further Information or Feedback:
e-mail:   frenster@euchromatin.net
Phone:   +1 650 367 6483
Fax:   +1 650 364 1773

euchromatin:  "the most active portion of the genome within the cell nucleus".