Gisela Storz
Cell Biology and Metabolism Branch, National Institute of Child Health
and Human Development, National Institutes of Health, Bethesda, MD 20892,
USA.
E-mail: storz@helix.nih.gov
Noncoding RNAs (ncRNAs) have been found to have roles in a great variety of processes, including transcriptional regulation, chromosome replication, RNA processing and modification, messenger RNA stability and translation, and even protein degradation and translocation. Recent studies indicate that ncRNAs are far more abundant and important than initially imagined. These findings raise several fundamental questions: How many ncRNAs are encoded by a genome? Given the absence of a diagnostic open reading frame, how can these genes be identified? How can all the functions of ncRNAs be elucidated?
Over the years, a number of RNAs that do not function as messenger RNAs (mRNAs), transfer RNAs (tRNAs), or ribosomal RNAs (rRNAs) have been discovered, mostly fortuitously. The non-mRNAs have been given a variety of names (1, 2); the term small RNAs (sRNAs) has been predominant in bacteria, whereas the term noncoding RNAs (ncRNAs) has been predominant in eukaryotes and will be used here. ncRNAs range in size from 21 to 25 nt for the large family of microRNAs (miRNAs) that modulate development in Caenorhabditis elegans, Drosophila, and mammals (3-8), up to ~100 to 200 nt for sRNAs commonly found as translational regulators in bacterial cells (9, 10) and to >10,000 nt for RNAs involved in gene silencing in higher eukaryotes (11-13). The functions described for ncRNAs thus far are extremely varied (Table 1).
Table 1. Processes affected by ncRNAs.
| Process | Example | Function | Reference |
| Transcription | 184-nt E. coli 6S | Modulates promoter use | (9, 14) |
| | 331-nt human 7SK | Inhibits transcription elongation factor P-TEFb | (15, 16, 46) |
| | 875-nt human SRA | Steroid receptor coactivator | (12, 17) |
| Gene silencing | 16,500-nt human Xist | Required for X-chromosome inactivation | (12, 13) |
| | ~100,000-nt human Air | Required for autosomal gene imprinting | (11) |
| Replication | 451-nt human telomerase RNA | Core of telomerase and telomere template | (18, 46) |
| RNA processing | 377-nt E. coli RNase P | Catalytic core of RNase P | (9, 19) |
| | 186-nt human U2 snRNA | Core of spliceosome | (20, 46) |
| RNA modification | 102-nt S. cerevisiae C/D snoRNA | Directs 2'-O-ribose methylation of target rRNA | (21, 47) |
| | 189-nt S. cerevisiae snR8 H/ACA snoRNA | Directs pseudouridylation of target rRNA | (21, 47) |
| | 68-nt T. brucei gCYb gRNA | Directs the insertion and excision of uridines | (23, 24, 48) |
| RNA stability | 80-nt E. coli RyhB sRNA | Targets mRNAs for degradation? | (27) |
| | Eukaryotic miRNA? | Targets mRNAs for degradation? | (7, 8) |
| mRNA translation | 109-nt E. coli OxyS | Represses translation by occluding ribosome binding | (9, 10) |
| | 87-nt E. coli DsrA sRNA | Activates translation by preventing formation of an inhibitory mRNA structure | (9, 10) |
| | 22-nt C. elegans lin-4 miRNA | Represses translation by pairing with 3' end of target mRNA | (7, 8) |
| Protein stability | 363-nt E. coli tmRNA | Directs addition of tag to peptides on stalled ribosomes | (9, 28) |
| Protein translocation | 114-nt E. coli 4.5S RNA | Integral component of signal recognition particle central to protein translocation across membranes | (9, 29) |
Some ncRNAs affect transcription and chromosome structure. The Escherichia coli 6S RNA binds to the bacterial σ70 holoenzyme and modulates promoter use (14), and the human 7SK RNA binds and inhibits the transcription elongation factor P-TEFb (15, 16). Another human ncRNA, SRA RNA, was identified as interacting with progestin steroid hormone receptor and may serve as a coactivator of transcription (17). Several extremely long ncRNAs detected in insect and mammalian cells have been implicated in silencing genes and changing chromatin structure across large chromosomal regions (11-13). Examples include the human Xist RNA required for X chromosome inactivation and mouse Air RNA required for autosomal gene imprinting. The Xist RNA is produced by the inactive X chromosome and spreads in cis along the chromosome (13). The chromosome-associated RNA has been proposed to recruit proteins that affect chromatin structure; however, much remains to be learned about the mechanism by which Xist and other long ncRNAs establish and/or maintain gene silencing. Another eukaryote-specific RNA that is required for proper chromosome replication and structure is the telomerase RNA. This ncRNA is an integral part of the telomerase enzyme and serves as the template for the synthesis of the chromosome ends (18).
ncRNAs play roles in RNA processing and modification. The catalytic ribonuclease P (RNase P) RNA, found in organisms from all kingdoms, is responsible for processing the 5' end of precursor tRNAs and some rRNAs (19). In eukaryotes, small nuclear RNAs (snRNAs) are central to splicing of pre-mRNAs (20), and small nucleolar RNAs (snoRNAs) direct the 2'-O-ribose methylation (C/D-box type) and pseudouridylation (H/ACA-box type) of rRNA, tRNA, and ncRNAs by forming base pairs with sequences near the sites to be modified (21). Homologs of the two classes of snoRNAs have been found in archaea (22); however, counterparts have not yet been identified in bacteria, even though the rRNAs are modified. The less ubiquitous guide RNAs (gRNAs) present in kinetoplasts direct the insertion or deletion of uridine residues into mRNA (RNA editing) by mechanisms that involve base-pairing as well (23, 24).
ncRNAs also regulate mRNA stability and translation. The first discovered miRNAs, C. elegans lin-4 and let-7, repress translation by forming base pairs with the 3' end of target mRNAs (7, 8). Many of the recently identified miRNAs are likely to act in a similar fashion. However, it is conceivable that some members of this large family target mRNAs for degradation, as is the case for the similarly sized small interfering RNAs (siRNAs) that are processed and amplified from exogenously added, double-stranded RNA and lead to gene suppression in a process termed RNA interference (25, 26). As yet there is no evidence for miRNAs in bacteria, archaea, or fungi, but it might be fruitful to search for RNAs of <25 nt in these organisms. Several ncRNAs have been found to regulate translation and possibly mRNA stability in E. coli (9, 10, 27). These sRNAs form base pairs at various positions with their target mRNAs, and they have been shown to repress translation by occluding the ribosome binding site and to activate translation by preventing the formation of inhibitory mRNA structures.
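The base-pairing mode of regulation described above can be illustrated with a minimal sketch. It assumes that a contiguous antiparallel duplex (Watson-Crick pairs plus G-U wobble) is a reasonable first proxy for a pairing site; real sRNA-mRNA interactions are often short and discontinuous, so this is only a starting point. The function name and the example sequences are hypothetical and are not taken from any of the cited studies.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly tolerated in RNA duplexes.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_pairing_stretch(srna, mrna):
    """Find the longest contiguous antiparallel duplex between two RNAs.

    Both sequences are written 5'->3'. Returns (length, srna_start, mrna_start),
    where the starts are 0-based 5'->3' coordinates, or (0, -1, -1) if no
    position pairs at all.
    """
    srna, mrna = srna.upper(), mrna.upper()
    rev = srna[::-1]  # read the sRNA 3'->5' so it aligns antiparallel to the mRNA
    best = (0, -1, -1)
    # prev[j] holds the length of the paired run ending at rev[i-2], mrna[j-1].
    prev = [0] * (len(mrna) + 1)
    for i in range(1, len(rev) + 1):
        cur = [0] * (len(mrna) + 1)
        for j in range(1, len(mrna) + 1):
            if (rev[i - 1], mrna[j - 1]) in PAIRS:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    # Convert the end of the run in `rev` back to a 5'->3' sRNA start.
                    best = (cur[j], len(srna) - i, j - cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Hypothetical sRNA whose 3' portion is complementary to a
    # Shine-Dalgarno-like GGAGG stretch in a hypothetical mRNA leader.
    print(longest_pairing_stretch("GGACCUCC", "AAAGGAGGUAAUG"))
```

A hit that overlaps the ribosome binding site of the mRNA would be consistent with repression by occlusion, but any such prediction would still need experimental confirmation.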
Finally, ncRNAs affect protein stability and transport. One unique bacterial sRNA is recognized as both a tRNA and an mRNA by stalled ribosomes (tmRNA) (28). Alanylated tmRNA is delivered to the A site of a stalled ribosome; the nascent polypeptide is transferred to the alanine-charged tRNA portion of tmRNA. The problematic transcript then is replaced by the mRNA portion of tmRNA, which encodes a tag for degradation of the stalled peptide. It is not yet clear whether there is a counterpart to this coding RNA in archaeal and eukaryotic cells. In contrast, a small cytoplasmic RNA that forms the core of the signal recognition particle (SRP) required for protein translocation across membranes is found in organisms from all kingdoms (29).
The mechanisms of action for the characterized ncRNAs can be grouped into several general categories (Fig. 1). There are ncRNAs where base-pairing (often <10 base pairs and discontinuous) with another RNA or DNA molecule is central to function. The snoRNAs that direct RNA modification, the bacterial RNAs that modulate translation by forming base pairs with specific target mRNAs, and probably most of the miRNAs are examples of this category. Some ncRNAs mimic the structures of other nucleic acids; the 6S RNA structure is reminiscent of an open bacterial promoter, and the tmRNA has features of both tRNAs and mRNAs. Other ncRNAs, such as the RNase P RNA, have catalytic functions. Although synthetic RNAs have been selected to have a variety of biochemical functions, the number of natural ncRNAs shown to have catalytic function is limited. Most, if not all, ncRNAs are associated with proteins that augment their functions; however, some ncRNAs, such as the snRNAs and the SRP RNA, serve key structural roles in RNA-protein complexes. Several ncRNAs fit into more than one mechanistic category; the telomerase RNA provides the base-pairing template for telomere synthesis and is an integral part of the telomerase ribonucleoprotein complex. The mechanisms of action for a number of ncRNAs (such as the 7SK RNA) are not known, and it is probable that some ncRNAs act in ways that have not yet been established. Some investigators have suggested that many ncRNAs are vestiges of a world in which RNA carried out all of the functions in a primitive cell. However, given the versatility of RNA and the fact that the properties of RNA provide advantages over peptides for some mechanisms, it is likely that a number of ncRNAs have evolved more recently (30, 31).
(A) Direct base-pairing with target RNA or DNA molecules is central to the function of some ncRNAs: Eukaryotic snoRNAs direct nucleotide modifications (green star) by forming base pairs with flanking sequences, and the E. coli OxyS RNA represses translation by forming base pairs with the Shine-Dalgarno sequence (green box) and occluding ribosome binding.
(B) Some ncRNAs mimic the structure of other nucleic acids: Bacterial RNA polymerase may recognize the 6S RNA as an open promoter, and bacterial ribosomes recognize tmRNA as both a tRNA and an mRNA.
(C) ncRNAs also can function as an integral part of a larger RNA-protein complex, such as the signal recognition particle, whose structure has been partially determined (49).
The first ncRNAs were identified in the 1960s on the basis of their high expression; these RNAs were detected by direct labeling and separation on polyacrylamide gels. Others were later found by subfractionation of nuclear extracts or by association with specific proteins. A few were identified by mutations or phenotypes resulting from overexpression. The serendipitous discoveries of many of these ncRNAs were the first glimpses of their existence, but this work did not presage the vast numbers that appear to be encoded by a genome.
Several systematic searches for ncRNA genes have been carried out in the past 4 years. Among the computation-based searches, there have been screens of the yeast Saccharomyces cerevisiae and archaeal Pyrococcus genomes for the short conserved motifs present in snoRNAs (32, 33). In other searches, the intergenic regions of S. cerevisiae, E. coli, Methanococcus jannaschii, and Pyrococcus furiosus chromosomes have been scanned for properties indicative of an ncRNA gene. Criteria for identifying candidate intergenic regions have included large gaps between protein-coding genes (34), extended stretches of conservation between species with the same gene order (35, 36), orphan promoter or terminator sequences (34, 36, 37), presence of GC-rich regions in an organism with a high AT content (38), and conserved RNA secondary structures (39, 40). Other searches for ncRNAs have involved large-scale cloning efforts that have taken into account specific ncRNA properties. In studies of mouse (41, 42) and the archaeon Archaeoglobus fulgidus (22), total RNA between 50 and 500 nt was isolated, and arrays of cDNA clones obtained from the RNA were screened with oligonucleotides corresponding to the most abundant known RNAs. Clones showing the lowest hybridization signal then were randomly sequenced. In recent screens for C. elegans, Drosophila, and human miRNAs, RNA molecules of less than 30 nt were isolated, and cDNA clones were generated upon the ligation of primers to the 5' and 3' ends of the RNA (3, 4) or upon RNA tailing (5). Other miRNAs were isolated and cloned on the basis of their association with a complex composed of the human Gemin3, Gemin4, and eIF2C proteins (6). In most studies, Northern blots have been carried out to confirm that the cloned genes are expressed as small transcripts. These blots also have provided information about spatial and temporal expression patterns as well as potential precursor and degradation products.
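Two of the intergenic-region criteria listed above, large gaps between protein-coding genes and GC-rich stretches in an AT-rich genome, lend themselves to a short sketch. The code below is an illustration under stated assumptions, not the procedure used in the cited screens: the chromosome is treated as linear, gene coordinates are 0-based half-open intervals, and the thresholds (`min_gap`, `window`, `step`, `threshold`) are arbitrary placeholder values. All function names are hypothetical.

```python
def intergenic_gaps(genes, genome_length, min_gap=200):
    """Return (start, end) intervals of at least min_gap nt not covered by any gene.

    genes: iterable of (start, end) 0-based half-open intervals (assumed format).
    The chromosome is treated as linear, a simplification for circular genomes.
    """
    gaps = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - prev_end >= min_gap:
            gaps.append((prev_end, start))
        prev_end = max(prev_end, end)  # tolerate overlapping or nested genes
    if genome_length - prev_end >= min_gap:
        gaps.append((prev_end, genome_length))
    return gaps


def gc_fraction(seq):
    """Fraction of G and C residues in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=50, step=10, threshold=0.6):
    """Slide a fixed window along seq and report windows at or above threshold.

    Returns a list of (start, end, gc_fraction) tuples.
    """
    hits = []
    for i in range(0, max(len(seq) - window + 1, 0), step):
        frac = gc_fraction(seq[i:i + window])
        if frac >= threshold:
            hits.append((i, i + window, frac))
    return hits


if __name__ == "__main__":
    # Hypothetical gene coordinates on a 2,500-nt toy chromosome.
    print(intergenic_gaps([(100, 400), (450, 900), (1500, 2000)], 2500))
    # Toy AT-rich sequence with an embedded GC-rich stretch.
    toy = "AT" * 20 + "GC" * 20 + "AT" * 20
    print(gc_rich_windows(toy, window=20, step=10, threshold=0.8))
```

In practice the candidate regions from such filters would be combined with the other criteria (conservation, orphan promoters or terminators, conserved secondary structure) and then tested for expression, for example by Northern blot.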
Despite the success of the recent systematic efforts, it is certain that not all ncRNAs have been detected. Estimates for the number of sRNAs in E. coli range from 50 to 200 (1, 35), and estimates for the number of miRNAs in C. elegans range from hundreds to thousands (7). There also are many non-protein-coding regions of the bacterial and eukaryotic chromosomes for which transcription is detected (43, 44), but it is not known how many of these regions encode defined, functional ncRNAs. Extensions of the various systematic searches should lead to the identification of more ncRNAs. However, limitations of the current approaches should be noted. Most of the computational methods have focused on the intergenic regions. It has already been shown that some of the ncRNAs are processed from longer protein- or rRNA-encoding transcripts (42). It also is quite possible that ncRNAs are expressed from the opposite strand of protein-coding genes. On the other hand, expression-based methods may miss ncRNAs that are synthesized under very defined conditions, such as in response to a specific environmental signal, during a specific stage in development, or in a specific cell type. Much attention has been focused on characterizing the "proteome" of a sequenced organism. The recent discovery of hundreds of new ncRNAs illustrates that the "RNome" also will need to be characterized before a complete tally of the number of genes encoded by a genome can be achieved.
What Are All the Functions of ncRNAs?
An astonishing variety of ncRNA functions have already been found, but there are many ncRNAs for which the cellular roles are still unknown. For instance, Y RNAs, small cytoplasmic RNAs associated with the Ro autoantigen in several different organisms, are still enigmatic even after many years of study (45). With the more systematic identification of increasing numbers of ncRNAs, the question of how to elucidate the functions of all ncRNAs is becoming more and more prominent.
Approaches that have succeeded previously are an obvious place to start in answering the question of function, but it is likely that new approaches also will need to be developed. For genetically tractable organisms, ncRNA knockout or overexpression strains can be screened for differences in phenotypes (such as viability) or whole-genome expression patterns. The functions of several ncRNAs were identified by the biochemical identification of associated proteins, and the development of more systematic methods for characterizing ncRNA-associated proteins should be fruitful. As the knowledge base of what sequences are critical for the formation of specific structures or for base-pairing expands, and as computer programs for predicting structures improve, computational approaches should become an increasingly important avenue for elucidating the functions of ncRNAs. The three-dimensional structures of only a limited number of RNAs and RNA-protein complexes have been solved. An increase in the structural database may bring to light recognizable RNA or RNA-protein domains associated with specific functions.
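As a concrete, deliberately simplified illustration of computational structure prediction, the sketch below uses Nussinov-style base-pair maximization. This textbook algorithm is a stand-in chosen for brevity; the text does not name a specific program, and practical tools rely on thermodynamic energy models rather than simple pair counting. The minimum hairpin loop of 3 nt and the inclusion of G-U wobble pairs are assumptions of this sketch.

```python
# Watson-Crick pairs plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style recursion).

    min_loop is the minimum number of unpaired residues enclosed by any pair.
    Pseudoknots are not considered.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of pairs within seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: residue j is left unpaired
            # case 2: residue j pairs with some k, leaving at least min_loop
            # residues between them
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # Toy hairpin: a 3-bp stem closing a 4-nt loop.
    print(nussinov_max_pairs("GGGAAAUCCC"))
```

Pair counts of this kind say nothing about stability or function on their own; they are useful mainly for showing how a dynamic-programming search over nested structures is organized.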
Information about when ncRNAs are expressed and where ncRNAs are localized is useful for all experiments aimed at probing function. Many of the C. elegans miRNAs are synthesized only at very specific times in development, and thus they have also been called small temporal RNAs (stRNAs). Among the snoRNAs, some are expressed exclusively in the brain (41), and one of the bacterial sRNAs is only detected upon oxidative stress (9, 10). It is likely that other ncRNAs will be found to have very defined expression and localization patterns and that these will be critical to function.
There are many more ncRNAs than was ever suspected. A big challenge for the future will be to identify the whole complement of ncRNAs and to elucidate their functions. This is an exciting time for investigators whose work has focused on ncRNAs. However, scientists studying all aspects of biology should keep ncRNAs in mind. The phenotypes associated with specific mutations may be due to defects in an ncRNA instead of being due to defects in a protein, as is usually expected. Investigators developing purification schemes for specific proteins or activities should be aware of the possible presence of an ncRNA component; many purification procedures are designed to remove nucleic acids. There may be ncRNAs lurking behind many an unexplained phenomenon.
1. S. R. Eddy, Nature Rev. Genet. 2, 919 (2001).
2. Non-mRNAs have been denoted ncRNA = noncoding RNA, snmRNA = small non-mRNA, sRNA = small RNA, fRNA = functional RNA, and oRNA = other RNA, and it is likely that the nomenclature of these RNAs will need to be revisited.
3. M. Lagos-Quintana, R. Rauhut, W. Lendeckel, T. Tuschl, Science 294, 853 (2001).
4. N. C. Lau, L. P. Lim, E. G. Weinstein, D. P. Bartel, Science 294, 858 (2001).
5. R. C. Lee and V. Ambros, Science 294, 862 (2001).
6. Z. Mourelatos, et al., Genes Dev. 16, 720 (2002).
7. G. Ruvkun, Science 294, 797 (2001).
8. H. Grosshans and F. J. Slack, J. Cell Biol. 156, 17 (2002).
9. K. M. Wassarman, A. Zhang, G. Storz, Trends Microbiol. 7, 37 (1999).
10. S. Altuvia and E. G. H. Wagner, Proc. Natl. Acad. Sci. U.S.A. 97, 9824 (2000).
11. F. Sleutels, R. Zwart, D. P. Barlow, Nature 415, 810 (2002).
12. V. A. Erdmann, M. Szymanski, A. Hochberg, N. de Groot, J. Barciszewski, Nucleic Acids Res. 28, 197 (2000).
13. P. Avner and E. Heard, Nature Rev. Genet. 2, 59 (2001).
14. K. M. Wassarman and G. Storz, Cell 101, 613 (2000).
15. Z. Yang, Q. Zhu, K. Luo, Q. Zhou, Nature 414, 317 (2001).
16. V. T. Nguyen, T. Kiss, A. A. Michels, O. Bensaude, Nature 414, 322 (2001).
17. R. B. Lanz, et al., Cell 97, 17 (1999).
18. J.-L. Chen, M. A. Blasco, C. W. Greider, Cell 100, 503 (2000).
19. D. N. Frank and N. R. Pace, Annu. Rev. Biochem. 67, 153 (1998).
20. C. L. Will and R. Lührmann, Curr. Opin. Cell Biol. 13, 290 (2001).
21. T. Kiss, EMBO J. 20, 3617 (2001).
22. T.-H. Tang, et al., Proc. Natl. Acad. Sci. U.S.A., in press.
23. M. L. Kable, S. Heidmann, K. D. Stuart, Trends Biochem. Sci. 22, 162 (1997).
24. L. Simpson, O. H. Thiemann, N. J. Savill, J. D. Alfonzo, D. A. Maslov, Proc. Natl. Acad. Sci. U.S.A. 97, 6986 (2000).
25. K. Nishikura, Cell 107, 415 (2001).
26. P. D. Zamore, Science 296, 1265 (2002).
27. E. Massé and S. Gottesman, Proc. Natl. Acad. Sci. U.S.A. 99, 4620 (2002).
28. R. Gillet and B. Felden, Mol. Microbiol. 42, 879 (2001).
29. R. J. Keenan, D. M. Freymann, R. M. Stroud, P. Walter, Annu. Rev. Biochem. 70, 755 (2001).
30. V. Y. Kuryshev, B. V. Skryabin, J. Kremerskothen, J. Jurka, J. Brosius, J. Mol. Biol. 309, 1049 (2001).
31. W. Wang, F. G. Brunet, E. Nevo, M. Long, Proc. Natl. Acad. Sci. U.S.A. 99, 4448 (2002).
32. T. M. Lowe and S. R. Eddy, Science 283, 1168 (1999).
33. C. Gaspin, J. Cavaillé, G. Erauso, J.-P. Bachellerie, J. Mol. Biol. 297, 895 (2000).
34. W. M. Olivas, D. Muhlrad, R. Parker, Nucleic Acids Res. 25, 4619 (1997).
35. K. M. Wassarman, F. Repoila, C. Rosenow, G. Storz, S. Gottesman, Genes Dev. 15, 1637 (2001).
36. L. Argaman, et al., Curr. Biol. 11, 941 (2001).
37. S. Chen, et al., Biosystems, in press.
38. R. J. Klein, Z. Misulovin, S. Eddy, Proc. Natl. Acad. Sci. U.S.A., in press.
39. E. Rivas, R. J. Klein, T. A. Jones, S. R. Eddy, Curr. Biol. 11, 1369 (2001).
40. R. J. Carter, I. Dubchak, S. R. Holbrook, Nucleic Acids Res. 29, 3928 (2001).
41. J. Cavaillé, et al., Proc. Natl. Acad. Sci. U.S.A. 97, 14311 (2000).
42. A. Hüttenhofer, et al., EMBO J. 20, 2943 (2001).
43. D. W. Selinger, et al., Nature Biotechnol. 18, 1262 (2000).
44. K. E. Plant, S. J. E. Routledge, N. J. Proudfoot, Mol. Cell. Biol. 21, 6507 (2001).
45. X. Chen, A. M. Quinn, S. L. Wolin, Genes Dev. 14, 777 (2000).
46. J. Gu, Y. Chen, R. Reddy, Nucleic Acids Res. 26, 160 (1998).
47. D. A. Samarsky and M. J. Fournier, Nucleic Acids Res. 27, 161 (1999).
48. A. E. Souza, T. Hermann, H. U. Göringer, Nucleic Acids Res. 25, 104 (1997).
49. R. T. Batey, R. P. Rambo, L. Lucast, B. Rha, J. A. Doudna, Science 287, 1232 (2000).
50. I thank S. Altuvia, J. Brosius, S. Gottesman, K. M. Wassarman, and A. Zhang for helpful discussions and comments on the manuscript. I also thank them and many other investigators for extensive discussion of the nomenclature.