Piero Carninci 1, 2, Kazunori Waki 1, Toshiyuki Shiraki 1, Hideaki Konno 1, Kazuhiro Shibata 2, Masayoshi Itoh 2, Katsunori Aizawa 1, Takahiro Arakawa 1, Yoshiyuki Ishii 1, Daisuke Sasaki 1, Hidemasa Bono 1, Shinji Kondo 1, Yuichi Sugahara 1, Rintaro Saito 1, Naoki Osato 1, Shiro Fukuda 1, Kenjiro Sato 2, 3, Akira Watahiki 2, 3, Tomoko Hirozane-Kishikawa 1, Mari Nakamura 1, Yuko Shibata 2, 6, Ayako Yasunishi 1, Noriko Kikuchi 2, Atsushi Yoshiki 5, Moriaki Kusakabe 5, 7, Stefano Gustincich 8, Kirk Beisel 9, William Pavan 10, Vassilis Aidinis 11, Akira Nakagawara 12, William A. Held 13, Hiroo Iwata 14, Tomohiro Kono 15, Hiromitsu Nakauchi 16, Paul Lyons 17, Christine Wells 18, David A. Hume 18, Michela Fagiolini 19, Takao K. Hensch 19, Michelle Brinkmeier 20, Sally Camper 20, Junji Hirota 21, Peter Mombaerts 21, Masami Muramatsu 1, 2, 3, Yasushi Okazaki 1, 2, Jun Kawai 1, 2 and Yoshihide Hayashizaki 1, 2, 3, 4, 22
1 Laboratory for Genome Exploration Research Group, RIKEN
Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku,
Yokohama, Kanagawa 230-0045, Japan;
2 Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama
351-0198, Japan;
3 Institute of Basic Medical Sciences, University of
Tsukuba, Tsukuba, Ibaraki 305-8577, Japan;
4 Japan Division of Genomic Information Resources, Science
of Biological Supramolecular Systems, Graduate School of Integrated Science,
Yokohama City University, Tsurumi-Ku, Yokohama 230-0045, Japan;
5 Experimental Animal Research Division, Biogenic Resources Center, RIKEN Tsukuba Institute,
Tsukuba, Ibaraki 305-0074, Japan;
6 Dnaform International, Inc., Ami Town, Inashiki District,
Ibaraki 300-0332, Japan;
7 Aloka Co., LTD, Kasumigaura-cho, Niihari-gun, Ibaraki
300-0134, Japan;
8 Department of Neurobiology, Harvard Medical School,
Boston, Massachusetts 02115, USA;
9 Boys Town National Research Hospital, Omaha, Nebraska
68131, USA;
10 National Human Genome Research Institute, National
Institutes of Health, Bethesda, Maryland 20892, USA;
11 Institute of Immunology, Biomedical Sciences Research Center Al. Fleming, 16672 Vari,
Greece;
12 Chiba Cancer Center Research Institute, Division of
Biochemistry, Chuo-ku, Chiba 260-8717, Japan;
13 Roswell Park Cancer Institute, Buffalo, New York 14263, USA;
14 Department of Reparative Materials, Field of Tissue
Engineering, Institute for Frontier Medical Sciences, Kyoto University,
Sakyo-ku, Kyoto 606-8507, Japan;
15 Faculty of Applied Bioscience, Department of BioScience,
Tokyo University of Agriculture, Setagaya-ku, Tokyo 156-8502, Japan;
16 Laboratory of Stem Cell Therapy, Center for Experimental
Medicine, Institute of Medical Science,
University of Tokyo, Minato-ku, Tokyo 108-8639, Japan;
17 DRF/WT Diabetes and Inflammation Laboratory, Cambridge
Institute for Medical Research, Cambridge CB2 2XY, UK;
18 The Institute for Molecular Biosciences, The University
of QLD, St. Lucia, Brisbane, QLD 4072, Australia;
19 Neuronal Function Research, Lab for Neuronal Circuit
Development, RIKEN Brain Science Institute (BSI), Wako-shi, Saitama 300-0198,
Japan;
20 University of Michigan Medical, Ann Arbor, Michigan
48109, USA;
21 Developmental Biology and Neurogenetics, The Rockefeller
University, New York, New York 10021, USA
22 Corresponding author:
E-MAIL: rgscerg@gsc.riken.go.jp
FAX: +81 45 503 9216
We report the construction of the mouse full-length cDNA encyclopedia,
the most extensive view of a complex
transcriptome, based on the preparation and sequencing of 246 libraries.
Before cloning, cDNAs were enriched for full-length molecules by Cap-Trapper and,
in most cases, aggressively subtracted/normalized. We produced 1,442,236
successful 3'-end sequences, which clustered into 171,144 groups; from these, 60,770
clones were fully sequenced and annotated in the FANTOM-2 annotation.
We also produced 547,149 5'-end reads, which clustered into 124,258
groups. Altogether, these cDNAs were further grouped into 70,000 transcriptional
units (TUs), which represent the best coverage of a transcriptome so far.
By monitoring the extent of normalization/subtraction, we define the tentative
equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000
ESTs derived from standard libraries. High coverage explains the discrepancy
between the very large number of clusters (and TUs) in this project, which
also include non-protein-coding RNAs, and the lower gene-number
estimates from genome annotations. The 5'-end clusters identify
regions that are potential promoters for 8637 known genes and
suggest the presence of almost 63,000 transcriptional starting points.
An estimate of the frequency of polyadenylation signals suggests that at
least half of the singletons in the EST set
represent real mRNAs. Clones accounting for about half of the predicted
TUs await further sequencing. The continued high discovery rate suggests
that the task of transcriptome discovery is not yet complete.
[Supplemental material available online at: http://www.genome.org]
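As a rough illustration of the redundancy implied by the read and cluster counts quoted in the abstract, the average number of end reads per cluster can be computed directly. This is only a sketch: the "mean reads per cluster" metric and the function name below are illustrative assumptions, not quantities defined in the paper, and the calculation says nothing about how reads are actually distributed across clusters.

```python
# Counts quoted in the abstract.
THREE_PRIME_READS = 1_442_236
THREE_PRIME_CLUSTERS = 171_144
FIVE_PRIME_READS = 547_149
FIVE_PRIME_CLUSTERS = 124_258


def mean_reads_per_cluster(reads: int, clusters: int) -> float:
    """Average end reads per cluster (hypothetical helper, not from the paper)."""
    if clusters <= 0:
        raise ValueError("clusters must be positive")
    return reads / clusters


three_prime_depth = mean_reads_per_cluster(THREE_PRIME_READS, THREE_PRIME_CLUSTERS)
five_prime_depth = mean_reads_per_cluster(FIVE_PRIME_READS, FIVE_PRIME_CLUSTERS)

print(f"3'-end: {three_prime_depth:.2f} reads per cluster")
print(f"5'-end: {five_prime_depth:.2f} reads per cluster")
```

A simple average like this hides the singletons discussed in the abstract, so it should be read as a coarse summary only.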
0. Special Issue of Genome Research, vol. 13,
no. 6b, pp. 1265-1561 (June 2, 2003).
Report of "RIKEN Mouse Genome Encyclopedia" project: the whole system
from mouse house to database.
1. Numata K, Kanai A, Saito R, Kondo S, Adachi J, Wilming LG, Hume DA, RIKEN GER Group, GSL Members, Hayashizaki Y, and Tomita M, "Identification of Putative Noncoding RNAs Among the RIKEN Mouse Full-Length cDNA Collection", Genome Research, vol. 13, no. 6b, pp. 1301-1306 (June 2, 2003).
2. Bono H, Yagi K, Kasukawa T, Nikaido I, Tominaga N, Miki R, Mizuno Y, Tomaru Y, Goto H, Nitanda H, Shimizu D, Makino H, Morita T, Fujiyama J, Sakai T, Shimoji T, Hume DA, RIKEN GER Group, Arakawa T, Carninci P, Kawai J, Hayashizaki Y, and Okazaki Y, "Systematic Expression Profiling of the Mouse Transcriptome Using RIKEN cDNA Microarrays", Genome Research, vol. 13, no. 6b, pp. 1318-1323 (June 2, 2003).