Hidemasa Bono 1, Ken Yagi 1, Takeya Kasukawa 1, 2, Itoshi Nikaido 1, 3, Naoko Tominaga 1, Rika Miki 1, Yosuke Mizuno 1, Yasuhiro Tomaru 1, Hitoshi Goto 1, Hiroyuki Nitanda 1, Daisuke Shimizu 1, Hirochika Makino 1, Tomoyuki Morita 1, Junshin Fujiyama 1, Takehito Sakai 1, Takashi Shimoji 1, David A. Hume 4, RIKEN GER Group 1, GSL Members 5, 6, Yoshihide Hayashizaki 1, 3, and Yasushi Okazaki 1, 7
1 Laboratory for Genome Exploration Research Group, RIKEN
Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku,
Yokohama, Kanagawa 230-0045, Japan;
2 Multimedia Development Center, Advanced Technology
Development Department, NTT Software Corporation, Naka-ku, Yokohama, Kanagawa
231-8554, Japan;
3 Division of Genomic Information Resource Exploration,
Science of Biological Supramolecular Systems, Yokohama City University,
Graduate School of Integrated Science, Tsurumi-ku, Yokohama, Kanagawa
230-0045, Japan;
4 Institute for Molecular Bioscience and ARC Special Research
Centre for Functional and Applied Genomics, University of Queensland, Q4072,
Australia;
5 Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama
351-0198, Japan
6 Takahiro Arakawa, Piero Carninci, and Jun Kawai.
7 Corresponding author:
E-MAIL: rgscerg@gsc.riken.go.jp
FAX: +1- 81-45-503-9216
The number of known mRNA transcripts in the mouse has been greatly
expanded by the RIKEN Mouse Gene
Encyclopedia project. Validation of their reproducible expression
in a tissue is an important contribution to the study of functional genomics.
In this report, we determine the expression profile of 57,931 clones on
20 mouse tissues using cDNA microarrays. Of these 57,931 clones, 22,928
clones correspond to the FANTOM2 clone set. The set represents 20,234 transcriptional
units (TUs) out of 33,409 TUs in the FANTOM2 set. We identified 7206 separate
clones that satisfied stringent criteria for tissue-specific expression.
Gene Ontology terms were assigned for these 7206 clones, and the proportion
of `molecular function' ontology for each tissue-specific clone was examined.
These data will provide insights into the function of each tissue. Tissue-specific
gene expression profiles obtained using our cDNA microarrays were also
compared with the data extracted from the GNF Expression Atlas based on
Affymetrix microarrays. One major outcome of the RIKEN transcriptome analysis
is the identification of numerous nonprotein-coding mRNAs. The expression
profile was also used to obtain evidence of expression for putative noncoding
RNAs. In addition, 1926 clones (70%) of 2768 clones that were categorized
as "unknown EST," and 1969 (58%) clones of 3388 clones that were categorized
as "unclassifiable" were also shown to be reproducibly expressed.
DNA microarray technology revolutionized gene expression analysis (DeRisi
et al. 1997). DNA microarrays containing virtually all yeast open reading
frames (ORFs) have been applied to explore gene expression profiles
for various physiological conditions (Eisen et al. 1998).
In a recent report (Spellman and Rubin 2002),
a striking set of experiments using cDNA microarray profiling
in Drosophila revealed that co-expressed genes are clustered
in the genome, suggesting long-range coordination of transcriptional
control. Although there have been many notable successes in
the application of cDNA microarrays to mammalian gene regulation
(Alizadeh et al. 2000),
the sets of transcripts analyzed have been far from comprehensive,
because the mammalian transcriptome has been incomplete. The
RIKEN Mouse Encyclopedia project aims to make a library of all
transcribed sequences as cDNA clones (The RIKEN Genome Exploration
Research Group Phase II Team and the FANTOM Consortium 2001).
Analysis of the expression pattern for these cDNAs is a major resource
for functional annotation. In particular, many of the transcripts within
the RIKEN cDNA clone set do not code for protein, or code for hypothetical
proteins. Evidence of expression, particularly tissue-specific expression,
can provide an indication that the transcript is likely to be
functionally significant. Conversely, lack of any evidence of
expression in any tissue might indicate that a transcript is
an artifact, or unprocessed nuclear RNA. Expression in a particular
tissue may also give insights into likely function for annotated
proteins in which the only information available is the presence
of a conserved domain or motif.
Following the acquisition of RIKEN mouse full-length cDNAs, we produced
our first microarray set, called the RIKEN 19K mouse microarray,
which contained a subset of the FANTOM1 full-length cDNAs as
well as a large selection of cDNAs from known genes. These arrays
were used to produce expression profiles of 49 distinct mouse
tissues, and the results were released in the RIKEN Expression
Array Database (READ; Miki et al. 2001; Bono et al. 2002).
After that effort, we continued characterizing gene expression
profiles for mouse tissues using newly sequenced mouse cDNAs
as they were acquired. The second and third sets of mouse cDNA
microarrays, in each of which 19,584 unique cDNA clones were
spotted, were prepared and then used for gene expression profiling
of 20 tissues. The number of tissues analyzed was reduced by
focusing mainly on the adult tissues. The set of cDNAs on these
arrays, combined with the earlier 19K set, comprises approximately
60% of the representative transcript set produced in the FANTOM2
annotation process (The FANTOM Consortium and the RIKEN Genome
Exploration Research Group Phase I and II Team 2002).
Here we present some highlights of this extended analysis.
High Coverage of RIKEN Mouse cDNA Microarray Set in Mouse Transcriptome
The first 19K set (called RIKEN 19k set; 18,763 unique cDNA clones on the array) and the newly developed second and third sets of RIKEN mouse cDNA microarrays (called RIKEN 20k chip-2 and chip-3, respectively; containing 19,584 unique cDNA clones each) contain a total of 57,931 unique cDNA clones (denoted as the RIKEN 60K microarray set) and are spotted on three glass slides. We observed that 22,928 clones (~40%) overlapped with the 60,770 FANTOM2 cDNA clone set (Table 1). The cDNA clones used for the cDNA microarrays were not identical to those chosen for full-length sequencing, because novel sequences not in the public database at that time were preferentially taken for full-length sequencing, whereas known genes identified from phase 1, 3' end sequencing were preferentially chosen for cDNA microarrays, to ensure that all transcripts of known function were on the arrays.
Table 1. Number of Clones or Clusters that are Included in RIKEN Mouse cDNA Microarray and FANTOM2 Clone Set
| | Number of cDNA clones in FANTOM2 set (a) | Clusters in FANTOM2 set (b) | Clusters in RTS (c) |
|---|---|---|---|
| 60k | | 20,234 | 22,217 |
| 19k | 6,333 | | |
| 20k-2 | 7,397 | | |
| 20k-3 | 9,198 | | |
| Not on chip | 37,842 | 13,175 | 15,869 |
| Total | 60,770 | 33,409 | 37,086 |
a Number of clones of FANTOM2 set that overlap with the RIKEN cDNA microarray
b Number of clusters from the FANTOM2 set that overlap with the RIKEN cDNA microarray
c Number of clusters from the RTS that overlap with the RIKEN cDNA microarray set
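The coverage figures quoted in the text can be rechecked directly from the counts in Table 1. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only and are not taken from the original analysis.

```python
# Counts copied from Table 1 as printed.
clones_on_chip = {"19k": 6333, "20k-2": 7397, "20k-3": 9198}
clones_total = 60770
tus_on_chip, tus_total = 20234, 33409
rts_on_chip, rts_total = 22217, 37086
# Note: the RTS column as printed (22,217 on chip + 15,869 not on chip)
# sums to 38,086 rather than the printed total; printed values are used as is.


def coverage(on_chip: int, total: int) -> float:
    """Fraction of a reference set that is represented on the arrays."""
    return on_chip / total


overlap = sum(clones_on_chip.values())
clone_cov = coverage(overlap, clones_total)
tu_cov = coverage(tus_on_chip, tus_total)
rts_cov = coverage(rts_on_chip, rts_total)

print(f"FANTOM2 clones on arrays: {overlap} ({clone_cov:.1%})")
print(f"FANTOM2 TUs on arrays:    {tus_on_chip} ({tu_cov:.1%})")
print(f"RTS clusters on arrays:   {rts_on_chip} ({rts_cov:.1%})")
```

The per-chip clone counts add up to the 22,928 overlapping clones given in the text, and the cluster-level coverage is close to the "approximately 60%" stated in the introduction.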
To further assign correct correspondence between the microarray clone
set and the FANTOM2 clone set, we performed a systematic analysis of cDNA
sequences on the arrays against the representative transcript set (RTS)
used to assess the FANTOM2 sequence set and thought to reflect the mouse
transcriptome. The comparison was carried out using NCBI BLASTN with a high-stringency
cutoff (E < 1e-100; Marra et al. 1999).
We found that 20,234 transcriptional units (TUs) of the 33,409
TUs in the FANTOM2 set were contained in the RIKEN 60K microarray
set, and 22,217 clusters of the 37,086 clusters were in the
RTS (Table 1). Although it seems there are
redundancies in the clone set from the clustering results based
on the TUs, it should be noted that because these are not fully
sequenced, a subset will certainly be redundant with the RTS,
and will probably represent alternative 3' UTRs, which are common
in the mammalian transcriptome (The FANTOM Consortium and the
RIKEN Genome Exploration Research Group Phase I and II Team 2002).
By analogy, despite the fact that the sequencing of the 60,770
FANTOM2 clones was prioritized based on novel 3' and 5' ends,
the set collapsed by almost 50% (i.e., there is twofold redundancy)
upon clustering of the full-length sequences.
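As an illustration of how a high-stringency E-value cutoff assigns array clones to RTS entries, the sketch below filters tabular BLASTN output. It assumes the common 12-column tabular format, in which column 11 holds the E-value; the clone and RTS identifiers are hypothetical, and this is not the pipeline used to produce Table 1.

```python
import csv
import io

E_CUTOFF = 1e-100  # high-stringency cutoff quoted in the text


def best_hits(tabular, cutoff=E_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest-E-value hit below cutoff."""
    best = {}
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # hit is not stringent enough
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical tabular output: two hits for cloneA, one weak hit for cloneB.
example = io.StringIO(
    "cloneA\tRTS_0001\t99.8\t950\t2\t0\t1\t950\t1\t950\t0.0\t1800\n"
    "cloneA\tRTS_0002\t97.0\t400\t12\t0\t1\t400\t1\t400\t1e-150\t700\n"
    "cloneB\tRTS_0003\t92.0\t120\t9\t1\t1\t120\t5\t124\t3e-40\t160\n"
)
hits = best_hits(example)
print(hits)
```

In this toy input, cloneA is assigned to its strongest match and cloneB is left unassigned because its only hit does not pass the cutoff.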
Microarray Analysis for Clones
In addition to the previously reported microarray data for 49 mouse tissues using the RIKEN 19K mouse cDNA microarray (the first 19K set), new microarray data were produced for profiling tissues in mouse. Gene expression profiles for adipose tissue were newly added to the set produced with the original 19K set. The 20 tissues selected for analysis using chip 2 and chip 3 were selected mainly from the major adult organs (spleen, thymus, kidney, heart, lung, liver, brain, cerebellum, 10-day-neonate cerebellum, placenta, testis, uterus, pancreas, small intestine, stomach, colon, bone, adipose, muscle, and 10-day-neonate skin). In total, 57,931 gene expression profiles for 20 tissues were included for the analyses.
The log-transformed ratio using the RNA extracted from Day 17.5 embryo
whole-body as control was stored in READ (RIKEN Expression Array Database,
http://READ.gsc.riken.go.jp/fantom2/; Bono et al. 2002).
Where the target on the array is contained within the FANTOM2
set, the expression profiles described here are integrated with
the functional annotations of cDNA clones (The FANTOM Consortium
and the RIKEN Genome Exploration Research Group Phase I and
II Team 2002).
Prominent features for this large gene expression profile are
described below.
Tissue Profiling by Gene Ontology
We explored the functional category of Gene Ontology (GO) terms assigned to cDNA clones whose gene expression pattern was restricted to a subset of tissues on the microarrays. The genes that are expressed in a tissue-specific manner were extracted by the criteria described in the Methods section. As we are focused on the function of genes, we used GO Slim terms (http://www.ebi.ac.uk/proteome/goslimterms.html) for the molecular function ontology in the Gene Ontology project. GO Slim was constructed by selecting a set of high-level GO terms to cover most aspects of the functional classification.
At a glance, NA (Not Assigned) terms are prevalent even in tissue-specific genes (Fig. 1), indicating the current limitations of our knowledge of the functions of mammalian genes. Relatively well characterized tissues, such as heart, liver, stomach, and kidney, showed the highest percentage of GO-assigned genes, perhaps reflecting a relatively low level of transcriptional complexity and highly defined function (Fig. 1). Placenta has a high proportion of genes assigned a signal transduction function, in large measure because of the inclusion of the numerous small secreted growth factors (placental lactogen 2, placental growth factor, prolactin-like protein A, B, C, F, G, etc.) in this class.
Figure 1. Pie charts for tissue profiling by Gene Ontology.
Comparison With the Data From Affymetrix GeneChip
The tissue expression gene ontology diagram was also constructed for
the data in the GNF Gene Expression Atlas (http://expression.gnf.org/; Su
et al. 2002),
which uses the Affymetrix Chip (Suppl. Fig. 1; http://READ.gsc.riken.go.jp/fantom2/supplement/tissue_profiling/GNF/). There
has been no previous comparison of the two technologies (full-length cDNAs
vs. printed oligonucleotide arrays) and the data provide important
cross-validation. There were 15 tissues that were common between
the two sets of array experiments. For these 15 tissues, the
gene ontology molecular function diagram was also constructed
and compared with that of RIKEN cDNA microarrays (Suppl. Fig.
2; http://READ.gsc.riken.go.jp/fantom2/supplement/tissue_profiling/compara/). As
shown, the pattern of each corresponding tissue of the GO diagram is very
similar.
Gene Expression of cDNA Clones Categorized as `unknown EST' or `Unclassifiable'
For cDNA clones that were assigned no functional descriptions from sequence similarity searches, cDNA microarray analysis can at least provide an indication as to tissue-specific expression that might suggest possible function. cDNA clones in two categories, `unknown EST hit' and `Unclassifiable', were examined in detail to determine the gene expression profiles in the 20 tissues examined. cDNA clones in the category `unknown EST hit' are those without any sequence hits to existing proteins, but which have sequence similarity to archived ESTs in the public database. Conversely, clones in the category `Unclassifiable' are those without any sequence hits to existing proteins or ESTs. We found that 1926 clones (70%) of the 2768 clones that were categorized as `unknown EST', and 1969 (58%) clones out of 3388 clones that were categorized as `unclassifiable', were confirmed to be expressed in the microarray according to stringent cut-off criteria (Table 2). The genes that were evaluated as expressed are listed in Supplemental Table 1. Hierarchical clustering of gene expression data for cDNA clones in the `Unclassifiable' category reveals that several genes in this category show tissue-specific gene expression, even in log-transformed ratio data (Suppl. Fig. 3; http://READ.gsc.riken.go.jp/fantom2/supplement/3/). It should be noted that absence of detectable expression does not necessarily imply that the transcript is not expressed or is nonfunctional. Many noncoding RNAs are expressed at very low levels, and may fall below the detection limits of microarrays in either the target tissue or the 17-day-embryo reference control.
Table 2. Number of Spots on cDNA Microarrays Judged to be Expressed or Not Expressed
| | unknown EST hit | Unclassifiable |
|---|---|---|
| 20k-1 | | |
| Total spots of this category | 516 | 360 |
| Expressed | 454 | 276 |
| Not expressed | 0 | 2 |
| Marginally expressed | 62 | 82 |
| 20k-2 | | |
| Total spots of this category | 795 | 1,152 |
| Expressed | 608 | 727 |
| Not expressed | 42 | 74 |
| Marginally expressed | 145 | 351 |
| 20k-3 | | |
| Total spots of this category | 1,457 | 1,876 |
| Expressed | 864 | 966 |
| Not expressed | 81 | 134 |
| Marginally expressed | 512 | 776 |
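The percentages quoted in the text follow from summing the per-chip counts in Table 2. A minimal sketch of that tally is given below; the dictionary layout and key names are illustrative and not part of the original data files.

```python
# Per-chip counts copied from Table 2:
# (total, expressed, not expressed, marginally expressed).
table2 = {
    "unknown EST hit": {
        "20k-1": (516, 454, 0, 62),
        "20k-2": (795, 608, 42, 145),
        "20k-3": (1457, 864, 81, 512),
    },
    "Unclassifiable": {
        "20k-1": (360, 276, 2, 82),
        "20k-2": (1152, 727, 74, 351),
        "20k-3": (1876, 966, 134, 776),
    },
}

summary = {}
for category, chips in table2.items():
    for chip, (total, expressed, not_expr, marginal) in chips.items():
        # Each chip's three outcome classes should add up to its total.
        assert expressed + not_expr + marginal == total, (category, chip)
    total_spots = sum(row[0] for row in chips.values())
    total_expressed = sum(row[1] for row in chips.values())
    summary[category] = (total_expressed, total_spots,
                         round(100 * total_expressed / total_spots))

for category, (expressed, total, pct) in summary.items():
    print(f"{category}: {expressed} of {total} expressed ({pct}%)")
```

The totals reproduce the 1926 of 2768 (70%) and 1969 of 3388 (58%) figures given in the text.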
Other Applications of Microarray Ratio Data
The major purpose of this short paper is to announce the availability of
these data, and the corresponding expanded Web interface. There are numerous
applications, some of which are described in other reports in this special
issue of Genome Research. For example, the evidence of tissue-specific
expression was used for the analyses of small secreted proteins
in the global analysis of the secretome (Grimmond et al. 2003).
`Search multiple clones' in the READ Web interface (http://read.gsc.riken.go.jp/fantom2/) allows
researchers to easily retrieve a set of gene expression patterns for cDNA
clones of interest. For example, gene expression profiles for genes in a
specific metabolic pathway can be obtained simply by a `copy and paste' operation
from the table in the Metabolomapper Web site (http://fantom2.gsc.riken.go.jp/metabolome/;
Bono et al. 2003).
The search interface is designed to permit visualization of
the tissue expression profiling of a subset of genes.
In conclusion, the RIKEN Expression Array Database now represents a major resource for functional genomics in the mouse. We have reported the expression profiling of 57,931 clones for 20 tissues. Comparative analysis with other types of resources emerging in the public domain, such as the GNF Expression Array resource, will provide extensive validation to enable robust analyses of transcriptional networks in the mouse.
RNA Extraction
The 20 adult mouse tissues for exploring genes with tissue-specific expression
patterns were as follows: spleen, thymus, kidney, heart, lung, liver, brain,
cerebellum, 10-day-neonate cerebellum, placenta, testis, uterus, pancreas,
small intestine, stomach, colon, 10-day-neonate skin, bone, muscle, and
adipose. RNA extraction was performed by the AGPC method (Miki
et al. 2001; Ichikawa et al. 2002; Mizuno et al. 2002).
Preparation of Target DNAs
The target DNAs were collected from RIKEN mouse cDNA libraries, which were constructed using the CAP trapper method to enrich for full-length inserts. The cDNAs were amplified using M13 forward and reverse primers in a 100-µL PCR reaction with 0.2 µM final concentration (each) of forward (F1224; 5'-cgccagggttttcccagtcacga-3') and reverse (R1233; 5'-agcggataacaatttcacacagga-3') primers, 250 µM dNTPs, and 1.25 U Ex Taq in 1 x Ex Taq buffer (TAKARA). The PCR product was precipitated by using isopropanol and resuspended in 15 µL 3 x SSC. The DNA solution was spotted on poly-L-lysine-coated slides by using a DNA arrayer (http://cmgm.stanford.edu/pbrown/mguide/index.html) with 16 tips (SMP3, TeleChem International). The diameter of the spots was 100–150 µm. Mouse β-actin and G3PDH cDNAs were used as positive controls, and Arabidopsis cDNAs were used as negative controls (Accession nos. X98108, X13611, X90769, Z99707, AF004393, Z49777, Q03943, U58284).
Preparation of Probes
One µg of mRNA extracted from each of the 20 tissues was labeled by incorporating Cy3 during random-primed reverse transcription. cDNA derived from entire E17.5 embryos, which we labeled with Cy5, was used as the expression reference for all tissues. The labeling was carried out at 42°C for 1 h in a total volume of 30 µL containing 400 U SuperScriptII (Gibco BRL), 0.1 mM Cy3-dUTP (or Cy5-dUTP), 0.5 mM each dATP, dCTP, and dGTP; 0.2 mM dTTP, 10 mM DTT, 6 µL 5 x first-strand buffer, and 6 µg random primers. To remove unincorporated nucleotide, labeled cDNA was mixed with 500 µL binding buffer (5 M guanidine-SCN, 10 mM Tris pH 7.0, 0.1 mM EDTA, 0.03% gelatin, and 2 ng/µL tRNA) and 50 µL silica matrix buffer (10% matrix, 3.5 M guanidine chloride, 20% glycerol, 0.1 mM EDTA, and 200 mM NaOAc pH 4.8–5.0), transferred to a GFX column (Amersham Pharmacia), and centrifuged at 15,000 rpm for 30 sec. The flow-through was discarded, and the column was washed with 500 µL wash buffer. The adsorbed probe was eluted into a final volume of 17 µL distilled water. This labeled probe was mixed with blocking solution containing 3 µL of 10 µg/µL oligo-dA, 3 µL of 20 µg/µL yeast tRNA, 1 µL of 20 µg/µL mouse Cot1 DNA, 5.1 µL 20 x SSC, and 0.9 µL 10% SDS.
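The component volumes listed for the probe and blocking solution can be checked against the 30 µL hybridization volume given in the hybridization section. The sketch below does only this arithmetic; the final SSC and SDS concentrations it prints are derived values, not figures stated in the protocol.

```python
# Volumes in microlitres, copied from the probe-preparation paragraph.
components_ul = {
    "labeled probe in water": 17.0,
    "oligo-dA (10 ug/uL)": 3.0,
    "yeast tRNA (20 ug/uL)": 3.0,
    "mouse Cot1 DNA (20 ug/uL)": 1.0,
    "20x SSC": 5.1,
    "10% SDS": 0.9,
}

total_ul = sum(components_ul.values())

# Dilution: stock concentration * stock volume / total volume.
ssc_final_x = 20 * components_ul["20x SSC"] / total_ul
sds_final_pct = 10 * components_ul["10% SDS"] / total_ul

print(f"total volume: {total_ul:.1f} uL")
print(f"final SSC:    {ssc_final_x:.1f} x")
print(f"final SDS:    {sds_final_pct:.1f} %")
```

The volumes add up to 30 µL, which corresponds to 10 µL for each of the three multi-blocks of the array.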
Array Hybridization and Data Analysis
The RIKEN full-length mouse cDNA that comprised the target was hybridized in a final volume of 30 µL; the entire array consists of three multi-blocks, and each multi-block required 10 µL hybridization solution. Prior to hybridization, probe aliquots were heated at 95°C for 1 min and cooled at room temperature. Slides were hybridized under cover slips overnight at 65°C in a hybricasette (obtained from ArrayIt.com). After hybridization, slides were washed in 2 x SSC, 0.1% SDS until the cover slips dropped off; the slides were then transferred into 1 x SSC, shaken gently for 2 min, and rinsed with 0.1 x SSC for 2 min. After washing, slides were spun at 800 rpm using a SORVALL (RC-3B plus; rotor, H6000A/HBB6) centrifuge. These slides were scanned on a ScanArray 5000 confocal laser scanner, and the images were analyzed by using ImaGene (BioDiscovery).
Analysis of the Data
To improve the accuracy of the data, we did the experiment twice, labeling
the same RNA template in two separate reactions. Data were normalized to
the reference standard by subtracting (in log space) the median observed
value if it was other than zero. We only used data points that
were reproducible. To this end, we developed a filtering program,
PRIM (Preprocessing Implementation for Microarray; Kadota et al. 2001).
Briefly, this program (1) deletes the results with "flags" added
manually to corrupted spots, (2) eliminates spots with signal
intensities less than the mean + 3 x
standard deviation (S.D.) of the background signal intensity
in either Cy-3 or Cy-5, and (3) eliminates spots located outside
the least-mean squares line ± 2 x
S.D. After the filtering was finished, we compared the results of
the two experiments by calculating a Pearson's correlation coefficient.
If the coefficient was equal to or greater than 0.7, we used
the data in subsequent analyses. If not, we repeated the labeling,
hybridization, and scanning up to six times. In this way, we
could generate high-quality data for most tissues. Before the
clustering, ratio values from duplicate experiments were averaged,
log-transformed (base 2), and stored in a table. We applied
hierarchical clustering to both axes using the weighted pair-group
method with a centroid average as implemented by the program
Cluster (http://www.microarrays.org/software; Eisen et al. 1998).
The distance matrices we used were the Pearson correlation for
clustering the arrays and the inner product of vectors normalized
to magnitude 1 for the genes (this is a slight variation of
the Pearson correlation). The results were analyzed using TreeView
(http://www.microarrays.org/software; Eisen et al. 1998).
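The filtering and reproducibility check described above can be sketched as follows. This is not the PRIM program itself: it covers only steps (1) and (2), the Pearson threshold of 0.7, and the averaging of duplicate ratios before the log2 transform. Computing the correlation on log2 ratios is an assumption, and the spot records, field names, and intensities are hypothetical.

```python
import math
from statistics import mean, stdev


def background_cutoff(background, k=3.0):
    """Mean + k * SD of the background intensities for one channel."""
    return mean(background) + k * stdev(background)


def filter_spots(spots, bg_cy3, bg_cy5):
    """Return {spot_id: Cy3/Cy5 ratio} for unflagged spots above background in both channels."""
    cut3, cut5 = background_cutoff(bg_cy3), background_cutoff(bg_cy5)
    return {
        s["id"]: s["cy3"] / s["cy5"]
        for s in spots
        if not s["flag"] and s["cy3"] >= cut3 and s["cy5"] >= cut5
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def merge_replicates(rep1, rep2, min_r=0.7):
    """Average ratios of shared spots and log2-transform them if r >= min_r."""
    shared = sorted(set(rep1) & set(rep2))
    r = pearson([math.log2(rep1[i]) for i in shared],
                [math.log2(rep2[i]) for i in shared])
    if r < min_r:
        return None, r  # the experiment would be repeated
    return {i: math.log2((rep1[i] + rep2[i]) / 2) for i in shared}, r


background = [100, 110, 90, 105, 95]
rep1_spots = [
    {"id": "a", "cy3": 800, "cy5": 400, "flag": False},
    {"id": "b", "cy3": 300, "cy5": 600, "flag": False},
    {"id": "c", "cy3": 500, "cy5": 500, "flag": False},
    {"id": "d", "cy3": 110, "cy5": 900, "flag": False},  # below the Cy3 cutoff
    {"id": "e", "cy3": 1000, "cy5": 250, "flag": True},  # manually flagged
]
rep2_spots = [
    {"id": "a", "cy3": 900, "cy5": 400, "flag": False},
    {"id": "b", "cy3": 280, "cy5": 600, "flag": False},
    {"id": "c", "cy3": 520, "cy5": 500, "flag": False},
    {"id": "d", "cy3": 400, "cy5": 900, "flag": False},
    {"id": "e", "cy3": 1000, "cy5": 260, "flag": False},
]

rep1 = filter_spots(rep1_spots, background, background)
rep2 = filter_spots(rep2_spots, background, background)
merged, r = merge_replicates(rep1, rep2)
print(sorted(merged), round(r, 3))
```

Only spots that survive filtering in both replicates contribute to the correlation and to the merged table. Step (3), the least-squares outlier filter, and the median normalization are left out of this sketch because the text does not specify them in enough detail to reproduce.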
Data Processing
Arrays were scanned using a ScanArray 5000 confocal scanning laser
microscope (PerkinElmer Life Sciences), and then TIFF image data were extracted
using DigitalGENOME software (MolecularWare), and finally reproducible spots
were identified using the PRIM filtration program (Kadota et al. 2001).
Extracting Tissue-Specific Expressed Genes
Log-transformed ratio data, processed and normalized by PRIM, were
used to find genes expressed in a tissue-specific manner. The log-transformed
ratio values for one cDNA clone were normalized, and the clone was denoted
as `tissue-specific' if the normalized ratio value exceeded mean + 3 S.D.
for our cDNA microarray and mean + 2 S.D. for Affy chips.
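One way to read the criterion above is as a per-clone z-score across tissues, flagging any tissue whose log ratio lies more than 3 S.D. above that clone's mean. The sketch below follows that reading; it is an interpretation rather than the authors' code, it uses the sample standard deviation by assumption, and the example profiles are invented.

```python
from statistics import mean, stdev

TISSUES = [
    "spleen", "thymus", "kidney", "heart", "lung", "liver", "brain",
    "cerebellum", "neonate cerebellum", "placenta", "testis", "uterus",
    "pancreas", "small intestine", "stomach", "colon", "bone", "adipose",
    "muscle", "neonate skin",
]


def specific_tissues(profile, k=3.0):
    """Return tissues whose log ratio exceeds the clone's mean + k * SD."""
    values = list(profile.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []  # flat profile: no tissue stands out
    return [tissue for tissue, v in profile.items() if v > mu + k * sd]


# Hypothetical clone elevated in a single tissue.
single = {t: 0.0 for t in TISSUES}
single["liver"] = 5.0

# Hypothetical clone elevated equally in two tissues.
double = dict(single)
double["kidney"] = 5.0

print(specific_tissues(single))
print(specific_tissues(double))
```

With 20 tissues and the sample standard deviation, the largest z-score any single value can reach is 19/√20 ≈ 4.25, so under this reading a 3 S.D. threshold mostly selects clones dominated by one tissue; a clone equally high in two tissues is not flagged in the example.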
Finally, the GO terms for these clones were extracted, and 14 representative terms in molecular_function ontology (http://www.geneontology.org/ontology/function.ontology) were assigned to all cDNA clones. If there was no GO annotation in molecular_function, code `NA' was assigned.
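The proportions shown in the pie charts can be sketched as a simple tally of representative molecular_function terms over the tissue-specific clones of one tissue, with `NA' for unannotated clones. The example assumes one representative term per clone; the clone identifiers and term labels are placeholders, not entries from the actual annotation.

```python
from collections import Counter


def go_proportions(tissue_clones, clone_to_term):
    """Fraction of clones per representative term; clones without a term count as 'NA'."""
    counts = Counter(clone_to_term.get(clone, "NA") for clone in tissue_clones)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


# Hypothetical annotation: clone4 has no molecular_function term.
clone_to_term = {
    "clone1": "signal transducer",
    "clone2": "enzyme",
    "clone3": "enzyme",
}
placenta_specific = ["clone1", "clone2", "clone3", "clone4"]

props = go_proportions(placenta_specific, clone_to_term)
print(props)
```

Each tissue's dictionary of fractions sums to 1 and corresponds to one pie chart.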
Gene Expression for cDNA Clones in the Functional Category `unknown EST' or `Unclassifiable'
To check whether the gene is expressed, the intensity of the corresponding
spot was evaluated. The background intensity was used to test
this by checking whether (1) the intensity of the spot was more
than 10 S.D. of all normalized background intensity values,
and (2) this condition was met in the duplicated experiments.
If these criteria were satisfied under any experimental condition,
the corresponding gene was regarded as `expressed'. cDNA clones
whose FANTOM2 functional category was either `unknown EST' or
`Unclassifiable' were extracted, and their gene expression was
examined using the method mentioned above.
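The expression call described above can be sketched as a threshold test applied to both duplicates in each tissue. The sketch assumes that "10 S.D." means ten times the standard deviation of the normalized background values; the intensities and clone names are invented, and the `marginally expressed' class in Table 2 is not modeled because its definition is not given in the text.

```python
from statistics import stdev


def is_expressed(intensities, background, k=10.0):
    """True if both duplicates exceed k * SD of background in at least one tissue.

    intensities maps tissue -> (replicate 1 intensity, replicate 2 intensity).
    """
    cutoff = k * stdev(background)
    return any(r1 > cutoff and r2 > cutoff for r1, r2 in intensities.values())


# Hypothetical normalized background values (centered near zero).
background = [0, 10, -10, 5, -5]

clone_x = {"liver": (120, 95), "brain": (30, 40)}   # both duplicates high in liver
clone_y = {"liver": (120, 60), "brain": (30, 40)}   # only one duplicate high

print(is_expressed(clone_x, background))
print(is_expressed(clone_y, background))
```

Requiring both duplicates to pass means a single bright spot in one hybridization is not enough to call a clone expressed.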
We thank M.C. Nakao for technical assistance with the figure; H. Matsuda, H. Kawaji, F. Collins, and S. Batalov for valuable discussion and comments; and Y. Tsujimura, C. Saito, S. Watanabe, T. Kobayashi, G. Matsuda, E. Nakayama, A. Wakamoto, S. Suyama, M. Yahata, H. Arai, T. Shinauchi, S. Arai, K. Kadota, and M. Kadomura for technical assistance and helpful discussions. This study was supported by a Research Grant for the RIKEN Genome Exploration Research Project from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government (MEXT) to Y.H., and Grant-in-Aid for Scientific Research on Priority Areas (C) "Genome Information Science" from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government (MEXT) to H.B.
Experimental Design
1. type of experiment ; expression profile of each tissue
2. experimental factors ; normal
3. the number of hybridizations performed in the experiment ; duplicate
4. the type of reference used for the hybridizations ; RNA
from whole body of embryo 17.5 days
5. hybridization design ; Cy3 : each tissue, Cy5 : embryo
17.5 days
6. quality control steps taken ; duplicate, positive controls
and negative controls
7. URL of any supplemental websites or database accession number
; http://READ.gsc.riken.go.jp/fantom2/
Sample used, extract preparation and labeling
1. the origin of the biological sample ; mouse (C57BL/6J),
male, at 8 weeks (adult). The female-specific reproductive organs
were prepared from female mice at 8 weeks (adult).
2. manipulation of biological samples and protocols used
; Mice used in this study were bred under SPF condition. This experiment
was approved by IACUC of RIKEN.
3. protocol for preparing the hybridization extract ; Total
RNAs were extracted using the acid guanidine phenol chloroform (AGPC)
method (Carninci, P. and Y. Hayashizaki. 1999. High-efficiency full-length
cDNA cloning. Methods Enzymol 303: 19-44)
4. labeling protocol ; aminoallyl method
5. external controls ; positive controls (G3PDH, beta actin,
elongation factor 2), negative controls (clones of Arabidopsis thaliana).
Hybridization procedures and parameters
1. the protocol and conditions used during hybridization, blocking and washing ; hybridization : 65°C, overnight, blocking : no blocking, washing : 2*SSC, 0.1% SDS -> 1*SSC -> 0.2*SSC
Measurement data and specifications
1. the quantifications based on the image ; DigitalGENOME
(MolecularWare, Inc., Cambridge, MA, USA)
2. type of scanning hardware and software used ; ScanArray
5000 (GSI Lumonics Inc., Billerica, MA, USA)
3. type of image analysis software used ; DigitalGENOME (MolecularWare,
Inc., Cambridge, MA, USA)
4. a description of the measurements produced by image-analysis
software and a description of which measurements were used in the analysis
; Mean value of the pixels in the circled area (provided by the Manufacturer:
MolecularWare, Inc., Cambridge, MA, USA)
5. the complete output of the image analysis before data selection
and transformation ; available on request
6. data selection and transformation procedures ; Valid data
were selected by the PRIM method (Kadota, K., R. Miki, H. Bono, K. Shimizu,
Y. Okazaki, and Y. Hayashizaki. 2001. Preprocessing implementation for
microarray (PRIM): an efficient method for processing cDNA microarray data.
Physiol Genomics 4: 183-188.) The value of the ratio (Cy3/Cy5) was
log base2 transformed and used.
7. final gene expression data tables used by the authors to make
their conclusions after data selection and transformation ; available
on request
Array design
1. general array design ; spotted glass array, gamma-amino-propyl-silane
coated slides (CMT-GAPS coated slides, Corning, Inc., Corning, NY, USA)
2. For each feature on the array, the ID of its respective
reporter (molecule present on each spot) should be given ; available
at: http://READ.gsc.riken.go.jp/fantom2/
3. For each reporter, its type should be given ; RIKEN full-length
enriched cDNA clones
4. along with information that characterizes the reporter molecule
unambiguously, in the form of appropriate database references and sequence;
All the sequences are available from the public database. The correlation
of accession number is available at: http://READ.gsc.riken.go.jp/fantom2/
5. For non-commercial arrays, the following details should be
provided:
a. the source of the reporter molecules ; RIKEN full-length
enriched clones
b. the method of reporter preparation ; CAP trapper method
(Carninci, P. and Y. Hayashizaki. 1999. High-efficiency full-length cDNA
cloning. Methods Enzymol 303: 19-44)
c. the spotting protocols ; the array substrate : PCR products,
the spotting buffer : 3*SSC, post-printing processing : rehydration ->
UV cross-linking
d. any additional treatment performed prior to hybridization
; blocking with 1-methyl-2-pyrrolidone and succinic anhydride
Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., et al. 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503-511.
Bono, H., Kasukawa, T., Hayashizaki, Y., and Okazaki, Y. 2002. READ: RIKEN Expression Array Database. Nucleic Acids Res. 30: 211-213.
Bono, H., Nikaido, I., Kasukawa, T., Hayashizaki, Y., RIKEN GER Group and GSL Members, and Okazaki, Y. 2003. Comprehensive analysis of the mouse metabolome based on the transcriptome. Genome Res. 13: 1345-1349.
DeRisi, J.L., Iyer, V.R., and Brown, P.O. 1997. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680-686.
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95: 14863-14868.
The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I and II Team. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563-573.
The Gene Ontology Consortium. 2001. Creating the gene ontology resource: Design and implementation. Genome Res. 11: 1425-1433.
Grimmond, S.M., Miranda, K.C., Yuan, Z., Davis, M.J., Hume, D.A., Yagi, K., Tominaga, N., Bono, H., Hayashizaki, Y., Okazaki, Y., et al. 2003. The Mouse Secretome: Functional classification of the proteins secreted into the extracellular environment. Genome Res. 13: 1350-1359.
Ichikawa, Y., Ishikawa, T., Takahashi, S., Hamaguchi, Y., Morita, T., Nishizuka, I., Yamaguchi, S., Endo, I., Ike, H., Togo, S., et al. 2002. Identification of genes regulating colorectal carcinogenesis by using the algorithm for diagnosing malignant state method. Biochem. Biophys. Res. Commun. 296: 497.
Kadota, K., Miki, R., Bono, H., Shimizu, K., Okazaki, Y., and Hayashizaki, Y. 2001. Preprocessing implementation for microarray (PRIM): An efficient method for processing cDNA microarray data. Physiol. Genomics 4: 183-188.
Marra, M., Hillier, L., Kucaba, T., Allen, M., Barstead, R., Beck, C., Blistain, A., Bonaldo, M., Bowers, Y., Bowles, L., et al. 1999. An encyclopedia of mouse genes. Nat. Genet. 21: 191-194.
Miki, R., Kadota, K., Bono, H., Mizuno, Y., Tomaru, Y., Carninci, P., Itoh, M., Shibata, K., Kawai, J., Konno, H., et al. 2001. Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. 98: 2199-2204.
Mizuno, Y., Sotomaru, Y., Katsuzawa, Y., Kono, T., Meguro, M., Oshimura, M., Kawai, J., Tomaru, Y., Kiyosawa, H., Nikaido, I., et al. 2002. Asb4, Ata3, and Dcn are novel imprinted genes identified by high-throughput screening using RIKEN cDNA microarray. Biochem. Biophys. Res. Commun. 290: 1499-1505.
The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium. 2001. Functional annotation of a full-length mouse cDNA collection. Nature 409: 685-690.
Spellman, P.T. and Rubin, G.M. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J. Biol. 1: 5.
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al. 2002. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. 99: 4465-4470.
0. Special Issue of Genome Research, vol. 13,
no. 6b, pp. 1265-1561 (June 2, 2003).
Report of "RIKEN Mouse Genome Encyclopedia" project: the whole system
from mouse house to database.
1. Carninci P, et al, "Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia", Genome Research, vol. 13, no. 6b, pp. 1273-1289 (June 2, 2003).
2. Numata K, Kanai A, Saito R, Kondo S, Adachi J, Wilming LG, Hume DA, RIKEN GER Group, GSL Members, Hayashizaki Y, and Tomita M, "Identification of Putative Noncoding RNAs Among the RIKEN Mouse Full-Length cDNA Collection", Genome Research, vol. 13, no. 6b, pp. 1301-1306 (June 2, 2003).