IV Toward Whole Genome Sequencing
Comprehensive Functional Analyses of Expressed Sequence Tags in Common Wheat
Abstract
Although shotgun sequences of the genomic DNA of common wheat and its ancestors are available, gene discovery in common wheat is primarily based on proof sequencing of expressed full-length (FL) cDNAs. The use of expressed sequence tag (EST) databases, including FLcDNAs, is recognized as an important method for gene annotation in common wheat. Because the genome of common wheat is large and repetitive, a transcriptome approach is complementary to whole genome sequencing. We initiated a wheat EST project in Japan and constructed cDNA libraries from various tissues and strains of wheat, including tissues subjected to biotic and abiotic stress treatments. We also generated a high-quality full-length cDNA resource for common wheat, an essential element for the ongoing curation and annotation of the wheat genome. After several rounds of screening of CAP-trapped cDNA libraries, 21,408 FLcDNAs were fully sequenced. The origins of these FLcDNAs were estimated by examining RNA-seq data from three ancestral diploids, namely Triticum urartu, Aegilops speltoides, and Aegilops tauschii. In addition, 51 cDNA libraries were constructed, yielding approximately 0.9 million ESTs. The ESTs, including the FLcDNA data, were assembled into contigs under stringent assembly parameters. In total, 41,003 gene clusters were classified, of which 27,943 (68.1 %) had homology with other cereal genes. A digital monitoring system was used to identify characteristic gene expression patterns among various tissues and stress treatments in common wheat. These transcriptome data provide a substantial reference for wheat genome sequencing.
Large-Scale Collection of Genes Expressed in Common Wheat
Wheat is polyploid and harbors large, complex genomes. Therefore, the accumulation of expressed sequence tags (ESTs) for wheat is particularly important for enabling functional genomics and molecular breeding studies. We obtained large collections of ESTs from various tissues across the wheat life cycle and from tissues subjected to stresses. Since full-length cDNAs are indispensable for validating the ESTs collected and for annotating genes present in the genome, we performed a systematic survey and sequencing of full-length cDNA clones. The strategy for the collection of ESTs in common wheat is shown in Fig. 10.1. First, total RNAs were extracted from ten tissues over the course of the wheat life cycle. Subsequently, RNAs were extracted from tissues subjected to biotic and abiotic stresses. The cDNA libraries were constructed from these RNAs by using standard methods. Colonies were randomly picked and sequenced from both ends. At present, 894,756 unbiased ESTs are available. More than 1.2 million wheat ESTs are registered in the NCBI EST database, 70 % of which were contributed from Japan.

Fig. 10.1 MUGEST: Wheat EST project in Japan

For full-length cDNAs, the cDNA library was constructed with the CAP-trapper method (Kawaura et al. 2009). After one-pass sequencing of cDNA clones, independent clones were selected three times, and the inserts of the cDNA clones were verified. Finally, 22,519 sequence-verified full-length cDNAs were obtained.
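
The repeated selection of independent clones described above can be illustrated with a minimal sketch. The text does not state how independence was determined, so this example assumes that each read from the initial sequencing pass has already been assigned to a gene cluster by an upstream step, and that the longest read in a cluster is taken as its representative. The names `OnePassRead`, `select_independent_clones`, and all clone and cluster identifiers are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class OnePassRead(NamedTuple):
    clone_id: str     # hypothetical clone identifier
    cluster_id: str   # hypothetical cluster label from an upstream step
    read_length: int  # length of the read in bases


def select_independent_clones(reads, already_selected=frozenset()):
    """Pick one representative clone for each cluster not yet covered.

    Assumption: the longest read represents its cluster. Ties are broken
    by clone_id so that the result is deterministic.
    """
    by_cluster = defaultdict(list)
    for read in reads:
        # Skip clusters that an earlier screening round already covered.
        if read.cluster_id not in already_selected:
            by_cluster[read.cluster_id].append(read)

    chosen = {}
    for cluster_id, members in by_cluster.items():
        best = min(members, key=lambda r: (-r.read_length, r.clone_id))
        chosen[cluster_id] = best.clone_id
    return chosen


# Toy data for three screening rounds (invented values, not project data).
rounds = [
    [OnePassRead("c001", "g1", 520), OnePassRead("c002", "g1", 610),
     OnePassRead("c003", "g2", 480)],
    [OnePassRead("c004", "g2", 700), OnePassRead("c005", "g3", 450)],
    [OnePassRead("c006", "g4", 390), OnePassRead("c007", "g4", 390)],
]

selected = {}
for reads in rounds:
    # Each round only adds clusters that earlier rounds did not cover.
    selected.update(select_independent_clones(reads, frozenset(selected)))

print(selected)
```

In this sketch, a clone picked in an earlier round is never replaced by a later one, so each round contributes only new clusters. A real pipeline would also need the insert verification step, which is not modeled here.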