V Structural and Functional Genomics
Sequencing of Wheat Chromosome 6B: Toward Functional Genomics
Abstract The International Wheat Genome Sequencing Consortium (IWGSC) adopted a strategy of chromosome sorting and short-read assembly to overcome the difficulties of wheat genome sequencing that arise from its hexaploid status, large genome size (about 17 Gb) and high repeat content (more than 80 %). Our Japanese group was responsible for sequencing wheat chromosome 6B. Using DNA from flow-sorted chromosome arms, we conducted whole-chromosome shotgun sequencing of chromosome 6B. We sequenced more than 12 million reads from the short and long arms with the GS-FLX Titanium platform and assembled them with GS assembler 2.7 (Roche) into contigs totaling 235 Mb for 6BS and 273 Mb for 6BL. These assemblies cover 56.6 % and 54.9 % of the estimated sizes of 6BS (415 Mb) and 6BL (498 Mb), respectively. Using our annotation pipeline, we annotated repetitive regions covering more than 80 % of the contigs, 4,798 possible expressed loci, and various kinds of RNA genes. We also found evolutionarily conserved regions among syntenic chromosomes from four grass genomes. To apply the 6B sequences to wheat genomics, we constructed various kinds of markers, such as simple sequence repeat (SSR) and insertion site-based polymorphism (ISBP) markers. Combining the marker data with the comparative genome analysis will lay a strong foundation for functional genomics of the group-6 chromosomes in wheat.
Keywords Annotation • Chromosome 6B • Genome sequencing • Marker construction • RNA gene • Synteny
Chromosome by Chromosome Sequencing
The rice genome sequence, completed in 2004 by the International Rice Genome Sequencing Project, was the first cereal genome sequence (IRGSP 2005); the sorghum and maize genome sequences followed (Paterson et al. 2009; Schnable et al. 2009). Because the rice genome was sequenced BAC by BAC with Sanger sequencing, its error rate was less than one error in 10 kb. This accuracy was validated by resequencing the genome with next-generation sequencing (NGS) data (Kawahara et al. 2013), which showed that the rice genome sequence is the most accurate of the cereal genomes sequenced so far. The sorghum genome was determined by whole-genome shotgun sequencing, and the maize genome sequence was obtained by combining the minimum tiling path (MTP) method with BAC-by-BAC sequencing. However, these genome sequences were less accurate than the rice genome sequence and remained fragmented into many scaffolds.
In Pooideae, the Brachypodium distachyon Bd21 genome was sequenced in 2010 by whole-genome shotgun sequencing, which was feasible because of its small genome size (272 Mb) (The International Brachypodium Initiative 2010). In contrast, the sequencing of other Pooideae genomes, such as wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), has fallen behind because of the complexity of their genome structures. First, the wheat and barley genomes are 17 and 5.1 Gb in size, respectively, more than 40 times and 13 times larger than the rice genome. Second, repeat regions, which occupy more than 80 % of the genome, hamper genome assembly. Third, because wheat is a hexaploid, it is particularly hard to distinguish homoeologous sequences from the A, B and D genomes.
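The size comparison above can be checked with a few lines of arithmetic. The sketch below assumes a rice genome size of about 389 Mb, a figure that is not stated in this section and is used here only for illustration; the wheat, barley and B. distachyon sizes are those given in the text. The function and dictionary names are hypothetical.

```python
# Genome sizes in Mb. The wheat, barley and Brachypodium values come from the text;
# the rice value (389 Mb) is an assumption used for illustration only.
GENOME_SIZE_MB = {
    "wheat": 17_000.0,
    "barley": 5_100.0,
    "brachypodium": 272.0,
    "rice": 389.0,  # assumed, not stated in this section
}


def fold_larger(species: str, reference: str = "rice") -> float:
    """Return how many times larger the genome of `species` is than that of `reference`."""
    return GENOME_SIZE_MB[species] / GENOME_SIZE_MB[reference]


if __name__ == "__main__":
    for name in ("wheat", "barley"):
        print(f"{name}: {fold_larger(name):.1f}x the rice genome")
```

Under this assumed rice genome size, both ratios are consistent with the "more than 40 times" and "13 times" figures quoted above.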
To overcome these problems, various new methods have been applied. NGS technology makes it possible to assemble large genomes at low cost. Although an NGS read is only several hundred bp long, millions of reads can be obtained in one analysis (a total read length of up to several Gb), so that large genomes can be assembled. For barley genome sequencing, BAC-by-BAC sequencing and NGS were combined, and 1.9 Gbp of sequence was released in 2012 (The International Barley Genome Sequencing Consortium 2012). In 2013, the genome sequences of Aegilops tauschii and T. urartu were determined (Jia et al. 2013; Ling et al. 2013; Luo et al. 2013). The wheat genome was also sequenced by whole-genome shotgun sequencing with NGS data (Brenchley et al. 2012). However, because of its hexaploidy, a whole-genome assembly comparable to those of the diploid Triticeae genomes was not achieved.
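The relationship between read count, read length and total read length mentioned above is a simple product, and dividing by the target size gives the average sequencing depth. The following is a minimal sketch under stated assumptions: the run of 5 million reads with a mean length of 400 bp is a hypothetical example, not a value from this study, and the function names are illustrative. The target sizes are the wheat genome size (about 17 Gb) and the estimated size of 6BS (415 Mb) given in this chapter.

```python
def total_bases(n_reads: int, mean_read_len_bp: float) -> float:
    """Total read length in bp: number of reads times mean read length."""
    return n_reads * mean_read_len_bp


def sequencing_depth(n_reads: int, mean_read_len_bp: float, target_size_bp: float) -> float:
    """Average depth: total read length divided by the size of the target."""
    return total_bases(n_reads, mean_read_len_bp) / target_size_bp


if __name__ == "__main__":
    # Hypothetical run: 5 million reads of 400 bp each (illustrative values only).
    n_reads, read_len = 5_000_000, 400

    print(f"total read length: {total_bases(n_reads, read_len) / 1e9:.1f} Gb")

    # Same data, two targets: the whole wheat genome versus one sorted chromosome arm.
    for label, size_bp in (("whole wheat genome (17 Gb)", 17e9),
                           ("chromosome arm 6BS (415 Mb)", 415e6)):
        depth = sequencing_depth(n_reads, read_len, size_bp)
        print(f"{label}: {depth:.2f}x average depth")
```

The same amount of read data yields a much higher average depth on a single chromosome arm than on the whole genome, which is one way to see why reducing the size of the target simplifies assembly.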
To reduce this genome complexity, chromosome sorting by flow cytometry was developed for cereal genomics (Doležel et al. 2007). Because this method reduces sample complexity, such as that arising from the hexaploid status of the wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) decided to apply this technology to its activities. Single chromosomes or chromosome arms were sorted by flow cytometry, and chromosome (arm)-specific BAC libraries were constructed. The progress of physical map construction and genome sequencing for each chromosome and chromosome arm can be seen on the IWGSC website (wheatgenome.org/) and the URGI wheat portal site (wheat-urgi.versailles.inra.fr/).