SNP Selection and Marker Design
To identify loci linked to a trait, we use BSA but extend it by using RNA-Seq instead of targeting known markers. We sequence the parental genotypes (Avocet S and Avocet S + Yr15, Fig. 22.1a, b) and generate a consensus reference by aligning the reads to the UniGenes and gene models described above (Krasileva et al. 2013). Although genome-specific references are used when possible, there are still cases where multiple homoeologues align to a common reference (as illustrated in Fig. 22.2b). These homoeologous variants (exemplified by the G > T variant at position 181; K in the consensus) generate ambiguity codes in both parental consensus sequences and can therefore be excluded. True allelic SNPs between the parental genotypes (exemplified by the G > A variant at position 184; R in the consensus) are distinguished by an ambiguity code that is present in one, but not the other, parental consensus sequence (Fig. 22.2b).
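The filtering rule above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the two parental consensus sequences are already aligned to the same reference and have equal length, and that variants are encoded with the standard two-base IUPAC ambiguity codes. The function name `classify_positions` is hypothetical, and positions are reported 1-based to match the position numbers used in the text.

```python
# Standard two-base IUPAC ambiguity codes that can appear in a consensus
IUPAC_CODES = {"R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT"}


def classify_positions(consensus_a, consensus_b):
    """Split positions into homoeologous variants and candidate allelic SNPs.

    Hypothetical helper (assumption: equal-length, co-aligned consensus strings).
    - Ambiguity code in BOTH parents  -> homoeologous variant (excluded).
    - Ambiguity code in ONE parent only -> candidate allelic SNP.
    Returns two lists of 1-based positions.
    """
    if len(consensus_a) != len(consensus_b):
        raise ValueError("consensus sequences must be aligned and of equal length")
    homoeologous, allelic = [], []
    for pos, (a, b) in enumerate(zip(consensus_a, consensus_b), start=1):
        a_ambiguous = a.upper() in IUPAC_CODES
        b_ambiguous = b.upper() in IUPAC_CODES
        if a_ambiguous and b_ambiguous:
            homoeologous.append(pos)
        elif a_ambiguous != b_ambiguous:
            allelic.append(pos)
    return homoeologous, allelic
```

With the toy consensus pair `"ACKGR"` and `"ACKGG"`, position 3 (K in both) is reported as homoeologous and position 5 (R in only one parent) as a candidate allelic SNP, mirroring the K/R example of Fig. 22.2b on a much shorter sequence.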
These allelic SNPs are then examined further in the alignments of the RNA-Seq reads from the susceptible and resistant bulks. To identify enriched and depleted SNPs, we first calculate the SNP index, which is the proportion of reads carrying the non-consensus base at each of the positions previously identified (A at position 184 in the example in Fig. 22.2b; Takagi et al. 2013a). We then calculate the bulk frequency ratio (BFR), the ratio between the SNP indexes of the resistant and susceptible bulks (Trick et al. 2012). The BFR helps reduce the noise generated by differential expression of homoeologous genes and accounts for the presence of alternative bases at any given position. A SNP closely linked to the R-gene should generate a very high BFR, since every plant in the resistant bulk carries the resistant allele, whereas the susceptible bulk should contain no plants carrying it. Moving further away from the R-gene, recombination between the gene and the candidate SNPs becomes more frequent, decreasing the BFR.
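The two quantities can be written down directly from their definitions. The sketch below assumes per-position read counts (reads carrying the non-consensus base, and total depth) have already been extracted from each bulk's alignment. The `floor` parameter is our own assumption, not part of the published method: it caps how small the susceptible SNP index can be so that SNPs absent from the susceptible bulk yield a large but finite BFR.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Proportion of reads showing the non-consensus base at one position."""
    if total_reads <= 0:
        raise ValueError("position has no read coverage")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def bulk_frequency_ratio(res_alt: int, res_total: int,
                         sus_alt: int, sus_total: int,
                         floor: float = 0.01) -> float:
    """SNP index of the resistant bulk divided by that of the susceptible bulk.

    `floor` is a hypothetical lower bound on the susceptible SNP index that
    avoids division by zero when the susceptible bulk lacks the allele.
    """
    resistant = snp_index(res_alt, res_total)
    susceptible = snp_index(sus_alt, sus_total)
    return resistant / max(susceptible, floor)
```

For example, 30 of 40 resistant-bulk reads and 5 of 40 susceptible-bulk reads carrying the non-consensus base give SNP indexes of 0.75 and 0.125, so `bulk_frequency_ratio(30, 40, 5, 40)` returns 6.0. A SNP present in every resistant-bulk read and absent from the susceptible bulk instead hits the floor and returns roughly 100, which is why the choice of floor should be reported alongside any BFR cut-off.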
Once a list of SNPs with their corresponding BFRs is obtained, the SNPs can be prioritized in several ways before independent validation as genetic markers. First and foremost is the BFR value itself: the higher the BFR, the more likely the SNP is genetically linked to the gene. Second, we can align the candidate genes to the genomes of syntenic grass species such as Hordeum vulgare, Brachypodium distachyon and Oryza sativa to identify orthologous genes and, from these, syntenic regions enriched for high BFRs. More recently, we have started to align the genes to the CSS assemblies to locate the chromosome arms with the highest BFRs. The putative SNPs with enriched BFRs can then be converted into HTP SNP assays (see next section) and genotyped across the individuals that were used to assemble the bulks. This generates a genetic map with markers across the R-gene locus. If the interval is syntenic to one of the sequenced grass genomes, an additional round of SNP selection can be performed based on synteny, using a slightly more relaxed BFR cut-off.
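The arm-level prioritization step could look something like the following sketch. It assumes each candidate SNP has already been assigned a chromosome arm by aligning its gene to the CSS assemblies; the `CandidateSNP` record, the `rank_arms` function and the ranking rule (count of SNPs passing the cut-off, then median BFR) are illustrative choices of ours, not the published procedure. The arm names and BFR values in the usage below are invented toy data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class CandidateSNP:
    snp_id: str
    arm: str    # hypothetical: chromosome arm of the gene's best CSS hit
    bfr: float  # bulk frequency ratio for this SNP


def rank_arms(snps, bfr_cutoff):
    """Group SNPs passing the BFR cut-off by chromosome arm and rank the arms.

    Arms are ordered by how many SNPs pass the cut-off, then by median BFR.
    Returns (arm, n_snps, median_bfr) tuples, best-supported arm first.
    """
    passing = defaultdict(list)
    for snp in snps:
        if snp.bfr >= bfr_cutoff:
            passing[snp.arm].append(snp.bfr)
    ranked = [(arm, len(values), median(values)) for arm, values in passing.items()]
    ranked.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranked


# Toy usage with made-up values
snps = [
    CandidateSNP("snp1", "3BL", 12.0),
    CandidateSNP("snp2", "3BL", 8.0),
    CandidateSNP("snp3", "6AS", 9.0),
    CandidateSNP("snp4", "2DL", 1.2),
]
print(rank_arms(snps, bfr_cutoff=5.0))
```

With a cut-off of 5.0 this prints `[('3BL', 2, 10.0), ('6AS', 1, 9.0)]`. The relaxed second round described above would amount to calling the same function with a lower `bfr_cutoff` on the SNPs that fall within the syntenic interval.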
A key issue is converting in silico SNPs into HTP SNP assays that are amenable both to research and to wider applications such as marker-assisted selection (MAS) in breeding programs. In this regard, it is particularly important to have genome-specific SNP assays, which allow germplasm to be screened and heterozygotes to be readily identified. The IWGSC CSS assemblies facilitate this objective by allowing us to generate a multiple alignment of the reference sequence containing the SNP of interest with the assemblies from the three homoeologous genomes (Fig. 22.2c). In this manner, homoeologous SNPs can be readily identified and incorporated into the primer design so that the SNP is assayed only in the genome of interest. This is particularly applicable to end-point fluorometric assays such as KASP (Allen et al. 2013), which identify SNPs using two differentially labeled primers that carry the SNP at their 3′ end (boxed sequences with SNP in bold; Fig. 22.2c). A common reverse primer can be designed across a homoeologous SNP to make the amplification genome-specific. The result is an HTP assay that can readily be used to genotype heterozygous individuals (Fig. 22.2d). In the past, generating genome-specific assays was a time-consuming task (Chao et al. 2008), but this pipeline has simplified the design of allele-specific SNP assays. We developed a user-friendly web interface (PolyMarker, Ramirez-Gonzalez et al. 2015a) to make this pipeline readily available to the community (polymarker.tgac.ac.uk).
Despite the complexity of the wheat genome and the draft status of its genomic reference, it is possible to develop NGS-enabled genetic approaches in hexaploid wheat. RNA-Seq of NILs facilitates the identification of SNPs and helps distinguish informative allelic SNPs from non-informative homoeologous variants. RNA-Seq of NIL-derived F2 bulks helps identify the SNPs most closely linked to the phenotype of interest. With the aid of the recently released CSS assemblies, putative SNPs can be rapidly converted into HTP assays that breeders can incorporate into MAS programs (Ramirez-Gonzalez et al. 2015b). These approaches will continue to improve in resolution as wheat physical maps improve and homoeologous relationships between transcripts are more precisely defined.