The surprising amount of DNA sequence variation discovered as more data became available led to a more sophisticated model for inferring ecological processes from population genetic data: the neutral theory. The theory states that most genetic variants are neutral with respect to natural selection, so genetic variation can be modeled as a mutation-drift equilibrium resulting from finite population size. To determine whether genetic variation in a sample fits the predictions of the neutral theory, one calculates the genetic diversity expected for a set of DNA sequences under the assumptions of the neutral theory and compares it with the diversity observed in those sequences. One well-known test for neutrality is Tajima’s D, which is based on the difference between two quantities: the number of SNPs, $S$, and the average number of pairwise differences, $\pi = \frac{\sum_{i<j} d_{ij}}{n(n-1)/2}$, where $d_{ij}$ is the number of differences between sequences $i$ and $j$ and $n$ is the number of sequences. Under the neutral theory, $\pi$ and $S/a$ are expected to be equal, where $a$ is a scaling factor (for a full explanation, the reader is referred to any population genetics textbook^{41}). The difference, normalized by its standard deviation, $\sqrt{V}$, is Tajima’s D:

$$D = \frac{\pi - S/a}{\sqrt{V}}$$

If the sample reflects mutation-drift equilibrium as predicted by the neutral theory, the expected value of D is 0. Under purifying selection, mutations accumulate at third codon positions but are unlikely to become common, resulting in multiple low-frequency SNPs and D < 0. A similar pattern is expected after a recent population expansion: new mutations occur and, because the population is expanding, they persist but do not become very common. In contrast, after a population bottleneck, or under heterozygote advantage (for diploid nuclear genes) or balancing selection (for nuclear or mitochondrial genes), D > 0.

Statistical significance of D is determined by a β-distribution.^{49} Online interfaces (http://wwwabi.snv.jussieu.fr/achaz/neutralitytest.html) as well as software such as DnaSP^{48} are available for calculating Tajima’s D and its statistical significance, as well as other tests of the neutral theory.

We briefly outline how DNA sequence data can be used to infer evolutionary and ecological processes with a subset of the data from the T. infestans mitochondrial cyt b gene.^{32} The data, seven sequences from Yamparáez, Bolivia with two haplotypes and three SNPs (Table 8.3), address two questions: Is the observed variation different from that expected under the neutral theory? If so, what might be causing the difference?

Table 8.3 The three SNPs in 411 nucleotides of cyt b in T. infestans from Yamparáez, Bolivia

| Haplotype | Position 537 | Position 741 | Position 789 | No. haplotypes in Yamparáez |
|---|---|---|---|---|
| Haplotype A | C | G | G | 6 |
| Haplotype B | T | A | A | 1 |

Nucleotide positions are relative to the 1131-nucleotide cyt b gene.
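The quantities needed for the test can be read directly off Table 8.3. The following is a minimal sketch, assuming each haplotype is reduced to its alleles at the three variable positions. The variable and function names are illustrative and do not come from any particular package.

```python
from itertools import combinations

# Alleles at positions 537, 741 and 789, transcribed from Table 8.3.
haplotypes = {"A": "CGG", "B": "TAA"}
counts = {"A": 6, "B": 1}  # number of sequences carrying each haplotype

# Expand to one entry per sampled sequence (seven in total).
sample = [haplotypes[h] for h, k in counts.items() for _ in range(k)]


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


# S: number of sites at which more than one allele is observed.
S = sum(len(set(column)) > 1 for column in zip(*sample))

# d_ij for every unordered pair of sequences.
pair_diffs = [hamming(x, y) for x, y in combinations(sample, 2)]

print("S =", S)
print("pairs =", len(pair_diffs))
print("pairs with 0 differences =", pair_diffs.count(0))
print("pairs with 3 differences =", pair_diffs.count(3))
```

Only the variable sites are encoded here. Invariant sites add nothing to either S or the pairwise differences, so leaving them out does not change the result.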

The number of segregating sites is $S = 3$, the scaling factor is $a = \sum_{i=1}^{n-1} \frac{1}{i} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} = 2.450$, and $S/a = 1.224$. To calculate $\pi$ we compare all possible pairs of the seven sequences ($\binom{7}{2} = \frac{7 \times 6}{2} = 21$ possible comparisons). For the 15 comparisons of a haplotype A with another haplotype A, there are no differences, $d_{ij} = 0$, and for the six comparisons of haplotype A with haplotype B with three SNPs, $d_{ij} = 3$, therefore:

Using the online interface, $\sqrt{V} = 0.270$, $D = 1.812$, and 0.10 > P > 0.05. The small, almost significant value of D hints at balancing selection or a population bottleneck.
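The pieces of this calculation can also be checked by hand. The sketch below assumes the variance constants from Tajima's standard formulation, $V = e_1 S + e_2 S(S-1)$. The function names are illustrative, and it is not the code behind the online interface or DnaSP.

```python
from math import sqrt


def tajima_constants(n):
    """Return a1 and the variance constants e1, e2 for a sample of n sequences."""
    a1 = sum(1 / i for i in range(1, n))       # the scaling factor a in the text
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    return a1, c1 / a1, c2 / (a1 ** 2 + a2)


def tajimas_d(n, S, pi):
    """Return (D, sqrt(V)) for n sequences, S segregating sites and diversity pi."""
    a1, e1, e2 = tajima_constants(n)
    sd = sqrt(e1 * S + e2 * S * (S - 1))
    return (pi - S / a1) / sd, sd


n, S = 7, 3
a1, _, _ = tajima_constants(n)
# Assumption: pi taken from the pairwise tally quoted in the text
# (15 comparisons with 0 differences, 6 comparisons with 3 differences).
pi = (15 * 0 + 6 * 3) / 21
D, sd = tajimas_d(n, S, pi)
print(f"a = {a1:.3f}, S/a = {S / a1:.3f}, sqrt(V) = {sd:.3f}, pi = {pi:.3f}, D = {D:.3f}")
```

For n = 7 and S = 3 the sketch reproduces a = 2.450, S/a = 1.224, and $\sqrt{V}$ = 0.270. The value of D then depends entirely on $\pi$. The tally quoted above (15 comparisons with no differences and six with three) gives $\pi$ = 18/21 ≈ 0.857 and, under this formula, a negative D of about -1.36. A D of 1.812 corresponds to $\pi$ of about 1.71. The pairwise tally is therefore worth rechecking against the full data set before interpreting the sign of D.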

Using DNA sequence data to test the neutral theory expanded the field of theoretical population genetics, which now includes the use of mathematical, statistical, and computational models investigating and predicting quantities and patterns of genetic variation resulting from mutation, changes in population size, migration, selection, and nonrandom mating acting either individually or in combination.