Allele frequency-based and polymorphism-versus-divergence indices of balancing selection in a new filtered set of polymorphic genes in Plasmodium falciparum.
Ochola LI., Tetteh KKA., Stewart LB., Riitho V., Marsh K., Conway DJ.
Signatures of balancing selection operating on specific gene loci in endemic pathogens can identify candidate targets of naturally acquired immunity. In malaria parasites, several leading vaccine candidates convincingly show such signatures when subjected to several tests of neutrality, but the discovery of new targets affected by selection to a similar extent has been slow. A small minority of all genes are under such selection, as indicated by a recent study of 26 Plasmodium falciparum merozoite-stage genes that were not previously prioritized as vaccine candidates, of which only one (locus PF10_0348) showed a strong signature. Therefore, to focus discovery efforts on genes that are polymorphic, we scanned all available shotgun genome sequence data from laboratory lines of P. falciparum and chose six loci with more than five single nucleotide polymorphisms per kilobase (including PF10_0348) for in-depth frequency-based analyses in a Kenyan population (allele sample sizes >50 for each locus) and comparison of Hudson-Kreitman-Aguade (HKA) ratios of population diversity (π) to interspecific divergence (K) from the chimpanzee parasite Plasmodium reichenowi. Three of these (the msp3/6-like genes PF10_0348 and PF10_0355 and the surf(4.1) gene PFD1160w) showed exceptionally high positive values of Tajima's D and Fu and Li's F indices and have the highest HKA ratios, indicating that they are under balancing selection and should be prioritized for studies of their protein products as candidate targets of immunity. Combined with earlier results, there is now strong evidence that high HKA ratio (as well as the frequency-independent ratio of Watterson's /K) is predictive of high values of Tajima's D. Thus, the former offers value for use in genome-wide screening when numbers of genome sequences within a species are low or in combination with Tajima's D as a 2D test on large population genomic samples.