Evolution 101 — Stop teaching most mutations are neutral?

The neutral theory has served as a useful null hypothesis in the field of molecular evolution. But is it necessary to teach “most mutations are neutral” when introducing molecular evolution to students?

Cite as Kamoun, S. (2022). Evolution 101 — Stop teaching most mutations are neutral?. https://doi.org/10.5281/zenodo.5818005

A recent Twitter thread by T. Ryan Gregory, Professor of evolutionary biology at the University of Guelph, Canada, prompted me to think again about how we teach molecular evolution to students, notably to molecular biologists. Gregory’s tweets aren’t particularly controversial, as is evident from their popularity and the comments they triggered. I suspect most evolutionary biology courses start with something similar to the thread’s first tweet.

That said, the statement “most (mutations) have no effect” made me pause, and a few days ago I asked on Twitter whether it should be part of an introduction to the complex topic of molecular evolution. This is Evolution 101, after all: presumably, it is the students’ first exposure to the concepts of molecular evolution. My understanding is that “most mutations are neutral” is still under debate and doesn’t warrant being framed as rock solid. A 2018 Quanta Magazine article nicely summarizes the challenges that Motoo Kimura’s influential neutral theory faces in the genomics era.

Is genetic drift the principal driving force in evolution?

The neutral theory came at a time when adaptationists dominated evolutionary biology, and it helped highlight the importance of genetic drift as a force of evolution in addition to natural (Darwinian) selection. It certainly helped balance the debate, and the importance of randomness (stochasticity) in evolution can no longer be questioned. But this is a far cry from viewing genetic drift as the principal evolutionary force. And the view that most mutations are neutral is essentially what leads to the corollary that “the evolutionary fate of genetic variation is best explained by randomness.”

Mutations can be neutral, or they can be subject to negative or positive selection. The relative proportions of these classes of mutations haven’t been fully sorted out by scientists, and they can be difficult to ascertain because selection isn’t constant but varies over time.
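To make the three classes of mutations concrete, here is a minimal sketch of a haploid Wright-Fisher model, in which binomial sampling each generation plays the role of genetic drift. The population size (100), the selection coefficients (±0.05), the number of replicates, and the function name `fixation_fraction` are illustrative assumptions, not estimates for any real organism. Under these assumptions, a new neutral mutation fixes in roughly 1/N of runs, a beneficial one considerably more often, and a deleterious one almost never.

```python
import random


def fixation_fraction(s: float, pop_size: int = 100, replicates: int = 1000,
                      seed: int = 1) -> float:
    """Fraction of replicate haploid Wright-Fisher populations in which a
    single new mutant with selection coefficient s reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # the mutation starts as a single copy
        while 0 < count < pop_size:
            p = count / pop_size
            # Deterministic selection step: mutant relative fitness is 1 + s.
            p_sel = p * (1 + s) / (1 + s * p)
            # Binomial sampling of the next generation: this is genetic drift.
            count = sum(rng.random() < p_sel for _ in range(pop_size))
        if count == pop_size:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    for label, s in [("deleterious", -0.05), ("neutral", 0.0), ("beneficial", 0.05)]:
        print(f"{label:>11} s={s:+.2f}: fixed in {fixation_fraction(s):.3f} of runs")
```

Because the runs are stochastic, the exact fractions depend on the seed; only their ordering matters here.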

Neutral mutations can also hitchhike with genetically linked mutations that are themselves under selection. Therefore, the frequency of neutral mutations is not only shaped by random genetic drift but can also be affected by their location in the genome and the degree to which the locus experiences recombination. The hitchhiking effect means natural selection indirectly affects the frequency of genetically linked “neutral” mutations until recombination splits them apart.
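The hitchhiking argument can be sketched with a deterministic two-locus haploid model (no drift). The sketch assumes that a new beneficial allele A, with a hypothetical advantage s = 0.1, arises once on a chromosome that carries the neutral allele B. Each generation applies selection on A and then recombination at rate r, which erodes the association (linkage disequilibrium D) between A and B. The function name `hitchhike` and all parameter values are illustrative.

```python
def hitchhike(r: float, s: float = 0.1, generations: int = 300) -> float:
    """Final frequency of a neutral allele B that starts out linked to a new
    beneficial allele A, in a deterministic two-locus haploid model."""
    # Haplotype frequencies AB, Ab, aB, ab; A arises once, on a B background.
    x_AB, x_Ab, x_aB, x_ab = 0.01, 0.0, 0.495, 0.495
    for _ in range(generations):
        # Selection: haplotypes carrying A have relative fitness 1 + s.
        w_bar = 1 + s * (x_AB + x_Ab)
        x_AB, x_Ab = x_AB * (1 + s) / w_bar, x_Ab * (1 + s) / w_bar
        x_aB, x_ab = x_aB / w_bar, x_ab / w_bar
        # Recombination at rate r reduces linkage disequilibrium D.
        d = x_AB * x_ab - x_Ab * x_aB
        x_AB, x_ab = x_AB - r * d, x_ab - r * d
        x_Ab, x_aB = x_Ab + r * d, x_aB + r * d
    return x_AB + x_aB  # frequency of the neutral allele B


for r in (0.0, 0.01, 0.1, 0.5):
    print(f"r = {r:<4}: final frequency of neutral B = {hitchhike(r):.3f}")
```

With `r = 0`, B rides along with A to near fixation. As `r` increases, the final frequency of B moves back toward its starting value of 0.505, even though B itself is never selected.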

We want students to be able to connect theory to real-world data. To take a popular current example, the evolution of SARS-CoV-2, the causal agent of COVID-19, is clearly shaped by natural selection for increased infectivity, transmissibility, and immune evasion. Therefore, the patterns of genome diversity of SARS-CoV-2 are not principally determined by random genetic drift, at least over the two years since the pandemic started. Initially, there was some reluctance to admit this because population geneticists are trained to assume neutrality unless proven otherwise. It took the dramatic emergence of the Alpha, Delta, and Omicron variants for the power of selection in shaping the evolutionary dynamics of the coronavirus to be broadly appreciated.

The rise and fall of SARS-CoV-2 variants in the United Kingdom. From OurWorldInData.org/Coronavirus (January 3, 2022).
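The rise of a variant under selection has a simple signature. If selection is constant and drift is ignored, the log-odds (logit) of the variant's frequency increases by ln(1 + s) per generation, so the slope of that line gives a rough estimate of the selective advantage s. This sketch assumes discrete generations (for a virus, something like transmission generations) and constant s. The function names are hypothetical, and the example numbers are illustrative rather than taken from the UK data.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a frequency p strictly between 0 and 1."""
    return math.log(p / (1 - p))


def selective_advantage(p_start: float, p_end: float, generations: float) -> float:
    """Per-generation advantage s implied by a frequency change, assuming
    constant selection and no drift (logit rises by ln(1 + s) each generation)."""
    slope = (logit(p_end) - logit(p_start)) / generations
    return math.exp(slope) - 1


def variant_frequency(p0: float, s: float, generations: int) -> float:
    """Forward model: variant frequency after constant selection."""
    odds = p0 / (1 - p0) * (1 + s) ** generations
    return odds / (1 + odds)


print(round(selective_advantage(0.01, 0.5, 20), 3))
```

For instance, a hypothetical variant going from 1% to 50% frequency in 20 generations implies s of roughly 0.26 per generation, a change far too fast to be plausibly explained by drift in a large population.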

One can only wonder about the degree to which natural selection has also affected other variants of SARS-CoV-2, but at a smaller and more local scale than these global and dominant variants. For example, the D614G polymorphism in the spike protein of SARS-CoV-2 was shown early in the pandemic to increase infectivity, but there was some reluctance to accept that it can also increase transmissibility despite the experimental data.

We need to keep a more open mind about the prevalence of selection in driving evolution. I have been told more than once that an example like SARS-CoV-2 is atypical because it involves the extreme selection pressures seen in host-parasite interactions. Is this really so exceptional? Every single organism is continuously engaged in host-parasite coevolution, with selection forces fluctuating widely. Why should we set these interactions apart just to satisfy a theoretical framework?
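Fluctuating selection can be illustrated with a toy deterministic model in which an allele's relative fitness alternates between a hypothetical 1.1 and 1/1.1 every 10 generations. This is a crude stand-in for swings in host-parasite coevolution: the parasite is not modeled explicitly, and the function name and parameters are assumptions. Selection is strong in every generation, yet after each full cycle the allele returns to its starting frequency. Comparing only the start and end points would therefore show no net change, just as expected under neutrality.

```python
def fluctuating_trajectory(p0: float = 0.5, advantage: float = 1.1,
                           period: int = 10, cycles: int = 5) -> list[float]:
    """Deterministic allele-frequency trajectory when the allele's relative
    fitness flips between `advantage` and 1 / `advantage` every `period`
    generations."""
    p = p0
    trajectory = [p]
    for gen in range(2 * period * cycles):
        # Favored in the first half of each cycle, disfavored in the second.
        w = advantage if (gen // period) % 2 == 0 else 1 / advantage
        p = p * w / (p * w + (1 - p))
        trajectory.append(p)
    return trajectory


traj = fluctuating_trajectory()
print(f"peak {max(traj):.3f}, start {traj[0]:.3f}, end {traj[-1]:.3f}")
```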

The neutral theory as the baseline assumption of molecular evolution

Perhaps it is best to view the neutral theory as a useful null hypothesis of molecular evolution rather than an accurate description of genetic variation. It’s notable that in The Making of the Fittest, Sean B. Carroll introduces molecular evolution concepts in a comprehensive and lucid way, yet he mentions the neutral theory only once. When he does, the focus is on the importance of Kimura’s theory as a baseline assumption against which we can search for signatures of natural selection:

“Evolutionary biologists once thought that all changes in molecules came about through selection, but the late Motoo Kimura proposed in the 1960s that much molecular change was selectively neutral. The power and importance of Kimura so-called Neutral Theory is that it provides a baseline assumption about how DNA should vary and change as a function of time, if no other force intervenes. When measurements of change deviate from what is expected by neutrality, that is an important signal — a signal that selection has intervened. That signal may reveal that selection has favored some specific change, or that it has consistently rejected others.”

But even if we constrain the neutral theory to a null hypothesis, we should cautiously interpret any failures to detect selection. As we will see next, the absence of evidence is not necessarily evidence of absence.

The experimental evidence for neutrality

David Bryant Lowry, an Assistant Professor at Michigan State University, posted in reply to my tweet: “Experimental evolution studies in bacteria, viruses, and worms generally support the assertion that most mutations are neutral or near-neutral.” The problem with such studies, in my view, is that they are inconclusive. How many environmental conditions were tested? Do they reflect the real world, where selection fluctuates over time and is far from constant?

It turns out we know for a fact that studies that fail to assign a fitness effect to mutations are not necessarily representative of the real world. Kresten Lindorff-Larsen, Professor of Biophysics at the University of Copenhagen, pointed out a series of very interesting mutation studies in yeast. These studies involved the highly conserved protein ubiquitin, which, as its name suggests, is found across all eukaryotic organisms. Ubiquitin has remained mostly unchanged over more than 2 billion years of evolution, indicating that purifying selection has eliminated most mutations that resulted in amino acid changes. Yet, as Lindorff-Larsen pointed out, deep mutational scans of ubiquitin in yeast failed to assign a fitness effect to most amino acid changes, even though these amino acids are highly conserved throughout evolution. In other words, laboratory experiments suggest that most amino acid changes in yeast ubiquitin are neutral, but they clearly carry some fitness penalty in the natural world. Essentially, the laboratory experiments are inconclusive: they failed to detect selection, but the default assumption that the mutations are neutral is incorrect.
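One way to see why a laboratory assay can miss selection that is effective in nature is to compare two thresholds. In a competition assay lasting T generations, a mutant with selection coefficient s changes its log frequency ratio to the wild type by roughly sT. With measurement noise σ, it is therefore detectable only if |s| exceeds about zσ/T. In nature, a common rule of thumb is that selection outweighs drift once |s| exceeds roughly 1/N_e, where N_e is the effective population size (the exact constant depends on ploidy and convention). The numbers used below (20 generations, σ = 0.1, N_e = 10^7) and the helper names are hypothetical and are not taken from the ubiquitin studies.

```python
def lab_detection_limit(generations: float, noise_sd: float, z: float = 1.96) -> float:
    """Smallest |s| per generation an assay could distinguish from zero, if the
    mutant/wild-type log ratio shifts by about s * generations and is measured
    with standard deviation noise_sd (small-s approximation)."""
    return z * noise_sd / generations


def classify(s: float, generations: float, noise_sd: float, ne: float) -> str:
    """Rough, hypothetical classification of a selection coefficient s."""
    if abs(s) >= lab_detection_limit(generations, noise_sd):
        return "detectable in the lab"
    if abs(s) > 1 / ne:  # rule of thumb: selection beats drift
        return "invisible in the lab, but selected in nature"
    return "effectively neutral"


for s in (-0.05, -0.001, -1e-8):
    print(s, classify(s, generations=20, noise_sd=0.1, ne=1e7))
```

Under these assumptions, any |s| between about 10^-7 and 0.01 falls in the gap: invisible in the assay, yet large enough to be purged over evolutionary time.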

The coding sequence dilemma

As the ubiquitin example shows, coding sequences pose a serious challenge to the view that most mutations are neutral. As I wrote in an earlier post, there is unambiguous evidence for purifying selection in coding sequences. Because of the degeneracy of the genetic code, we know that the theoretical ratio of nonsynonymous mutations (those that change the amino acid of the corresponding protein) to synonymous (silent) mutations should be about 3:1. Yet the observed ratio is more like 1:3, indicating a massive under-representation of nonsynonymous mutations in coding sequences compared to theoretical predictions. Presumably, just as in the ubiquitin example, these amino acid changes perturbed protein function enough that they were eliminated by Darwinian selection.
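The approximately 3:1 expectation can be checked by brute force. The sketch takes each of the 61 sense codons of the standard genetic code, makes all nine possible single-nucleotide changes, and classifies each outcome as synonymous, missense, or nonsense. It assumes every change is equally likely, which ignores transition/transversion bias and codon usage, so it gives only the textbook baseline. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutations() -> dict[str, int]:
    """Count every single-nucleotide change from a sense codon by its effect."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


print(classify_point_mutations())
```

This yields 134 synonymous, 392 missense, and 23 nonsense changes out of 549. Nonsynonymous changes therefore outnumber synonymous ones by about 2.9:1 (missense only) or 3.1:1 (including nonsense).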

A good metaphor is survivorship bias, a statistical concept made famous by the mathematician Abraham Wald’s analysis of the impact of anti-aircraft fire on World War II B-29 bombers. Essentially, if you only survey the surviving aircraft, you only see a subsample of the places where bullets hit the planes. For example, planes that were hit in their engines crashed and didn’t make it back to base, so they were not counted in the survey depicted below.

The red dots depict where bullet holes were detected in surviving B-29 airplanes.

Just like the missing bullet holes, we know for a fact that a large fraction of nonsynonymous mutations is missing from coding sequences. They are absent because they have been eliminated by purifying selection. These mutations are not neutral.
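The survivorship logic can be turned into a back-of-the-envelope number using the round figures given earlier: an expected nonsynonymous-to-synonymous ratio of 3:1 and an observed ratio of about 1:3. If synonymous mutations are assumed neutral, the ratio of observed to expected (a crude analogue of dN/dS, often written ω) is (1/3)/3 = 1/9, so roughly eight in nine nonsynonymous mutations are missing. The function name is hypothetical.

```python
from fractions import Fraction


def fraction_purged(expected_n_to_s: Fraction, observed_n_to_s: Fraction) -> Fraction:
    """Share of nonsynonymous mutations missing relative to the neutral
    expectation, assuming synonymous mutations are neutral."""
    omega = observed_n_to_s / expected_n_to_s  # crude dN/dS-like ratio
    return 1 - omega


print(fraction_purged(Fraction(3), Fraction(1, 3)))  # 8/9
```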

Now I can hear some arguing that coding sequences form only a small fraction of the genome. Yes, that’s true for eukaryotes. But in bacteria and archaea, only ~10% of the genome is non-coding. And what about viruses, whose genes tend to be tightly packed in the genome? Why should we ignore such a large fraction of the biota and teach concepts that are heavily biased toward eukaryotic model systems?

If we acknowledge that in most genes a significant fraction of nonsynonymous mutations is under purifying selection, and that an important section of the biota consists of species whose genomes are mostly filled with coding sequences, then the statement that most mutations are neutral doesn’t reflect the genome data across the diversity of life.
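This argument can also be put in rough numbers. As crude simplifications, assume that new point mutations fall uniformly across the genome and that about 3/4 of mutations in coding sequence are nonsynonymous (the 3:1 ratio). Assume further that 8/9 of those are purged (the 3:1 versus 1:3 comparison) and that every synonymous and non-coding mutation is neutral. Under these assumptions, the share of all new mutations removed by purifying selection is simply the product of the coding fraction and these two factors. The function name and default values are illustrative assumptions.

```python
from fractions import Fraction


def share_under_purifying_selection(coding_fraction: Fraction,
                                    nonsyn_share: Fraction = Fraction(3, 4),
                                    purged_share: Fraction = Fraction(8, 9)) -> Fraction:
    """Back-of-the-envelope share of all new point mutations that are
    nonsynonymous and later removed by purifying selection. Treats every
    synonymous and non-coding mutation as neutral."""
    return coding_fraction * nonsyn_share * purged_share


print(share_under_purifying_selection(Fraction(9, 10)))  # 3/5
print(share_under_purifying_selection(Fraction(1, 50)))  # 1/75
```

For a genome that is about 90% coding, roughly the figure given above for bacteria and archaea, this gives 9/10 × 3/4 × 8/9 = 3/5. Under these assumptions, most new mutations in such a genome would not be neutral. For a hypothetical genome that is 2% coding, the same calculation gives only 1/75.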

Concluding thoughts

The point of my tweet and this post is to raise the question of whether it is appropriate to start an introductory lesson on evolution with “most mutations are neutral.” Besides the fact that the statement is vague (what fraction counts as “most”?) and still under debate, it can confuse students when they are exposed to real-world population genomics data and when they study coding sequences.

To wrap up, here is a draft of my attempt at an introductory abstract on molecular evolution. This is a work in progress, and your feedback is most welcome.

Mutations happen in genomes all the time. This is generally viewed as a random process, but it can be biased by many factors, including intrinsic factors such as genome environment and external factors like mutagens. Mutations can be neutral, or they can be subject to negative or positive selection. The relative proportions of these classes of mutations haven’t been fully sorted out by scientists, and they can be difficult to ascertain because selection isn’t constant but varies over time, sometimes dramatically, as in host-parasite coevolution. Nonetheless, in coding sequences there is very strong evidence of purifying selection against a large fraction of nonsynonymous mutations (the mutations that change the amino acid of the corresponding protein). Nonsynonymous mutations are massively under-represented in coding sequences compared to theoretical predictions, and the missing ones perturbed protein function enough to be eliminated by natural selection.

Addendum

There were additional tweets and comments that add to the discussion but that I haven’t had time to fully digest. Interested readers can check the replies to the original tweets.
