The one thing you need to know about molecular evolution

KamounLab
Dec 5, 2021

The mathematical proof for natural selection is written in DNA in the most striking and beautiful fashion. Let’s find out how.

Cite as Kamoun, S. (2021). The one thing you need to know about molecular evolution. Zenodo https://doi.org/10.5281/zenodo.5759927

When I was on the faculty at THE Ohio State University, the University offered its professors the opportunity to order a book of their choice for the library and to add a short note that would inspire undergraduate students. I didn’t hesitate for a moment. I selected Sean B. Carroll’s “The Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution”, one of the most eloquently written books about molecular evolution. One aim of the book is to convince readers that the evidence for evolution through natural selection is hardwired in DNA sequences, what Carroll calls the forensic record of evolution. If we as a society are willing to convict criminals on the basis of DNA evidence, then why do we ignore the clear message that DNA tells us about life and how it has evolved?

A must read for biologists at any career stage.

This has resonated with me since at the time I was living in the US Midwest, known as the conservative heartland. Recent history further demonstrated how denial of science, evolutionary biology and all, can have dramatic consequences in society. Sean Carroll’s book is, therefore, a must read for undergraduate students who may be culturally inclined to think that evolution is “just a theory” and that scientists are still debating evolutionary processes such as natural selection. The evidence written in our genomes and those of every living organism should answer any questions about evolution “beyond reasonable doubt.”

The Making of the Fittest is also an inspiring and accessible first-class treatise on molecular evolution, a must-read for biologists at any career stage. It is shocking to me how little molecular evolution is taught at the undergraduate level. Too often evolutionary biology is taught primarily from the organismal perspective, neglecting the molecular aspects. And when molecular evolution is taught as part of population genetics, an emphasis on complicated mathematical theories can disenchant students.

The single most important fact described in Carroll’s book is how natural selection has left a glaringly obvious signature in DNA. Carroll succinctly takes us through the evidence in the chapter devoted to immortal genes. Let’s revisit this. Many genes are long-lived, meaning that the proteins they encode have changed very little over evolutionary time. Carroll cites Eugene V. Koonin’s list of 500 core genes, also called immortal genes, that are shared by all domains of life, meaning that their proteins have persisted for over 2 billion years of evolution, ever since archaea, bacteria, plants, fungi, animals and other eukaryotes emerged from their last common ancestor. Carroll shows an alignment of a portion of elongation factor 1a highlighting the amino acids that are identical across all these domains of life. Just like their proteins, these amino acids are “immortal”: they were present in the last common ancestor of all living organisms and have remained unchanged throughout 2 billion years of evolution. Remarkable, isn’t it?

Closer to home, this last year, Hiroaki Adachi and several of my colleagues reported that the plant gene ZAR1 is unusually conserved for a gene encoding an immune receptor, a class of genes known to diversify rapidly. Our analyses revealed that ZAR1 is long-lived: it probably emerged in the Jurassic period over 150 million years ago, early in the evolution of flowering plants. Just like Carroll’s example of elongation factor 1a, we can align ZAR1 from multiple species and show that a great number of amino acids are identical, meaning that they are effectively long-lived residues that have persisted in the ZAR1 protein for tens of millions of years.

Long-lived amino acids in the ZAR1 protein of flowering plants are shown in gray. They have persisted throughout 150 million years of evolution.

A key concept here is that these genes are conserved not because they haven’t mutated (they have been bombarded over and over with mutations just like any other gene) but because of natural selection. More precisely, purifying natural selection has eliminated the great majority of mutations that resulted in an amino acid change. This is the molecular biology version of Darwin’s “weeding out the weak.” The beauty of this is that we know with absolute certainty that purifying selection has operated on these genes. The proof of purifying selection, beyond any reasonable doubt, is written right there in the genome of every single living organism.

This proof stems from the empirical observation that conserved proteins are more similar in their amino acid sequence than in the nucleotide (DNA) sequence of their genes. This is explained by the nature of the genetic code, the set of rules used by cells to translate the information encoded in the DNA into proteins. The genetic code happens to be redundant, with its 64 triplets coding for just 20 amino acids plus the STOP (termination) signal. This redundancy means that many changes in the DNA sequence will not result in amino acid changes; they are what we call silent or synonymous mutations. They change the DNA without affecting the protein sequence, and they are a great telltale of evolution.

The genetic code is redundant.
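To make the redundancy concrete, here is a minimal Python sketch that builds the standard genetic code and counts how many triplets encode each amino acid. The names `BASES`, `AMINO_ACIDS`, `CODON_TABLE` and `codons_per_residue` are my own choices for illustration, and the sketch assumes the standard nuclear code (some organelles and organisms use slight variants).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons listed in TCAG
# order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a STOP codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to the residue (or STOP) it encodes.
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of different codons that spell out each residue.
codons_per_residue = Counter(CODON_TABLE.values())

for residue, n_codons in sorted(codons_per_residue.items(), key=lambda kv: -kv[1]):
    print(residue, n_codons)
```

Leucine, serine and arginine each have six codons, whereas methionine and tryptophan have only one. Most of the redundancy sits in the third position of the codon, which is why so many third-position changes are silent.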

Mutations, or changes in the DNA sequence, can have two different effects on the protein sequence: they can be silent or they can change the protein sequence. We can easily calculate the odds of whether or not a mutation will result in a change in amino acid. Each of the 64 triplets (codons) can mutate in 9 possible ways (3 positions, each with 3 alternative bases), resulting in a total of 576 possible outcomes. Of these 576 mutations, 438 would alter the encoded amino acid (or switch between an amino acid and a STOP signal) and are known as nonsynonymous mutations. The remaining 138 are silent (synonymous) mutations. This ratio of ~3 (76%) to ~1 (24%) nonsynonymous to synonymous mutations is the expected ratio from random mutagenesis. This is what you would obtain on average if you took a protein-coding DNA sequence and randomly altered it using a pen or a word processor. This is also what you generally obtain when the DNA amplification method PCR goes bad and yields mutated amplicons.
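The counting exercise above is easy to reproduce. The sketch below enumerates every single-base change of every codon in the standard genetic code and tallies silent versus nonsynonymous outcomes. The function name `classify_single_base_changes` is hypothetical, and the exact tallies depend on conventions: here a change from one STOP codon to another is counted as silent, and every mutation is treated as equally likely, which real mutational processes are not.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a STOP codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_single_base_changes(table):
    """Tally silent vs. nonsynonymous outcomes over all one-base changes."""
    silent = nonsynonymous = 0
    for codon, residue in table.items():
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a mutation
                mutant = codon[:position] + base + codon[position + 1:]
                if table[mutant] == residue:
                    silent += 1
                else:
                    nonsynonymous += 1
    return silent, nonsynonymous


silent, nonsynonymous = classify_single_base_changes(CODON_TABLE)
total = silent + nonsynonymous
print(f"{total} possible changes")
print(f"nonsynonymous: {nonsynonymous} ({nonsynonymous / total:.0%})")
print(f"silent:        {silent} ({silent / total:.0%})")
```

Under these conventions the tally is 438 nonsynonymous and 138 silent changes out of 576, roughly 3 to 1. Other conventions (for example, leaving STOP codons out) shift the counts by a few mutations but not the overall ratio.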

However, when we study genome sequences of living organisms, we repeatedly fail to observe this pattern. You could easily do this exercise. Just pick a conserved gene sequence from two species, human vs. mouse, tomato vs. coffee, or two sequences from species of Phytophthora, the infamous plant killer that triggered the Irish potato famine. Align the two protein sequences and compare their conservation relative to their DNA sequences. Almost invariably, the amino acid sequence will be more conserved than the DNA sequence, and the ratio of nonsynonymous to synonymous substitutions will be around 1:3, a very sharp deviation from the 3:1 ratio expected from randomness. In the example below, a conserved protein from two Phytophthora species, this ratio is 6:21 or 1:3.5, not that different from the average you would obtain from a comparison of any two genes, and certainly very different from the random pattern of ~3:1.

Marked bias towards synonymous mutations in the conserved elicitin protein of Phytophthora infestans (INF1) vs. another species Phytophthora parasitica (PARA1).
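If you want to try this exercise yourself, the comparison can be sketched in a few lines of Python. Everything in this example is illustrative: the two short sequences are made up (they are not INF1 or PARA1), the helper names are hypothetical, and the sketch assumes the two coding sequences are already aligned in frame, without gaps. It simply classifies each differing codon as synonymous or nonsynonymous, which is cruder than the per-site estimates used in formal dN/dS analyses.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a STOP codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame coding sequence into one-letter residues."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def compare_coding_sequences(seq_a, seq_b):
    """Count differing codons that are synonymous vs. nonsynonymous."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and gap-free")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Made-up toy sequences, written codon by codon for readability.
toy_a = "ATG GCT AAA GGT CTG TCT GAA".replace(" ", "")
toy_b = "ATG GCC AAG GGT TTG ACT GAG".replace(" ", "")

synonymous, nonsynonymous = compare_coding_sequences(toy_a, toy_b)
dna_identity = identity(toy_a, toy_b)
protein_identity = identity(translate(toy_a), translate(toy_b))

print(f"synonymous: {synonymous}, nonsynonymous: {nonsynonymous}")
print(f"DNA identity: {dna_identity:.0%}, protein identity: {protein_identity:.0%}")
```

In this toy pair, four of the five differing codons are silent, so the protein is more conserved than the DNA that encodes it. With real genes you would first need a codon-aware alignment, and codons that differ at more than one base deserve more careful treatment than this simple count gives them.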

Again, these genes are bombarded by mutations just like any other DNA sequence, so the reason for this sharp bias in the ratio of nonsynonymous to synonymous mutations is not that they have mutated differently. The reason is that natural selection, more precisely purifying selection, weeded out most nonsynonymous mutations to maintain the purity of these proteins, and therefore resulted in a marked under-representation of nonsynonymous mutations. Most amino acid changes in these proteins probably perturb their function, and therefore result in an organism with reduced fitness.

I find this forensic evidence of natural selection stunningly beautiful. I do think it is the single most fundamental fact about molecular evolution that every biologist should know. It’s simple, it’s eloquent and its consequences are immense. The importance of natural selection in shaping life, the view that Charles Darwin, Alfred Wallace and other evolutionary biologists have formulated, is correct. We have its mathematical proof built into the genomes of all living species.

Sadly, my experience is that most of the students and biologists I have queried have never formally learned this fundamental piece of knowledge: how the observed ratio of nonsynonymous to synonymous substitutions massively deviates from the pattern we expect from random mutations. If you’re among those, then add Sean B. Carroll’s book to your holiday reading list.

Appendix

I thank The Sainsbury Lab MSc students for inspiring me to write this post.

My note to the Ohio State students about The Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution (Hardcover) by Sean B. Carroll

This book is a must-read for every student at Ohio State. DNA is the blueprint of life. Every organism, every single individual has its own unique DNA code. This DNA code carries all the information necessary to build living creatures. Using simple prose, Carroll elegantly illustrates how DNA contains hidden messages about the history of living creatures and how scientists extract and interpret these messages. DNA typing is routinely used in forensic science and accepted by courts of law throughout the world. We are willing to send convicted criminals to the gas chamber on the sole basis of DNA evidence. So why should we ignore the message DNA tells us about evolution?
