The AND-Multiplication Error

Thomas D. Schneider, Ph.D.

The multiplication rule. It is well known from elementary probability theory that if two events are independent, then we may multiply the probabilities of each event to determine the probability of having both events occur. Suppose that there are two events A and B, with the probability of occurrence for A being P_A and the probability of occurrence for B being P_B. Further assume that the events do not influence each other and do not share a common source of influence. Then the probability that both events occur is P_A × P_B. That is, when the events A and B are independent, the probability that event A AND event B both occur is found by multiplying the probabilities of the individual events.

Example: A single die is a cube having 6 faces, numbered by dots. The probability of getting a single dot from an unbiased die is 1 out of 6 or 1/6. The probability of getting two dice each to have one dot (snake eyes) is 1/6 multiplied by 1/6 or 1/36 ≈ 0.028. Consider now a case where the two dice are glued together side by side so that one side of the pair shows snake eyes. We toss the pair and read it only if it does not land on end, with one die stacked on top of the other. Since the pair can then land on any of 4 sides, the probability of 'snake eyes' is 1/4 = 0.25. The non-independence increased the probability of the event dramatically, by a factor of nine.
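The arithmetic in the dice example can be checked with exact fractions. This is a minimal sketch; the variable names are chosen here for illustration and the glued-dice probability of 1/4 is taken directly from the example rather than derived.

```python
from fractions import Fraction

# Independent dice: each die shows one dot with probability 1/6,
# so the multiplication rule applies.
p_one_dot = Fraction(1, 6)
p_independent = p_one_dot * p_one_dot

# Glued dice (value taken from the example): 4 readable sides,
# exactly one of which shows snake eyes.
p_glued = Fraction(1, 4)

# How much more likely snake eyes becomes once independence is lost.
ratio = p_glued / p_independent

print(p_independent, float(p_independent))  # 1/36, about 0.028
print(p_glued, float(p_glued))              # 1/4, 0.25
print(ratio)                                # 9
```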

The multiplication rule does not apply to biological evolution. A common error in the non-scientific literature and in poorly written papers is to assume that probabilities multiply when computing the likelihood of components of living things such as proteins. A typical argument notes that proteins are about 300 amino acids long and that there are 20 different kinds of amino acids. If such a string were generated by independent selection of the amino acids, then the probability of generating any particular string would be 20^(-300), a very small number indeed. While this may be true for random strings, it does not directly apply to proteins found in living organisms. Why? Because individual mutations accumulate one at a time and there is amplification (replication) between steps. That is, if one starts with a given amino acid string, the mutations in the genome (from which the string is derived) are sequential. A mutation occurs, perhaps changing the amino acid string. If the change is bad, which is true for the majority of changes, the organism dies and its genes are gone. (In diploids, recessive defects will be removed more slowly since they are only exposed when an organism becomes homozygous for the mutation.) If a rare lucky change occurs that has some advantage (or at worst is neutral or only slightly deleterious) then the organism may survive to produce offspring. The possibility of appearance and acceptance (by natural selection processes) of mutations in the offspring therefore depends strongly on whether the previous generation survived and on the number of progeny. Genetic algorithm experiments, such as the Ev evolution program, demonstrate clearly that the probability of generating what would be an extremely rare genetic string if the steps were independent can be high. So the evolution of a 300 amino acid protein is reasonably easy to attain.

A concrete example. Suppose we have 10 coins that land as 'heads' or 'tails' after they are all flipped at once in parallel. The probability of getting all heads is (1/2)^10 = 1/1024. The probability of not getting all heads in a single parallel flip is 1 - 1/1024, so the probability of not getting all heads after F parallel flips is (1 - 1/1024)^F. After F parallel flips, the probability of getting all heads at least once is

      1 - (1 - 1/1024)^F.

For example, after 1024 tries the chance of getting all heads at least once is only 1 - (1023/1024)^1024 ≈ 63.2%. So it could take quite a while to get all heads!
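The formula 1 - (1 - 1/1024)^F is easy to evaluate numerically. The sketch below is illustrative only: the function names `p_all_heads_at_least_once` and `flips_needed` and the target probabilities in the loop are assumptions introduced here, not part of the original text.

```python
import math

def p_all_heads_at_least_once(flips, coins=10):
    """Probability of seeing all heads at least once in `flips` parallel flips."""
    p_single = 0.5 ** coins              # (1/2)^10 = 1/1024 for 10 coins
    return 1.0 - (1.0 - p_single) ** flips

def flips_needed(target, coins=10):
    """Smallest number of parallel flips whose success probability reaches `target`."""
    p_single = 0.5 ** coins
    # Solve 1 - (1 - p)^F >= target for F and round up to a whole flip.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))

# The case quoted in the text: 1024 parallel flips.
print(round(p_all_heads_at_least_once(1024), 3))  # 0.632

# Hypothetical targets, to show how slowly the probability approaches 1.
for target in (0.5, 0.9, 0.99):
    print(target, flips_needed(target))
```

Because the exponent F only shrinks the failure probability geometrically, pushing the success probability close to 1 takes several multiples of 1024 flips.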

But that is not what happens in nature. To model what happens in natural biological systems, consider flipping all 10 coins at once. Initially there will be about 5 heads and 5 tails. We paste these to an index card. We then make 100 copies of the card, including the states of the 10 coins. While we make the copies of the coin states, we sometimes make an error, changing a head to a tail or a tail to a head. We then find the card that has the most coins with heads up and we throw away all the other cards. So if even one card has an extra head, it will be found. We reproduce that card 100 times (with errors) and repeat the selection. Suppose that we make an error in copying a coin state about 1 time in 100. Then almost every other generation we will get another head. Starting from about 50% heads, it will take only about 10 generations to get a card with all heads. That is what happens in nature. Notice that we have wasted a lot of cards, coins and glue to get the all-head card (about 1000 sets!), but the result comes quickly. (If the coin is a penny, the cost is $100.) Consider the dandelion. It creates many progeny. Many fall on the wrong ground or are perhaps eaten. But the ones that get through can repopulate your entire yard!
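The card-copying procedure can be sketched as a small simulation. This is a minimal sketch and not the Ev program: the function name `generations_to_all_heads`, its parameters, the `max_generations` safety cap, and the tie-breaking rule (the first card with the most heads is kept) are assumptions made for illustration. The 10 coins, 100 copies per generation, and 1-in-100 copying error come from the text.

```python
import random

def generations_to_all_heads(n_coins=10, n_copies=100, error_rate=0.01,
                             seed=None, max_generations=10_000):
    """Count copy-and-select generations until one card shows all heads."""
    rng = random.Random(seed)

    # Initial parallel flip of all coins: True means heads.
    card = [rng.random() < 0.5 for _ in range(n_coins)]

    generations = 0
    while not all(card) and generations < max_generations:
        # Copy the surviving card n_copies times; each coin state is
        # miscopied (head <-> tail) with probability error_rate.
        copies = [
            [(not coin) if rng.random() < error_rate else coin for coin in card]
            for _ in range(n_copies)
        ]
        # Selection: keep the copy with the most heads, discard the rest.
        card = max(copies, key=sum)
        generations += 1
    return generations

if __name__ == "__main__":
    g = generations_to_all_heads(seed=1)
    print("generations:", g)
    print("cards used:", g * 100)
```

Multiplying the generation count by the 100 copies made per generation gives the number of cards consumed, which is the wasted material the text refers to. The count varies from run to run with the random seed.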

Summary. It is inappropriate to multiply probabilities unless the two events are independent. One must account for all of the events (in other words, honor the dead). The functional amino acids in a protein are not obtained independently since many organisms die for the few that survive to reproduce. Each change to an amino acid occurs in the context of the current protein and therefore depends on the previous history of the protein. Although the amino acids may be functionally independent (allowing, for example, the computation of a sequence logo), the appearance of the selected amino acids is sequential during evolution and is, therefore, dependent on previous steps. It is invalid to directly apply the multiplication rule to computing the probability that proteins came into existence.

Links

Documents that make the AND-multiplication error and use it to draw conclusions are flawed to the core and their conclusions can be immediately dismissed as invalid.

Below are examples of documents that make the AND-multiplication error, listed alphabetically by author.


2006 Jul 19: Mark Hancock pointed out this statement, which used to be in the text above, is not right: "It would take about 1024 tries to get all heads." He said: "After 1024 tries, you'd actually only have a 1 - (1023/1024)^1024 ~= 63.2% chance of getting all heads (at least once) in the case that you flip 10 coins every time." Thanks for the correction!




origin: 2004 Sep 17
updated: 2011 Aug 16

