Molecular Information Theory:
From Clinical Applications To Molecular Machine Efficiency

Dr. Thomas D. Schneider
National Cancer Institute


[Figure: sequence logo for human donor splice sites]

Information theory was introduced by Claude Shannon in 1948 to precisely characterize data flows in communications systems. The same mathematics can also be fruitfully applied to problems in molecular biology. We start with the problem of understanding how proteins interact with DNA at specific sequences called binding sites. Information theory allows us to build an average picture of a set of binding sites, which can be displayed with a computer graphic called a sequence logo.
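As a rough illustration of the "average picture" behind a sequence logo, the sketch below computes the information at each aligned position as 2 bits (the maximum for four bases) minus the Shannon uncertainty of the observed base frequencies. The function names and the toy alignment are hypothetical, and the small-sample correction used in practice is omitted for brevity.

```python
import math

def position_information(column, alphabet="ACGT"):
    """Information (bits) at one aligned position: log2(4) minus Shannon uncertainty.

    Simplification: no small-sample correction is applied.
    """
    n = len(column)
    uncertainty = 0.0
    for base in alphabet:
        freq = column.count(base) / n
        if freq > 0:
            uncertainty -= freq * math.log2(freq)
    return math.log2(len(alphabet)) - uncertainty

def logo_information(sites):
    """Per-position information for a list of equal-length aligned sites."""
    return [position_information(col) for col in zip(*sites)]

# Hypothetical toy alignment, invented for illustration only (not real data).
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
bits = logo_information(toy_sites)
for position, value in enumerate(bits):
    print(position, round(value, 3))
```

In a logo, each value would set the height of the letter stack at that position, so fully conserved positions reach 2 bits and unconserved positions approach 0.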

[Figure: sequence walker for human donor splice junctions]

Sequence logos show how strongly each part of a binding site is conserved, measured in bits of information. They have been used to study a variety of genetic control systems. More recently, the same mathematics has been used to examine individual binding sites with another computer graphic called a sequence walker. Sequence walkers are being used to predict whether changes in human genes are disease-causing mutations or neutral polymorphisms. It may be possible to predict the degree of colon cancer by this method.
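The idea of scoring an individual site can be sketched as follows: each base at each position contributes 2 + log2(frequency) bits, taken from the frequencies in a set of known sites, and the contributions are summed over the site. This is a minimal sketch under stated assumptions: the function names, the toy alignment, and the pseudocount (used here only to avoid taking the logarithm of zero) are all hypothetical choices, not the published procedure.

```python
import math

def weight_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position weights in bits: 2 + log2(frequency) for DNA.

    The pseudocount is an assumption made for this sketch so that
    bases never seen at a position still get a finite (negative) weight.
    """
    matrix = []
    for col in zip(*sites):
        total = len(col) + pseudocount * len(alphabet)
        matrix.append(
            {b: 2.0 + math.log2((col.count(b) + pseudocount) / total) for b in alphabet}
        )
    return matrix

def walker(matrix, sequence):
    """Contribution of each base in one sequence (the letter heights of a walker)."""
    return [matrix[i][base] for i, base in enumerate(sequence)]

def individual_information(matrix, sequence):
    """Total information of one site: the sum of its per-base contributions."""
    return sum(walker(matrix, sequence))

# Hypothetical toy alignment and variant, invented for illustration only.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
matrix = weight_matrix(toy_sites)
natural = "CAGGTAAGT"
variant = "CAGATAAGT"  # G changed to A at a fully conserved position
print(individual_information(matrix, natural))
print(individual_information(matrix, variant))
```

Comparing the two totals shows how a single base change at a conserved position lowers the information of a site, which is the kind of comparison used to separate damaging changes from neutral ones.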

Information theory can also be used to understand the relationship between the binding energy dissipated when two molecules stick together and the amount of sequence conservation of the molecules, measured in bits. Using the Second Law of Thermodynamics, this relationship can be expressed as the efficiency of the molecular interaction. Surprisingly, many molecular systems, including genetic systems, visual pigments and motility proteins, have efficiencies near 70%. A purely geometrical explanation of this result shows that although biological systems are selected to have the highest efficiency, that efficiency is restricted to 70% because having precisely distinguishable molecular states is more important.
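One way to make the efficiency concrete is to compare the information gained, in bits, with the largest number of bits the dissipated energy could pay for. By the Second Law, at least RT ln 2 joules per mole must be dissipated per bit, so the efficiency is the measured bits divided by (energy dissipated) / (RT ln 2). The sketch below assumes molar units, and the numbers in the usage lines are hypothetical placeholders rather than measurements.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)

def min_energy_per_bit(temperature_k):
    """Second Law lower bound on energy dissipated per bit, in J/mol."""
    return GAS_CONSTANT * temperature_k * math.log(2)

def efficiency(information_bits, delta_g_j_per_mol, temperature_k):
    """Bits gained divided by the maximum bits the dissipated energy allows.

    delta_g_j_per_mol must be negative (energy is dissipated on binding).
    """
    if delta_g_j_per_mol >= 0:
        raise ValueError("delta_g_j_per_mol must be negative for a dissipative interaction")
    max_bits = -delta_g_j_per_mol / min_energy_per_bit(temperature_k)
    return information_bits / max_bits

# Hypothetical inputs chosen only to exercise the function (not measured values).
example = efficiency(information_bits=20.0, delta_g_j_per_mol=-50_000.0, temperature_k=310.0)
print(round(example, 3))
```

An efficiency of 1 would mean that every joule dissipated was converted into sequence information at the thermodynamic limit; values below 1 indicate energy spent beyond that minimum.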

Theory of Molecular Machines: physics,
chemistry, biology, molecular biology, evolutionary theory,
genetic engineering, sequence logos, information theory,
electrical engineering, thermodynamics, statistical
mechanics, hypersphere packing, gumball machines, Maxwell's
Daemon, limits of computers

Schneider Lab

origin: 2003 July 15
updated: 2006 July 07