
The Consensus Sequence
Hall of Fame

by Tom Schneider
molecular information theorist

The concept of a consensus sequence is pervasive in molecular biology. It is a primitive concept that can and should be replaced by two concepts: sequence logos and sequence walkers.

A poster describes how sequence logos and walkers form a complete replacement for consensus sequences.
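To make the difference concrete, here is a minimal Python sketch. It assumes a hypothetical toy alignment of four 6-base sites, which is not data from any paper. From that alignment it derives three things. The first is the consensus. The second is the per-position information R(l) = 2 - H(l), which sets the stack heights of a sequence logo. The third is the individual information Ri of each sequence, the number a sequence walker reports, computed as the sum of Riw(b,l) = 2 + log2 f(b,l). Both formulas here omit any small-sample correction. The pseudocount is an assumption added only to avoid log2(0).

```python
from collections import Counter
from math import log2

BASES = "ACGT"

# Hypothetical toy alignment of four 6-base binding sites (illustration only).
TOY_SITES = ["TTGACT", "TTTACA", "CTGACA", "TTGATA"]


def column_counts(sites):
    """Count each base at every aligned position."""
    return [Counter(site[l] for site in sites) for l in range(len(sites[0]))]


def consensus(sites):
    """Most frequent base per position; ties go to the earlier base in ACGT."""
    return "".join(max(BASES, key=lambda b: c[b]) for c in column_counts(sites))


def logo_heights(sites):
    """R(l) = 2 - H(l) bits per position, the stack height in a sequence logo.

    Assumption: no small-sample correction is applied."""
    n = len(sites)
    heights = []
    for c in column_counts(sites):
        h = -sum((c[b] / n) * log2(c[b] / n) for b in BASES if c[b])
        heights.append(2.0 - h)
    return heights


def walker_weights(sites, pseudocount=1.0):
    """Riw(b, l) = 2 + log2 f(b, l) bits for every base at every position.

    Assumption: f(b, l) includes a pseudocount per base, so an unseen base
    gets a finite negative weight instead of log2(0)."""
    n = len(sites)
    return [
        {b: 2.0 + log2((c[b] + pseudocount) / (n + 4 * pseudocount)) for b in BASES}
        for c in column_counts(sites)
    ]


def individual_information(site, weights):
    """Ri: sum of the weights of the bases actually present in one sequence."""
    return sum(weights[l][b] for l, b in enumerate(site))


if __name__ == "__main__":
    cons = consensus(TOY_SITES)
    print(cons, cons in TOY_SITES)  # TTGACA False
    print([round(r, 2) for r in logo_heights(TOY_SITES)])  # [1.19, 2.0, 1.19, 2.0, 1.19, 1.19]
    weights = walker_weights(TOY_SITES)
    for site in TOY_SITES + [cons]:
        # each toy site prints 5.64; the consensus prints 6.64
        print(site, round(individual_information(site, weights), 2))
```

In this toy case the consensus TTGACA occurs in none of the four sites. Under these assumptions, each real site still carries about 5.6 bits. A walker keeps that graded information, while the consensus throws it away.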

This paper discusses the issues in detail:

T. D. Schneider, Consensus Sequence Zen,
Applied Bioinformatics, 1 (3), 111-119, 2002.

See the oxyr, baseflip and repan3 papers for the science you WILL MISS if you use consensus sequences!

Given that these concepts have been well documented for a good number of years, and given that consensus sequences are dangerous to use, scientific publications dated on or after 2001 May 1 that use the concept of a consensus sequence where a sequence logo or a sequence walker would have been appropriate will be placed on this page.

  1. R. Swem, Sylvie Elsen, Terry H. Bird, Danielle L. Swem, Hans-Georg Koch, Hannu Myllykallio, Fevzi Daldal, Carl E. Bauer, The RegB/RegA Two-component Regulatory System Controls Synthesis of Photosynthesis and Respiratory Electron Transfer Components in Rhodobacter capsulatus. Journal of Molecular Biology, 309(1): 121-138 (doi:10.1006/jmbi.2001.4652), May 2001. "This high degree of sequence conservation raises the issue of whether or not there is a consensus DNA-binding site that is recognized by RegA homologs ..." This case demonstrates confusing the model (a consensus sequence) with "reality" (a DNA-binding site). new as of 2001 May 21.

  2. Patel PH, Suzuki M, Adman E, Shinkai A, Loeb LA. Prokaryotic DNA polymerase I: evolution, structure, and "base flipping" mechanism for nucleotide selection. J Mol Biol. 2001 May 18;308(5):823-37. Figure 3 gives MOTIF B as

    but the text gives
    Likewise, Figure 3 gives MOTIF C as
    xhxx but the text says it contains
    E! This is a clear demonstration of how one can get into trouble. new as of 2001 May 22.

  3. The crystal structure of a heptameric archaeal Sm protein:
    Implications for the eukaryotic snRNP core.
    Mura C, Cascio D, Sawaya MR, Eisenberg DS.
    Proc Natl Acad Sci U S A. 2001 May 8;98(10):5532-7. new as of 2001 May 23.

  4. Proc Natl Acad Sci U S A 2001 Aug 28;98(18):10085-9 Human DNA replication initiation factors, ORC and MCM, associate with oriP of Epstein-Barr virus. Chaudhuri B, Xu H, Todorov I, Dutta A, Yates JL. Pubmed. "ORC was first discovered in budding yeast as a six-member complex that binds in an ATP-dependent manner to a consensus sequence present at yeast replication origins (30)." Of course ORC binds to binding sites that are modeled by a consensus. It does not bind to the model! new as of 2001 Oct 2

  5. Science 2001 Nov 2;294(5544):1098-102 Structural basis for recognition of the intron branch site RNA by splicing factor 1. Liu Z, Luyten I, Bottomley MJ, Messias AC, Houngninou-Molango S, Sprangers R, Zanier K, Kramer A, Sattler M. Pubmed. new as of 2001 Nov 8

  6. Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Hum Mol Genet. 2002 Feb 15;11(4):451-64. Clark F, Thanaraj TA. Pubmed PMID: 11854178. The authors used sequence logos but used the term 'consensus sequence' all over the place and did not use the individual information or walker methods. new as of 2002 March 28

  7. Biochem Biophys Res Commun 2002 Mar 8;291(4):979-86 Evidence for "pre-recruitment" as a new mechanism of transcription activation in Escherichia coli: the large excess of SoxS binding sites per cell relative to the number of SoxS molecules per cell. Griffith KL, Shah IM, Myers TE, O'Neill MC, Wolf RE Jr. Pubmed PMID: 11866462. new as of 2002 April 23.

  8. J Virol 2001 Jun;75(12):5567-75 Retroviral constitutive transport element evolved from cellular TAP(NXF1)-binding sequences. Zolotukhin AS, Michalowski D, Smulevitch S, Felber BK. Pubmed PMID: 11356964. new as of 2002 May 16

  9. Science 2002 May 10;296(5570):1132-6 A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Ogawa H, Ishiguro K, Gaubatz S, Livingston DM, Nakatani Y. Pubmed PMID: 12004135. new as of 2002 May 16

  10. Rebeiz M, Reeves NL, Posakony JW. SCORE: A computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Proc Natl Acad Sci U S A. 2002 Jul 23;99(15):9888-93. PMID: 12107285. new as of 2002 July 31

  11. James LC, Roversi P, Tawfik DS. Antibody multispecificity mediated by conformational diversity. Science 2003 Feb 28;299(5611):1362-7 PMID: 12610298. See figure 4. new as of 2003 March 12

  12. Lavrrar JL, McIntosh MA. Architecture of a Fur binding site: a comparative analysis. J Bacteriol. 2003 Apr;185(7):2194-202. PMID: 12644489. The abstract starts off: "Fur ... recognizes a 19-bp consensus site ..." This is confusing a model with reality. new as of 2003 May 25

  13. Cliften P, Sudarsanam P, Desikan A, Fulton L, Fulton B, Majors J, Waterston R, Cohen BA, Johnston M. Science. 2003 Jul 4;301(5629):71-6. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. PMID: 12644489. Lots of "sequence motifs" - a terrible model!! new as of 2003 May 25

  14. Prog Nucleic Acid Res Mol Biol. 1990; 38: 137-64. Structure-function relationships in Escherichia coli promoter DNA. Horwitz MS, Loeb LA. PMID: 2183292. new as of 2004 Jan 22
    On page 141 of this paper the authors state:

    "For promoters of sigma70 holoenzyme, a general rule is that mutations that decrease a promoter's agreement with the consensus decrease the frequency of transcription initiation, and mutations that increase consensus agreement increase initiation frequency (20). A prediction of this rule is that the consensus sequence represents the most efficient promoter."
    So far so good. This should be true according to individual information theory. They continue:
    "The correctness of this rule is demonstrated by the success of an algorithm correlating consensus agreement and promoter strength in vitro (46)."
    This should be improved upon by using matrix methods. They continue:
    "Exceptions to this rule are infrequent, but may be of significance. One example is the promoter for the arabinose operon araBAD, in which an A-to-G change at -33 decreases consensus homology while increasing promoter strength (47)."
    This is a controversial proposal. Is it correct? There are two flaws in the paragraph quoted above:
    1. Reference 47 (J Bacteriol. 1980 May; 142(2): 659-67. Deoxyribonucleic acid sequence of araBAD promoter mutants of Escherichia coli. Horwitz AH, Morandi C, Wilcox G.) shows that the araXc mutation is at -35, not -33.
    2. The relevant sequences are:
      wild type TGACGC
      mutant TGGCGC
      consensus TTGAGA
    The mutant brings the sequence closer to the (supposed) consensus, so the claim is incorrect.

    So far as I know, there are no clearly demonstrated cases that violate the relationship that higher information ("closer to consensus") has a stronger binding energy.

  15. Murakami KS, Masuda S, Darst SA. Structural basis of transcription initiation: RNA polymerase holoenzyme at 4 Å resolution. Science. 2002 May 17;296(5571):1280-4. PMID: 12016306. new as of 2004 Jun 8. "The bacterial RNAP holoenzyme forms an initial closed promoter complex by recognizing two hexamers of consensus DNA sequence ..." Nope, they recognize a lot more sequences than just the consensus!

  16. Grifantini R, Sebastian S, Frigimelica E, Draghi M, Bartolini E, Muzzi A, Rappuoli R, Grandi G, Genco CA. Identification of iron-activated and -repressed Fur-dependent genes by transcriptome analysis of Neisseria meningitidis group B. Proc Natl Acad Sci U S A. 2003 Aug 5;100(16):9542-7. PMID: 12883001. new as of 2004 Jun 29. "Fur forms a dimer together with ferrous iron and binds to a consensus sequence (Fur-box) that overlaps the promoters of iron regulated genes, resulting in the inhibition of transcription." This is a good example of confusing a model with reality. The guilty party in this case is the FINDPATTERN program, part of the Wisconsin package (Accelrys, San Diego).

  17. Dornan D, Wertz I, Shimizu H, Arnott D, Frantz GD, Dowd P, O'Rourke K, Koeppen H, Dixit VM. The ubiquitin ligase COP1 is a critical negative regulator of p53. Nature. 2004 May 6;429(6987):86-92. PMID: 15103385. new as of 2004 Jul 9. In the legend to Figure 5: "Identification of a p53 consensus site on the COP1 promoter." This is another example of confusing a model (consensus sequence) with the reality (binding site).

  18. Hall DA, Zhu H, Zhu X, Royce T, Gerstein M, Snyder M. Regulation of gene expression by a metabolic enzyme. Science. 2004 Oct 15;306(5695):482-4. PMID: 15486299. new as of 2004 Oct 25. In the legend to Figure 3: "Consensus motif derived from targets confirmed by PCR." No, the image is an (uncited!) sequence logo, which is NOT a consensus sequence.

  19. Wei G, Li AG, Liu X. J Biol Chem. 2005 Jan 20. Insights into selective activation of p53 DNA binding by c-Abl. PMID: 15661746. "As a transcription factor, p53 recognizes a specific consensus DNA sequence ..." This is the protein recognizing a model; and, of course, most p53 sites are not the consensus.

  20. Nucleic Acids Res. 2005 May 11;33(8):2726-33. Common patterns in type II restriction enzyme binding sites. Nikolajewa S, Beyer A, Friedel M, Hollunder J, Wilhelm T. PMID: 15888729. Table 3 is full of consensus sequences. The natural variation that they suppressed ruins the point they want to make. new as of 2006 Jan 12.
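The arithmetic behind the araBAD point in entry 14 can be checked mechanically. This is a minimal sketch. It assumes the three hexamers are compared position by position with no gaps, exactly as listed in that entry. It counts simple matches, which is the crude "consensus agreement" that entry 14 discusses, not an information measure.

```python
def agreement(site: str, consensus: str) -> int:
    """Count the positions at which an ungapped site matches the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site, consensus))


# araBAD promoter hexamers as listed in entry 14.
CONSENSUS = "TTGAGA"
WILD_TYPE = "TGACGC"
MUTANT = "TGGCGC"

if __name__ == "__main__":
    print("wild type:", agreement(WILD_TYPE, CONSENSUS))  # wild type: 2
    print("mutant:", agreement(MUTANT, CONSENSUS))        # mutant: 3
```

The A-to-G change adds a match at the third position, so the mutant agrees with the consensus more, not less.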


If you would like to contribute a reference to this page, please follow these rules:

  1. Only scientific publications dated on or after 2001 May 1 will be accepted.
  2. Please send a snip of html for the entry in the format used on this page. The order is from Pubmed references: author, title, journal, date, volume (number):pages. The link is to Pubmed.
  3. Indicate whether you want your name and/or your email address listed. If you do not state this, I will list them.
Send them to Tom Schneider.


Schneider Lab

origin: 2001 Apr 23
updated: 2012 Apr 10
National Cancer Institute    National Institutes of Health    Health and Human Services    USA Gov - Official Web Portal    Viewing Files    Accessibility