
How To Read Sequence Logos

One can often read the details of a DNA/protein interaction directly from a sequence logo. This is described in a paper:

@article{Schneider.oxyr,
author = "T. D. Schneider",
title = "Reading of {DNA} Sequence Logos:
Prediction of Major Groove Binding
by Information Theory",
journal = "Meth. Enzym.",
volume = "274",
pages = "445-455",
note = "available on the World Wide Web from
http://schneider.ncifcrf.gov/paper/oxyr/",
year = "1996"}

This page presents additional figures to accompany the paper.

The 4 basepairs of DNA.

If you are not familiar with information theory, here is a one-minute lesson! When the frequencies of bases are not simple numbers like 100% or 50%, a more complicated formula is needed; it gives intermediate values (still in bits). To learn about it, see the primer introducing information theory.
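For readers who like to see the arithmetic, here is a minimal Python sketch of the per-position calculation. It assumes the usual form R = 2 - H, where H = -Σ f(b) log2 f(b) is the uncertainty over the four bases and f(b) is the frequency of base b in one column of aligned sites. The function name `position_information` is ours (hypothetical, not from the Delila programs), and the small-sample correction applied to published logos is omitted.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_information(column):
    """Information (bits) at one aligned position: R = 2 - H.

    H = -sum over b of f(b) * log2 f(b), where f(b) is the frequency of
    base b in `column`. Bases that never occur contribute nothing.
    Hypothetical helper; no small-sample correction is applied.
    """
    counts = Counter(b for b in column.upper() if b in BASES)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("column has no A, C, G or T")
    h = 0.0
    for b in BASES:
        f = counts[b] / n
        if f > 0:
            h -= f * math.log2(f)
    return 2.0 - h


print(position_information("AAAA"))            # 2.0: one base, 100% of the time
print(position_information("AATT"))            # 1.0: two bases, 50% each
print(position_information("ACGT"))            # 0.0: all four equally likely
print(round(position_information("AAAT"), 2))  # 1.19: an intermediate value
```

The first three columns reproduce the "simple number" cases; the last shows the kind of intermediate value the formula produces.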

You can also look up terms in the glossary.

You can use the image of Numbered DNA Base Pairs as a reference while reading sequence logos.

This figure (generated by the alist and makelogo programs) shows the sequence logo for the OxyR protein from E. coli, built from the given sequences. Note that by giving a GenBank accession number, a coordinate and an orientation, the location of the central base of the binding site (the "zero coordinate") is defined precisely. (Note: to be sure that the locations can be found again in GenBank, you must give the versioned accession (ACCESSION.VERSION), such as "J04553.1".)
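To make the zero-coordinate idea concrete, here is a hypothetical Python helper (not part of alist or makelogo) that pulls a site out of a sequence given a 1-based zero coordinate, as GenBank numbers bases, and an orientation. For the "-" orientation it reads the reverse complement, so the site is always reported in the binding site's own direction. The window bounds `left` and `right` are assumed parameters.

```python
# Hypothetical helper: not part of the alist or makelogo programs.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_site(sequence, zero, orientation, left=-10, right=10):
    """Return the bases from `left` to `right` around the zero coordinate.

    `zero` is 1-based, as in GenBank. For orientation "+" the top strand is
    read; for "-" the reverse complement is read.
    """
    if orientation == "+":
        start, end = zero + left, zero + right   # 1-based, inclusive
    elif orientation == "-":
        # Site position k sits at top-strand coordinate zero - k.
        start, end = zero - right, zero - left
    else:
        raise ValueError("orientation must be '+' or '-'")
    if start < 1 or end > len(sequence):
        raise ValueError("site runs off the end of the sequence")
    window = sequence[start - 1:end]
    if orientation == "+":
        return window
    return window.translate(COMPLEMENT)[::-1]


top = "AAACGTTT"
print(extract_site(top, 4, "+", -1, 1))  # ACG (top-strand bases 3..5)
print(extract_site(top, 5, "-", -1, 1))  # ACG (reverse complement of bases 4..6)
```

Two different zero coordinates give the same site here only because the orientations differ, which is exactly why the orientation has to be recorded along with the coordinate.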

The main (testable!!) assumption is that the protein binds to B-form DNA. The peak of the sine wave represents the major groove of B-form DNA facing the protein while the trough represents the minor groove facing the protein. Positions that have more than 1 bit of information are likely to be major groove contacts.
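The usual argument for the 1-bit threshold: in the minor groove an A·T pair looks nearly the same as T·A, and G·C nearly the same as C·G, so a contact there can tell apart at most two kinds of pair, which is at most 1 bit. Only the major groove presents enough chemical detail to distinguish all four pairs and approach 2 bits. A minimal sketch of applying the rule to per-position values follows; the values are hypothetical, and a position at or below 1 bit is left undetermined, since the rule says nothing about it.

```python
def classify_positions(bits_by_position, threshold=1.0):
    """Apply the 1-bit rule: more than `threshold` bits suggests a major
    groove contact; otherwise this rule does not decide."""
    return {
        pos: ("major groove (likely)" if bits > threshold else "undetermined")
        for pos, bits in sorted(bits_by_position.items())
    }


example = {-2: 0.4, -1: 1.6, 0: 0.9, 1: 1.2}  # hypothetical values, in bits
for pos, label in classify_positions(example).items():
    print(pos, label)
```

The rule works in one direction only: a low-information position may still face the major groove, so "undetermined" is the honest label.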

Reading a sequence logo: major groove when greater than 1 bit

Reading a sequence logo: A or T is N2

Reading a sequence logo: G or A is N7

Reading a sequence logo: G or T is O4 N6 or O6 N4
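The three captions above can be collected into a small lookup table. The Python sketch below is our illustrative encoding of them, not code from the paper. It assumes you already know which groove faces the protein at a position (for example from the wave: peak for major, trough for minor) and which bases dominate there. The chemistry behind each entry: the N2 amino group of guanine lies in the minor groove and is missing from A and T; N7 of both purines (G and A) lies in the major groove; and in the major groove both G·C and T·A present a carbonyl oxygen (O6 of G, O4 of T) next to an amino group (N4 of C, N6 of A).

```python
# Illustrative encoding of the three figure captions (our reading of them).
# Each rule: (bases that dominate the position, groove facing the protein,
#             chemical group the pattern points to).
RULES = [
    (frozenset("AT"), "minor", "N2 amino group of guanine (absent from A and T)"),
    (frozenset("GA"), "major", "N7 of the purines G and A"),
    (frozenset("GT"), "major", "O6 and N4 of G:C, or O4 and N6 of T:A"),
]


def suggest_contact(bases, groove):
    """Return the group a base pattern points to, or None if no rule matches.

    `bases` holds the dominant bases at one position (order and case do not
    matter); `groove` is "major" or "minor".
    """
    key = frozenset(bases.upper())
    for rule_bases, rule_groove, group in RULES:
        if key == rule_bases and groove == rule_groove:
            return group
    return None


print(suggest_contact("ta", "minor"))  # N2 amino group of guanine (absent from A and T)
print(suggest_contact("AG", "major"))  # N7 of the purines G and A
print(suggest_contact("GC", "major"))  # None: no caption covers this pattern
```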

More examples can be found in these E. coli sequence logos (PostScript), which form figure 6 of the paper: Papp, P. P., D. K. Chattoraj, and T. D. Schneider. 1993. Information Analysis of Sequences that Bind the Replication Initiator RepA. J. Mol. Biol. 233: 219-230. Note how the logos follow the cosine wave. An explanation for this pretty effect is given in the OxyR paper.
A gallery of 8 sequence logos
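One rough way to put a number on "the logo follows the wave" is to find the crest position that puts the most information on crests, i.e. the crest p that maximizes Σ R(l)·cos(2π(l - p)/P). That sum has a closed form via atan2 (a weighted circular mean of the positions' phases). The Python sketch below is a hypothetical illustration, not the makelogo code, and it assumes a period P of 10.5 bases, roughly the helical repeat of B-form DNA in solution; the period is a parameter.

```python
import math


def wave(position, peak, period=10.5):
    """Cosine wave: +1 at `peak` (major groove facing the protein),
    -1 half a helical turn away (minor groove facing the protein)."""
    return math.cos(2 * math.pi * (position - peak) / period)


def best_peak(bits_by_position, period=10.5):
    """Crest position maximizing sum(bits * wave), in (-period/2, period/2].

    sum b*cos(t_l - t_p) = A*cos(t_p) + B*sin(t_p), which peaks at
    t_p = atan2(B, A); any crest shifted by whole periods is equivalent.
    """
    a = sum(b * math.cos(2 * math.pi * l / period) for l, b in bits_by_position.items())
    s = sum(b * math.sin(2 * math.pi * l / period) for l, b in bits_by_position.items())
    return math.atan2(s, a) * period / (2 * math.pi)


example = {-4: 1.4, -3: 1.1, 0: 0.2, 3: 1.2, 4: 1.5}  # hypothetical values, in bits
crest = best_peak(example)
for pos in sorted(example):
    side = "major" if wave(pos, crest) > 0 else "minor"
    print(pos, example[pos], side)
```

A closed form was chosen over a grid search because it is exact and needs no step size.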

Cover of Nucleic Acids Research, 2001 December 1 issue.
Copyright © 2001 Oxford University Press.

This observation led to the discovery of base flipping in replication systems, see:

Schneider Lab

origin 1996 September 17
updated 2011 May 04
