What is a Sequence Logo?

Figure 1: Some aligned sequences and their sequence logo.
At the top of the figure are listed the 12 DNA sequences from the $P_L$ and $P_R$ control regions in bacteriophage lambda. These are bound by both the cI and cro proteins [2]. Each even-numbered sequence is the complement of the preceding odd-numbered sequence. The sequence logo, described in detail in the text, is at the bottom of the figure. The cosine wave is positioned to indicate that a minor groove faces the center of each symmetrical protein. Data which support this assignment are given in reference [3].

The object of a sequence logo (Fig. 1) [1] is to visualize the information contained in a set of DNA, RNA, or protein sequences by examining the order and frequency of the chemical subunits which make up the sequences. The name "sequence logo" comes from the fact that a set of sequences is represented as a single graphic which contains one or more separate elements (that is the definition of the word "logo" [4]). For example, an economist who wants to show a trend in a market plots the market conditions on a graph so that the trend is apparent at a glance. The sequence logo functions in a similar manner: it graphically represents the conservation ("information content") of a set of sequences in a clear, concise and mathematically sound manner. Sequence logos are generated by programs which read the aligned sequences and analyze them using the information theory developed by Claude Shannon [5,6,7]. Generating a sequence logo is somewhat similar to creating a consensus sequence, but unlike a consensus, a logo retains subtle features of the data. To understand sequence logos one must first understand the concepts of information and uncertainty.
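To make the contrast with a consensus sequence concrete, here is a minimal Python sketch. It is not the program used to draw Fig. 1. It assumes DNA (a four-letter alphabet), takes the information at a position to be the maximum uncertainty, log2(4) = 2 bits, minus the Shannon uncertainty of the observed base frequencies, and omits any small-sample correction that an actual logo program may apply. The function names and the toy alignment are made up for illustration.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information (bits) at one aligned position: log2(alphabet_size) - H.

    Assumption: no small-sample correction is applied.
    """
    counts = Counter(column)
    n = len(column)
    # Shannon uncertainty H = -sum(p * log2(p)) over the observed symbols.
    uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - uncertainty


def information_profile(sequences, alphabet_size=4):
    """Information content of every column of an aligned set of sequences."""
    return [column_information(col, alphabet_size) for col in zip(*sequences)]


def consensus(sequences):
    """Most frequent symbol per column; ties go to the alphabetically first."""
    result = []
    for col in zip(*sequences):
        counts = Counter(col)
        # max() keeps the first maximum, so sorting makes ties deterministic.
        result.append(max(sorted(counts), key=counts.get))
    return "".join(result)


# Hypothetical toy alignment, not the lambda sequences from Fig. 1.
toy = ["ACGT", "ACGA", "ACTC", "AGTG"]

print("consensus:", consensus(toy))
for position, bits in enumerate(information_profile(toy), start=1):
    print(f"position {position}: {bits:.2f} bits")
```

In this toy alignment the consensus still reports a base at the fourth position, even though all four bases occur there equally often and the position carries no information. The per-position values keep that distinction, which is the kind of subtle feature a consensus sequence discards.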

