Next: From Uncertainty to Information Up: Sequence Logos: A Powerful, Previous: The Nitty Gritty Bit

Example

Suppose ``a,'' ``c,'' ``g,'' and ``t'' are the four symbols a machine uses to generate the twelve-letter sequence ``gattttctcttt''. We then know that $N = 12$, $M = 4$, $N_a = 1$, $N_c = 2$, $N_g = 1$, and $N_t = 8$, so the frequencies are $F_a = \frac{1}{12}$, $F_c = \frac{2}{12}$, $F_g = \frac{1}{12}$, and $F_t = \frac{8}{12}$. Now, let's say that the frequencies stay the same no matter how many sequences the machine creates. In other words, if the set were infinite, the frequency of each letter would equal its probability, so $P_i = F_i$. Then $u_a = -\log_2(\frac{1}{12}) = 3.58$ bits. Similarly, $u_c = 2.58$, $u_g = 3.58$, and $u_t = 0.58$ bits. Using equation (13), and substituting in the values for $P_a$, $P_c$, $P_g$, and $P_t$, we obtain the following:

 
$\displaystyle H = -\left[ \frac{1}{12} \times \log_2(\frac{1}{12}) \; + \; \frac{2}{12} \times \log_2(\frac{2}{12}) \; + \; \frac{1}{12} \times \log_2(\frac{1}{12}) \; + \; \frac{8}{12} \times \log_2(\frac{8}{12}) \right]$     (14)

so $H = 1.42$ bits. This agrees with our earlier discussion: it takes an average of 1.42 bits to determine which symbol is located at any particular position in the sequence. (Note: for equiprobable choices, equation (2) can be used and is much simpler; we just wanted to show you how the more general formula works. As an exercise, you can show that if all the $P_i$ are equal, equation (13) reduces to the same form as (2).)
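The arithmetic in this example can be checked with a short Python sketch. It is only an illustration: the function names `symbol_stats` and `uncertainty` are made up for this purpose, and the final check assumes that equation (2), which is not restated in this section, has the form $H = \log_2 M$ for $M$ equally likely symbols.

```python
from collections import Counter
from math import log2


def symbol_stats(sequence):
    """Return {symbol: (count, frequency, -log2(frequency))} for a sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {s: (c, c / n, -log2(c / n)) for s, c in counts.items()}


def uncertainty(probabilities):
    """H = -sum(P_i * log2(P_i)) in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# The twelve-letter sequence from the example.
stats = symbol_stats("gattttctcttt")
for symbol in sorted(stats):
    count, freq, u = stats[symbol]
    print(f"{symbol}: N = {count}, F = {freq:.3f}, u = {u:.2f} bits")

# Treat the observed frequencies as probabilities (P_i = F_i).
H = uncertainty(freq for _, freq, _ in stats.values())
print(f"H = {H:.2f} bits")

# Equiprobable case: four equally likely symbols (assumed form of equation (2)).
M = 4
H_equal = uncertainty([1 / M] * M)
print(f"equiprobable H = {H_equal:.2f} bits, log2(M) = {log2(M):.2f} bits")
```

Running it prints the per-symbol counts, frequencies, and $u$ values, followed by $H$ rounded to two decimals. The last line compares the general formula against $\log_2 M$ for equal probabilities.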


Tom Schneider
2003-02-12
