Suppose the symbols ``a,'' ``c,'' ``g,'' and ``t'' are the four
symbols a machine uses to
generate the twelve-letter sequence ``gattttctcttt''.
So far, we know that
*N* = 12,
*M* = 4,
*N*_{a} = 1,
*N*_{c} = 2,
*N*_{g} = 1, and
*N*_{t} = 8.
We find that the frequencies are
*F*_{a} = 1/12,
*F*_{c} = 2/12,
*F*_{g} = 1/12,
and
*F*_{t} = 8/12.
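The counts and frequencies for this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names (`sequence`, `counts`, `frequencies`) are illustrative and not part of the original text, and `Fraction` prints values in lowest terms, so 2/12 appears as 1/6 and 8/12 as 2/3.

```python
from collections import Counter
from fractions import Fraction

# The twelve-letter sequence from the example.
sequence = "gattttctcttt"

counts = Counter(sequence)   # N_i: how many times each symbol occurs
N = len(sequence)            # total number of symbols in the sequence
M = len(counts)              # number of distinct symbols

# F_i = N_i / N, kept as exact fractions (reduced to lowest terms).
frequencies = {s: Fraction(counts[s], N) for s in sorted(counts)}

for s, f in frequencies.items():
    print(f"N_{s} = {counts[s]}, F_{s} = {f}")
```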
Now, let's say that the frequencies are always the same no matter how many
sequences the machine creates.
In other words, if the set of sequences were infinite, the frequency of each letter would equal
its probability, so *P*_{i} = *F*_{i}.
So,
*u*_{a} = -log_{2}(*P*_{a}) = -log_{2}(1/12) = 3.58
bits. Similarly,
*u*_{c} = 2.58,
*u*_{g} = 3.58, and
*u*_{t} = 0.58 bits.
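As a rough check of these four values, the sketch below computes *u*_{i} for each symbol, assuming *u*_{i} = -log_{2}(*P*_{i}), which is the form consistent with the numbers quoted here. The helper name `u_bits` is hypothetical and chosen only for this illustration.

```python
import math

# Probabilities taken from the example's frequencies (P_i = F_i).
P = {"a": 1 / 12, "c": 2 / 12, "g": 1 / 12, "t": 8 / 12}

def u_bits(p):
    """Return -log2(p), the value u_i in bits for a symbol of probability p.

    Assumption: u_i = -log2(P_i), inferred from the values in the text.
    """
    return -math.log2(p)

u = {s: u_bits(p) for s, p in P.items()}

for s in P:
    print(f"u_{s} = {u[s]:.2f} bits")
```

The rarer a symbol is, the larger its value: ``a'' and ``g'' each occur once and get the largest *u*, while ``t'' occurs eight times and gets the smallest.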
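Using equation (13), and substituting in the
values for
*P*_{a}, *P*_{c}, *P*_{g}, and *P*_{t}, we obtain the following:

*H* = -[(1/12) log_{2}(1/12) + (2/12) log_{2}(2/12) + (1/12) log_{2}(1/12) + (8/12) log_{2}(8/12)]

so

*H* = (1/12)(3.58) + (2/12)(2.58) + (1/12)(3.58) + (8/12)(0.58) = 0.30 + 0.43 + 0.30 + 0.39 = 1.42 bits per symbol.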
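The final number can be verified numerically. The sketch below assumes that equation (13) has the form *H* = -Σ *P*_{i} log_{2}(*P*_{i}), summed over the four symbols. That form is an assumption based on the *u*_{i} values above, not a quotation of the equation itself.

```python
import math

# Probabilities from the example (P_i = F_i).
P = {"a": 1 / 12, "c": 2 / 12, "g": 1 / 12, "t": 8 / 12}

# Assumed form of equation (13): H = -sum over i of P_i * log2(P_i).
H = -sum(p * math.log2(p) for p in P.values())

print(f"H = {H:.2f} bits per symbol")
```

Each term in the sum is a symbol's *u*_{i} weighted by its probability, so *H* is the average of the *u*_{i} values over the sequence.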
