
# The Magnificent Bit

The most common unit for measuring information and uncertainty is called the bit. A bit is the amount of information necessary to choose between two equally probable choices. To demonstrate this, let's play a game of twenty questions. Say that I put 1,048,576 identical boxes in a straight line and put a blue ball into one of them at random. I'll let you have twenty yes-or-no questions to find the ball, and if you do, I'll give you the ball (it's a special ball). Are you going to play? Of course you are, it's a really special ball.

All of the boxes look identical, so what are you going to do? You can hope you get lucky and guess randomly at twenty boxes, but then you'd only win that really special ball about once every 52,429 times you play. However, there is a better way! And best of all, you'll always win! Can you guess the best method? Well, here it is: the best questions to ask are the ones which eliminate the largest possible set of choices NO MATTER WHAT THE ANSWER IS. In other words, whether I say "yes" or "no", you can eliminate the same number of choices. That number is one-half of the total number of choices.

Therefore, your first question would be "Is the ball in the first half of the line of boxes?" Suppose my answer is "Yes." Now you have only 524,288 boxes left to choose from, and you still have 19 questions. Although this seems like a lot of boxes to choose from, notice that 1,048,576 (the number you started with) is equal to $2^{20}$. So if you keep dividing the set of boxes in half, each question halves the number of boxes that remain, and your twentieth question will determine which of the two remaining boxes contains that really special blue ball. The answer to "Where is the blue ball?" is given by 20 bits of information.
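The halving strategy can be tried out in a few lines of Python. This is only a minimal sketch of the game as described here: the function name `find_ball`, the zero-based box numbering, and the example ball position are assumptions made for illustration, not part of the original text.

```python
import math

def find_ball(num_boxes, ball_index):
    """Locate the ball by repeatedly halving the candidate range.

    Returns (index found, number of yes/no questions asked).
    Boxes are numbered 0 .. num_boxes - 1 (an assumption of this sketch).
    """
    low, high = 0, num_boxes          # candidates are boxes low .. high - 1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        questions += 1                # "Is the ball in the first half?"
        if ball_index < mid:          # answer "yes": keep the first half
            high = mid
        else:                         # answer "no": keep the second half
            low = mid
    return low, questions

NUM_BOXES = 1_048_576                 # 2**20 boxes, as in the game
found, asked = find_ball(NUM_BOXES, 777_777)   # hypothetical ball position
print(found, asked)                   # box found and questions used
print(math.log2(NUM_BOXES))           # bits needed to single out one box
```

Wherever the ball is hidden, the loop asks exactly 20 questions for 1,048,576 boxes, which matches the base-2 logarithm of the number of boxes.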

This method of searching is by no means a new concept (computer programmers call it a binary search and have been using it for quite a long time), and it has been used in biology before, but only recently has someone used it for studying binding sites [8,9,10,11,13]. It takes two bits of information to determine which of four equiprobable DNA bases occurs at a certain location. The first 1-bit decision divides the set of four bases in half, leaving only 2 choices. The second 1-bit decision determines which of those two bases is at the current location.
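The two decisions for a DNA base can be sketched the same way. The ordering of the bases in `BASES` and the function name `identify_base` are arbitrary choices for this illustration; any fixed ordering of the four equiprobable bases would give the same count of two yes/no answers.

```python
import math

BASES = ["A", "C", "G", "T"]          # ordering is an arbitrary assumption

def identify_base(base):
    """Return the yes/no answers that single out one of the four bases."""
    candidates = BASES
    answers = []
    while len(candidates) > 1:
        half = len(candidates) // 2
        in_first_half = base in candidates[:half]   # one 1-bit decision
        answers.append(in_first_half)
        candidates = candidates[:half] if in_first_half else candidates[half:]
    return answers

for b in BASES:
    print(b, identify_base(b))        # each base needs two answers
print(math.log2(len(BASES)))          # 2.0 bits for four equiprobable bases
```

Each base ends up with its own pair of answers, so two binary decisions are enough to tell the four bases apart.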

Tom Schneider
2003-02-12
