We said that a sequence logo is a graphic representation of the amount of information to be found in a set of DNA, RNA, or protein sequences. But what is information? To understand that, we must first understand the technical meaning of uncertainty.

Imagine yourself seated in front of a television screen. We are going to test your psychic powers, although you haven't told us that you don't have any. We will test them by flashing random symbols on the screen to see if you can guess each one before it appears. Let's start with the alphabet. You are very uncertain as to which symbol will appear next. After all, there are twenty-six of them, and you aren't psychic. You can expect to guess correctly only about once in every twenty-six tries.

Now, say we throw out all of the letters except ``a,'' ``c,'' ``g,'' and ``t.'' We wouldn't tell you, though; as far as you know, there are still twenty-six possibilities. But you're a smart person, so eventually you start to realize that only those four letters are being displayed, and your uncertainty decreases. This makes sense because now you are certain that the next symbol will be an ``a,'' ``c,'' ``g,'' or ``t,'' and not one of the other twenty-two letters.

Next, we bring in a new person, and we consistently flash the same letter on the screen, an ``a'' for instance. Their uncertainty would be zero; they'd know that the symbol was always going to be an ``a.'' If we then started flashing other letters, like ``c,'' ``g,'' and ``t,'' their uncertainty would suddenly increase.

So we see that whenever information is gained, for example when you worked out that we had stopped using all twenty-six letters, the level of uncertainty decreases. Likewise, when information is lost, as in the case where the person no longer knew that the symbol would always be an ``a,'' uncertainty increases. So how do you measure information and uncertainty?
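
As a rough preview, the three situations in the thought experiment can be given numbers. The sketch below rests on two illustrative assumptions that the text has not established so far: that every symbol still in play is equally likely, and that uncertainty is taken as the base-2 logarithm of the number of possible symbols, measured in bits. The helper name `uncertainty_bits` is hypothetical.

```python
import math


def uncertainty_bits(num_symbols: int) -> float:
    """Uncertainty in bits, ASSUMING all num_symbols symbols are equally likely."""
    if num_symbols < 1:
        raise ValueError("need at least one possible symbol")
    return math.log2(num_symbols)


# The three situations from the television-screen thought experiment.
scenarios = [
    ("full alphabet", 26),
    ("a, c, g, t only", 4),
    ("always a", 1),
]

for label, n in scenarios:
    print(
        f"{label:>16}: {n:2d} symbols, "
        f"chance of a correct guess 1/{n}, "
        f"uncertainty {uncertainty_bits(n):.2f} bits"
    )
```

Under these assumptions, going from twenty-six symbols to four lowers the uncertainty from about 4.70 bits to 2 bits, and a screen that always shows the same letter gives 0 bits. This matches the intuition that uncertainty falls as information is gained.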