When a recognition site has two or more parts with various spacings between them, alignment by one part may blur out information in the other part. For example, if the four variants of this site:
occurred with equal frequency, then the positions marked by dots would have zero information content, even though these sequences would give a large information content if they were aligned with each other. To handle this, one may align each part separately and add the information contents together. However, this overestimates the information because the variable spacing is not taken into account. To account for it, one may calculate the uncertainty of the spacing from a tabulation of the frequency of each spacing and subtract this from the total information of the two parts. (This is equivalent to increasing the uncertainty of the site, Hs.) For the example above, the four equally frequent spacings have an uncertainty of log2(4) = 2 bits, so Rsequence = 24 (ACGTACGTACGT) + 8 (GGCC) - 2 (spacing) = 30 bits. When this was done for ribosome binding sites, the total information content was not different from that given in Results (unpublished observation).
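The spacing correction can be illustrated with a minimal Python sketch. It assumes the spacings have already been tabulated as a list with one entry per aligned site, and that the information content of each part has been computed separately. The function names and the spacing values 0 to 3 are hypothetical, chosen only to give four equally frequent spacings; no small-sample correction is applied.

```python
from collections import Counter
from math import log2


def spacing_uncertainty(spacings):
    """Uncertainty (bits) of the spacing, from the frequency of each spacing."""
    counts = Counter(spacings)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def corrected_information(part_information, spacings):
    """Sum the per-part information (bits) and subtract the spacing uncertainty."""
    return sum(part_information) - spacing_uncertainty(spacings)


# Hypothetical spacings: four variants, each occurring once (equal frequency).
spacings = [0, 1, 2, 3]

# Per-part information from the text: 24 bits (ACGTACGTACGT) and 8 bits (GGCC).
h_spacing = spacing_uncertainty(spacings)
r_sequence = corrected_information([24.0, 8.0], spacings)

print(h_spacing)   # spacing uncertainty in bits
print(r_sequence)  # corrected information content in bits
```

With four equally frequent spacings the uncertainty is 2 bits, so the corrected total is 30 bits, matching the arithmetic in the text. If the spacings are not equally frequent, the uncertainty is smaller and less is subtracted.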