The second method of calculating the sampling error correction comes from Miller (1955) and Basharin (1959), who derived an approximation for the expectation of a sampled uncertainty, AE(H_{nb}), that is good for large n:

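AE(H_{nb}) = H_g - \frac{s - 1}{2 \ln(2) \, n}   (bits per base)   (16)

where H_g is the genomic uncertainty (2 bits for an equiprobable genome) and s, the number of symbols, is 4 for mononucleotides.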
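
As a rough illustration, the large-n approximation can be sketched in a few lines of Python. This assumes the standard Miller (1955) form of the correction, H_g - (s - 1) / (2 ln(2) n), in bits per base. The function name and the default of an equiprobable genome (H_g = log2(s)) are assumptions made for this example, not part of the original text.

```python
import math


def approx_expected_uncertainty(n, s=4, h_g=None):
    """Approximate AE(H_nb) in bits per base for a sample of n sequences.

    Assumes the Miller (1955) large-n correction (s - 1) / (2 ln(2) n).
    s   : number of symbols (4 for mononucleotides)
    h_g : genomic uncertainty in bits; defaults to log2(s), i.e. an
          equiprobable genome (an assumption of this sketch)
    """
    if n <= 0:
        raise ValueError("n must be a positive number of sequences")
    if h_g is None:
        h_g = math.log2(s)
    return h_g - (s - 1) / (2.0 * math.log(2) * n)


# The correction shrinks in proportion to 1/n.
for n in (10, 50, 100, 1000):
    print(n, round(approx_expected_uncertainty(n), 5))
```

The cost is constant in n, which is why this estimate is attractive when many sequences are available.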
Fig. 4 shows E(H_{nb}) and AE(H_{nb}) for several values of n. This table^{9} helps one to choose between AE(H_{nb}) (a computationally cheap estimate that is inaccurate for small n but accurate for large n) and E(H_{nb}) (an exact calculation that is computationally costly for large n). We use AE(H_{nb}) above n = 50 because the cumulative difference between E(H_{nb}) and AE(H_{nb}) in a site 100 positions wide would be at most 0.078 bits. The exact E(H_{nb}) is used for n less than or equal to 50, since its computation is rapid in this range.
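
The choice between the two calculations can be sketched as follows. This is a minimal illustration under stated assumptions: the exact expectation is taken to be the multinomial-weighted average of the sample uncertainty over every possible base composition of n sequences, the genome is assumed equiprobable, and the approximation is assumed to have the Miller (1955) form log2(s) - (s - 1) / (2 ln(2) n). All function names are hypothetical; only the switch at n = 50 comes from the text.

```python
import math


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for k in range(total + 1):
        for rest in compositions(total - k, parts - 1):
            yield (k,) + rest


def exact_expected_uncertainty(n, s=4):
    """Exact E(H_nb) in bits for n >= 1 samples from s equiprobable symbols."""
    if n < 1:
        raise ValueError("n must be at least 1")
    log_total = n * math.log(s)  # log of s**n equally likely sequences
    expected = 0.0
    for counts in compositions(n, s):
        # log of the multinomial coefficient n! / (n_1! ... n_s!)
        log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
        prob = math.exp(log_coef - log_total)
        # uncertainty of this particular sample, in bits
        h = -sum((k / n) * math.log2(k / n) for k in counts if k > 0)
        expected += prob * h
    return expected


def approx_expected_uncertainty(n, s=4):
    """Large-n approximation AE(H_nb) for an equiprobable genome (assumed form)."""
    return math.log2(s) - (s - 1) / (2.0 * math.log(2) * n)


def expected_uncertainty(n, s=4, threshold=50):
    """Use the exact value up to `threshold` and the approximation above it."""
    if n <= threshold:
        return exact_expected_uncertainty(n, s)
    return approx_expected_uncertainty(n, s)


# Per-position gap at the switch point, scaled to a site 100 positions wide.
gap = approx_expected_uncertainty(50) - exact_expected_uncertainty(50)
print("cumulative gap over 100 positions:", 100 * gap)
```

The enumeration visits every composition of n into s counts, so its cost grows roughly as n cubed for s = 4. That is cheap for n up to 50 and wasteful well beyond it.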
Figure 4: Statistics of H_{nb} for equiprobable genomic composition.

Tom Schneider
20021016