Delila Program: dirty

{version = 2.39; (* of dirty, 1994 sep 5}

(* begin module describe.dirty *)
(*
name
   dirty: calculate probabilities for dirty DNA synthesis

synopsis
   dirty(dirtyp: in, distribution: out, xyin: out, output: out)

files
   dirtyp: parameter file.
      one line giving the number of random bases that will be used (r).
      one line giving the average number of changes desired (n).
   distribution: the distribution of the number of changes per
      synthesized sequence, which has its peak at n
   xyin: graphics output of the program, used as input to xyplo for
      plotting.  The graph gives three curves against the independent
      variable p, the probability of getting the correct base at a
      position; randoms is the number of random bases (r):

      o = probability of exactly one base changed:
          randoms * (1-p) * p^(randoms-1)
      m = probability of one or more bases changed: 1 - p^randoms
      n = probability of exactly n bases changed:
          (randoms choose n) * (1-p)^n * p^(randoms-n)

      I have not found this output to be too useful; I concentrate
      on the distribution file.
   output: messages to the user
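   The three xyin curves are simple functions of p.  The sketch below
   evaluates them at one value of p, assuming that "randoms (1-p)p^(randoms-1)"
   means randoms times (1-p) times p^(randoms-1), and that the n curve is the
   binomial probability of exactly n changes.  The name xyin_curves is
   hypothetical, and the printed columns do not reproduce the actual layout
   of the xyin file that xyplo reads.

```python
from math import comb


def xyin_curves(p, randoms, n):
    """Return the (o, m, n) curve values at one value of p.

    p: probability of getting the correct base at a position
    randoms: number of random bases
    n: number of changes for the n curve
    Hypothetical helper; not part of the dirty program.
    """
    q = 1.0 - p  # probability that a single position is changed
    o = randoms * q * p ** (randoms - 1)  # exactly one base changed
    m = 1.0 - p ** randoms  # one or more bases changed
    pn = comb(randoms, n) * q ** n * p ** (randoms - n)  # exactly n changed
    return o, m, pn


# Sweep p for the 27-base, n = 4 case used in the example below.
for i in range(21):
    p = 0.80 + 0.01 * i
    o, m, pn = xyin_curves(p, 27, 4)
    print("%.2f %.6f %.6f %.6f" % (p, o, m, pn))
```

   At p = 23/27 the n curve equals the n = 4 row of the distribution file,
   and the o curve equals its n = 1 row.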

description
   If one is designing a randomized ("dirty") DNA synthesis, how
   heavily should it be randomized?  To use this program, pick the size
   of the region you want to randomize, r.  Then make a guess at the
   average number of changes you want over the region, n.  Put r and n
   into dirtyp and run the program.  Look at the distribution file.
   The line for n=0 is the frequency with which you will get back the
   original sequence.  You must choose whether this is tolerable.  For
   example, when I synthesized the T7 promoters, I knew that I could
   find at least 1 promoter in 100 clones by toothpicking, and I was
   willing to toothpick thousands.  This way I was sure to get some
   positives, even if they were the original sequence.  (As it turned
   out, the frequency of functional promoters was much higher than
   1%.)  If you have a strong selection, you can make this a small
   number by increasing the number of changes per clone.  With more
   changes per clone you will get more data from the randomization, so
   make it as high as you can tolerate.
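   The n=0 line can be estimated before running the program.  If each of
   the r positions keeps the wild-type base with probability p = 1 - n/r,
   the wild-type frequency is p^r.  The sketch below scans integer n and
   keeps the largest value whose wild-type frequency stays at or above a
   floor you are willing to tolerate.  The threshold min_wild_type and both
   function names are hypothetical illustrations of the reasoning above,
   not options of dirty.

```python
def wild_type_fraction(r, n):
    """Frequency of clones with no changes (the n=0 line of the distribution).

    Each of r positions keeps the wild-type base with probability
    p = 1 - n/r, so all r positions stay wild type with probability p**r.
    """
    p = 1.0 - n / r
    return p ** r


def largest_tolerable_n(r, min_wild_type):
    """Largest integer n whose wild-type fraction is still >= min_wild_type.

    min_wild_type is a hypothetical, user-chosen floor.  The wild-type
    fraction falls as n rises, so more changes mean fewer original clones.
    """
    best = 0
    for n in range(r + 1):
        if wild_type_fraction(r, n) >= min_wild_type:
            best = n
    return best


for n in range(1, 8):
    print("n = %d   wild type = %.8f" % (n, wild_type_fraction(27, n)))
print("largest n with at least 1% wild type:", largest_tolerable_n(27, 0.01))
```

   With a 1% floor and r = 27 this gives n = 4 (wild-type fraction about
   0.0132, versus about 0.0040 for n = 5).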

   The program also calculates the ratio of the wild-type base to the
   equimolar random mixture.  In the experiment described in the NAR
   paper, the technician mixed 4 drops of the appropriate base with 1
   drop of the equiprobable mix.  This mixture made the "dirty" bottle.
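   The ratio in the example output can be reconstructed as follows; this is
   a derivation that reproduces the printed numbers, not code taken from
   dirty.  Let the dirty bottle be a fraction fb of the wild-type base and
   fr = 1 - fb of the equimolar mix.  A quarter of the mix is the wild-type
   base, so a position is changed with probability 3fr/4.  Setting this
   equal to n/r gives fr = (4/3)(n/r), and the ratio of [base] to
   [random N] is fb/fr.  The function name dirty_mix is hypothetical.

```python
from fractions import Fraction


def dirty_mix(r, n):
    """Return (p, fraction_base, fraction_random, ratio) for a dirty bottle.

    r: number of random positions; n: average number of changes.
    Exact fractions are used so the ratio can be compared with the
    printed output without rounding noise.
    """
    change = Fraction(n, r)  # desired probability that a position changes
    # Only 3/4 of the equimolar mix is a wrong base, so scale up by 4/3.
    fraction_random = Fraction(4, 3) * change
    if not 0 < fraction_random <= 1:
        raise ValueError("need 0 < n/r <= 3/4")
    fraction_base = 1 - fraction_random
    p = 1 - change  # probability of the correct base
    ratio = fraction_base / fraction_random
    return p, fraction_base, fraction_random, ratio


p, fb, fr, ratio = dirty_mix(27, 4)
print("p =", float(p), "base =", float(fb), "random =", float(fr),
      "ratio =", float(ratio))
```

   For r = 27 and n = 4 this gives p = 23/27, fb = 65/81, fr = 16/81 and a
   ratio of 65/16 = 4.0625, matching the example output and close to the
   4 drops to 1 drop used in the NAR experiment.  The ValueError guard
   reflects that even pure equimolar mix changes only 3/4 of the positions,
   so n cannot exceed 3r/4.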

example

This is the analysis used in the NAR paper.  With the dirtyp file containing:

27    the number of random bases that will be used.
4     the number of changes desired (n)

the distribution file is:
* dirty 2.38
* distribution of number of changes calculated from binomial
* 27 random positions
*  4 average number of bases changed
* p = probability of correct base = 0.85185185
* fraction of [base] : 0.80246914
* fraction of [random n] : 0.19753086
*
* ratio of [base] to [random N]: 4.06250000
*
* TO DO THE SYNTHESIS,
* add one part of an equimolar mixture of the 4 bases
* to 4.06250000 parts of the "wild type" base
*
* In the following table,
*    n = number of changes
*    f = frequency of n changes
*    s = running sum of frequencies f (should approach 1.0)
* In the first row, where n=0, f is the frequency of wild type sequences
*
n = 0   f = 0.01317741   s = 0.01317741
n = 1   f = 0.06187652   s = 0.07505392
n = 2   f = 0.13989473   s = 0.21494866
n = 3   f = 0.20274599   s = 0.41769465
n = 4   f = 0.21156103   s = 0.62925568
n = 5   f = 0.16924883   s = 0.79850451
n = 6   f = 0.10792679   s = 0.90643130
n = 7   f = 0.05630963   s = 0.96274093
n = 8   f = 0.02448245   s = 0.98722338
n = 9   f = 0.00898872   s = 0.99621210
n =10   f = 0.00281386   s = 0.99902596
n =11   f = 0.00075629   s = 0.99978226
n =12   f = 0.00017537   s = 0.99995763
n =13   f = 0.00003519   s = 0.99999282
n =14   f = 0.00000612   s = 0.99999894
n =15   f = 0.00000092   s = 0.99999986
n =16   f = 0.00000012   s = 0.99999999
n =17   f = 0.00000001   s = 1.00000000
n =18   f = 0.00000000   s = 1.00000000
n =19   f = 0.00000000   s = 1.00000000
n =20   f = 0.00000000   s = 1.00000000
n =21   f = 0.00000000   s = 1.00000000
n =22   f = 0.00000000   s = 1.00000000
n =23   f = 0.00000000   s = 1.00000000
n =24   f = 0.00000000   s = 1.00000000
n =25   f = 0.00000000   s = 1.00000000
n =26   f = 0.00000000   s = 1.00000000
n =27   f = 0.00000000   s = 1.00000000
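   The table is the binomial distribution named in its header: each of the
   27 positions changes independently with probability 1 - p = n/r, so the
   frequency of exactly k changes is (r choose k) (1-p)^k p^(r-k).  The
   sketch below recomputes the f and s columns under that assumption; the
   function name change_distribution is hypothetical and the print format
   only imitates the layout above.

```python
from math import comb


def change_distribution(r, n):
    """Rows (k, f, s) of the binomial distribution of the number of changes.

    r: number of random positions; n: average number of changes.
    Each position changes independently with probability q = n/r.
    """
    q = n / r  # probability of a change at one position
    p = 1.0 - q  # probability of the correct base
    rows = []
    running = 0.0  # running sum s, which should approach 1.0
    for k in range(r + 1):
        f = comb(r, k) * q ** k * p ** (r - k)
        running += f
        rows.append((k, f, running))
    return rows


for k, f, s in change_distribution(27, 4):
    print("n =%2d   f = %.8f   s = %.8f" % (k, f, s))
```

   The most frequent row is k = 4, the requested average, as expected for
   a binomial whose mean is an integer.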

see also
   xyplo.p

documentation

   T. D. Schneider and G. D. Stormo (1989).  Excess Information at
   Bacteriophage T7 Genomic Promoters Detected by a Random Cloning
   Technique.  Nucleic Acids Research 17:659-674.

author
  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland
  toms@ncifcrf.gov

bugs
   n must be an integer

*)
(* end module describe.dirty *)
{This manual page was created by makman 1.44}