# Does the information always increase in Ev runs?

INTRODUCTION

"I would like to know roughly what proportion of the time your program ... produces a high-information result. I would guess it's every time, but I'd be grateful if you would confirm that please."

MATERIALS AND METHODS

The cleanest way to test this is to use the standard example from the ev paper. According to my notes in the ev program documentation 'see also' section, this is the "Parameter file for selective phase in the paper: evp.selection".

I wanted to look beyond 1000 generations, so I set the parameters to run for 2000 generations, by which point things should have stabilized. Then I ran ev 100 times with these parameters, changing the random seed from 0.00 to 0.99 in steps of 0.01.

• mk script to run ev with varying seeds.
• evp.zzz parameter file that has 'zzz' in place of the seed. The mk script loops through the runs, replacing zzz with each seed from 0.00 to 0.99.
• xyplop control parameter file for the plotting program xyplo.
• genhisp control parameter file for the histogram plotting program genhis.
• genpicp control parameter file for genpic, the program that converts the histogram to PostScript.
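The mk script itself is not reproduced on this page. As a rough illustration of what its loop does, here is a minimal Python sketch that builds the 100 seed strings and substitutes each one for 'zzz' in a parameter template. The function names and the one-line template are hypothetical: the real evp.zzz contents and the way mk invokes ev are not shown here, so the sketch only prepares the per-seed parameter text and does not run ev.

```python
# Hypothetical sketch of the mk loop: it generates per-seed parameter text
# only; it does not reproduce the real evp.zzz file or invoke ev.


def seed_strings(n=100):
    """Return the seeds '0.00', '0.01', ..., '0.99' as two-decimal strings."""
    return [f"{i / 100:.2f}" for i in range(n)]


def fill_template(template, seed):
    """Replace every 'zzz' placeholder in the template with the given seed."""
    return template.replace("zzz", seed)


if __name__ == "__main__":
    # Placeholder template line (hypothetical); the real evp.zzz has
    # many more parameters.
    template = "zzz  random seed\n"
    for seed in seed_strings():
        params = fill_template(template, seed)
        # In the real mk script, ev would be run here with these parameters.
        print(params, end="")
```

Formatting the seed as a two-decimal string, rather than accumulating 0.01 repeatedly, keeps every seed exact and avoids floating-point drift across the 100 runs.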

RESULTS

• list the ev output for the last run (seed 0.99).
• xyin raw data file for xyplo. It holds the last line of the list file that ev produces for each of the 100 runs, with the random seed added as the first column.
• This is a graph of the information content, Rsequence, as a function of the seed used to initiate the evolutionary run:

[Figure: final Rsequence (bits) versus random seed for the 100 runs]

The graph is also available in PostScript: xyout.ps. Clearly these 100 runs all give values close to the predicted 4 bits.
• We would expect the results to fluctuate around the predicted value, perhaps close to a Gaussian distribution:

[Figure: histogram of final Rsequence values (bits) for the 100 runs]

The results are indeed roughly Gaussian. Notice that they tend to be slightly lower overall than the predicted Rf = 4 bits. The reason for this is unknown. The histogram is also available in PostScript: histogram.ps.
• Now we need to ask a critical question: are there duplicate runs? The random number generator is not truly random, since it uses a deterministic chaotic numerical method to produce the numbers, so some of the runs may be identical. How many are? Only 7 of the 100 runs gave the same final Rsequence as another run.
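Given the per-run data in xyin (seed in the first column, final Rsequence from the last list line), the summary statistics and the duplicate count can be computed along the lines of this Python sketch. The function name and the representation of runs as (seed, Rsequence) pairs are assumptions. The page does not say whether the 7 duplicates count every member of a matching group or only the repeats, so this sketch counts repeats (a group of k identical values contributes k - 1) and uses the sample standard deviation; both are illustrative choices, not necessarily those used for the numbers above.

```python
from collections import Counter
import statistics


def summarize(runs):
    """Summarize final Rsequence values from (seed, rsequence) pairs.

    Returns the number of runs, mean, sample standard deviation, minimum,
    maximum, and the number of runs whose final value repeats a value
    already produced by another run.
    """
    values = [r for _, r in runs]
    counts = Counter(values)
    # Each group of k identical values contributes k - 1 repeats.
    repeats = sum(k - 1 for k in counts.values())
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "repeats": repeats,
    }


# Illustrative toy input only (not ev results):
# summarize([(0.00, 3.9), (0.01, 4.1), (0.02, 3.9), (0.03, 3.5)])
# gives repeats == 1, min 3.5, max 4.1 and a mean of about 3.85.
```

Comparing the values exactly is reasonable here because each value is read back from the same printed output format, so two runs that ended identically print identical numbers.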

# CONCLUSION

The ev program was run to 2000 generations from each of 100 different random seeds. The lowest final information content observed was 2.3 bits and the highest was 5.2 bits, with a mean of 3.8 +/- 0.5 bits (mean +/- one standard deviation). Duplicate runs occurred 7% of the time. These duplicates do not affect any conclusions, but they do suggest that the random number generator is not ideal. Despite this, every run gave a significant information increase. Treating the final values as Gaussian, zero information lies (3.8 - 0)/0.5 = 7.6 standard deviations below the mean, so the probability of a run returning to zero information is about 1.5 x 10^-14.
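The probability quoted in the conclusion follows from treating the final Rsequence values as normally distributed with mean 3.8 bits and standard deviation 0.5 bits and taking the one-sided tail below zero. A minimal Python sketch of that arithmetic, assuming the Gaussian model and a one-sided tail (the page does not state which tail was used, but the one-sided value is the one that matches 1.5 x 10^-14; a two-sided tail would be twice as large). The function name is hypothetical.

```python
import math


def tail_probability(mean, sd, threshold=0.0):
    """One-sided Gaussian probability of a value at or below threshold.

    Returns (z, p): z is the number of standard deviations between the
    mean and the threshold, and p = P(X <= threshold) for
    X ~ Normal(mean, sd).
    """
    z = (mean - threshold) / sd
    # Upper-tail normal probability: Q(z) = erfc(z / sqrt(2)) / 2.
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


z, p = tail_probability(3.8, 0.5)
print(f"z = {z:.1f} standard deviations, p = {p:.1e}")
```

Using `math.erfc` directly, rather than computing 1 minus a cumulative probability, avoids losing all precision to cancellation at a value as small as 10^-14.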

2003 July 7

# Are the Logos Different in Different Runs?

... even with the same initial conditions for the genome, the final evolved binding sites/recognizers will vary?
Yes, the sites and their recognizers are different. For every point in the graph above, I captured the corresponding logos.

Schneider Lab

origin: 2002 February 22
updated: 2003 July 7

U.S. Department of Health and Human Services, National Institutes of Health, National Cancer Institute