Does the information always increase in Ev runs?
Richard Wein asked:
"I would like to know roughly what proportion of the time your
program ... produces a high-information result.
I would guess it's every time, but I'd be grateful if you would confirm that please."
MATERIALS AND METHODS
The cleanest way to test this is to use
the standard example from the
According to my notes in
the ev program documentation 'see also' section,
this is the
"Parameter file for selective phase in the paper:
I want to look further out than 1000 generations, so I set
the run to continue to 2000 generations, by which point things should have stabilized.
Then I ran the ev program 100 times, stepping the random seed
from 0.00 to 0.99.
- mk script to run ev with varying seeds.
- evp.zzz parameter file that has 'zzz'
in place of the seed. The mk program loops through and replaces zzz
with each seed from 0.00 to 0.99.
- parameter file for the plotting program
- parameter file for the histogram plotting program
- parameter file for the histogram to postscript
- list The last run with
- xyin raw data file for xyplo.
This holds the last line of the list file that ev produces
for each of the 100 runs,
with the random seed added as the first column.
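The seed substitution that the mk script performs can be sketched in Python. This is only an illustration, not the mk script itself: the function names and the one-line template are hypothetical, and the real layout of evp.zzz is not assumed.

```python
def seed_strings(count=100):
    """Return seeds '0.00', '0.01', ... as two-decimal strings."""
    return [f"{i / 100:.2f}" for i in range(count)]


def fill_seed(template_text, seed, placeholder="zzz"):
    """Replace every occurrence of the placeholder with the given seed."""
    return template_text.replace(placeholder, seed)


# Hypothetical one-line template; the real evp.zzz layout is not shown here.
template = "zzz random seed\n"
parameter_texts = [fill_seed(template, s) for s in seed_strings()]
```

Each entry of parameter_texts would then be written out as the parameter file for one run.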
This is a graph of the information content,
Rsequence, as a function of the seed used
to initiate the evolutionary run:
The graph is also available in PostScript:
Clearly these 100 runs all give values
close to the predicted 4 bits.
We would expect the results to fluctuate around the predicted
value, perhaps close to a Gaussian distribution:
The results are indeed roughly Gaussian.
Notice that they tend to be slightly lower overall than
the predicted Rf = 4 bits.
The reason for this is unknown.
The histogram is also available in PostScript:
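A quick way to summarize such a set of final Rsequence values is sketched below. The function name is hypothetical, the sample standard deviation (n - 1 denominator) is an assumption about how the spread is measured, and the three values in the usage line are placeholders rather than data from these runs.

```python
import statistics


def summarize(values):
    """Return the range, mean and sample standard deviation of the values."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
    }


# Placeholder values only, not results from the ev runs.
summary = summarize([3.0, 4.0, 5.0])
```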
Now we need to ask a critical question:
Are there duplicate runs?
That is, the random number generator is not truly
random, since it uses a chaotic numerical method
to produce the numbers. So it may be that some of the runs
are identical. How many are?
Only 7 of the 100 runs gave a final Rsequence identical to that of another run.
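One way to count such duplicates is sketched below. It assumes a run counts as a duplicate when its final Rsequence, compared as the text printed in the data file, repeats a value already seen in an earlier run; the function name and the sample values are hypothetical.

```python
from collections import Counter


def count_duplicate_runs(final_values):
    """Count runs whose final value repeats one seen in an earlier run."""
    counts = Counter(final_values)
    # For a value seen n times, the first is original and n - 1 are repeats.
    return sum(n - 1 for n in counts.values() if n > 1)


# Placeholder values only, compared as strings to avoid float rounding.
duplicates = count_duplicate_runs(["3.8", "4.1", "3.8", "3.8", "4.0"])
```

Matching final values do not prove that two whole runs were identical, so this is a lower-effort check than comparing the full list files.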
The ev program was run repeatedly to 2000 generations
starting with 100 different random seeds.
The lowest observed final information content was
2.3 bits and the highest
was 5.2 bits, with a mean of 3.8 +/- 0.5 bits.
Duplicate runs occurred 7% of the time.
These duplicates do not affect any conclusions,
but they do suggest that the random number generator
is not the best.
The program invariably gave a significant information increase.
From the observed values, and assuming a Gaussian distribution, we can determine
that the probability of a return to zero information is
1.5 x 10^-14
(zero lies 3.8/0.5 = 7.6 standard deviations below the mean).
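The arithmetic behind this figure can be checked with a short Python sketch. It assumes the final values follow a Gaussian with the observed mean of 3.8 bits and standard deviation of 0.5 bits, and takes the one-sided lower tail at zero; the function name is hypothetical.

```python
import math


def lower_tail_probability(value, mean, sd):
    """Return (z, p): the distance below the mean in SDs and the Gaussian tail."""
    z = (mean - value) / sd
    # One-sided tail of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


z, p = lower_tail_probability(0.0, 3.8, 0.5)
print(f"z = {z:.1f}, p = {p:.1e}")
```

Using erfc rather than 1 - cdf keeps the result accurate this far out in the tail, where subtracting from 1 would lose nearly all significant digits.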
2003 July 7
Are the Logos Different in Different Runs?
Pim van Meurs asked:
... even with the same
initial conditions for the genome, the final evolved binding
sites/recognizers will vary?
Yes, the sites and their recognizer are different.
For every point in the graph above, I
captured the corresponding logos
origin: 2002 February 22
updated: 2003 July 7
U.S. Department of Health and Human Services
National Institutes of Health
National Cancer Institute