========================================================================
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Message-ID: <D2zLuH.8n8@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3g37hp$e7b@newsbf02.news.aol.com>
Date: Thu, 26 Jan 1995 00:39:52 GMT
Lines: 238
In article <3g37hp$e7b@newsbf02.news.aol.com> hpyockey@aol.com (HPYockey)
writes:
 Review of book reviews of Information Theory and Molecular Biology by
 Hubert P. Yockey published by Cambridge University Press 1992

 Why is information theory and coding theory important in molecular
 biology? Gregor Mendel proved that inheritance is particulate and does not
 blend. Morgan showed that inheritance is linear. Watson and Crick
 demonstrated that inheritance is digital. Thus the genome is a message
 recorded in digital fashion in sequences of nucleotides in DNA. While
 Nature has been recording the life message digitally and using a code for
 at least 3.8 billion years, it is only recently that modern communication
 engineers have discovered the benefits of recording and transmitting
 messages in digitized form. The genetic message is isomorphic with
 messages in general and the genetic code is isomorphic with all codes.
 This means that information theory and coding theory are essential to
 molecular biology.
Nicely put!
 I asked Professor Haken for a list of his publications. ...
 My English translation: "The Shannon information concept says nothing
 about whether the message is meaningful or meaningless, valuable or
 valueless, that is, it goes in every sense of the words, or in other words
 semantics is lacking. In the field of biology this fault means a
 substantial deficiency."
We have made enormous progress understanding DNA/RNA binding sites and
molecular machines in
general without needing to use the terms "meaning" or "value".
 My 'polemic against the seminal work of Manfred Eigen' exposes his
 confusion of philosophical notions of semantics and information measured
 in bits as well as a number of other basic faults. Eigen feels free to
 introduce conjectures cooked up ad hoc to suit each problem. One can solve
 (sic) any problem with enough ad hoc conjectures. To remedy what he sees
 as an inadequacy in "classical information theory" he calls for a purely
 empirical "value parameter" that is characteristic of "valued
 information". He states that this "valued information" is reflected by
 increased "order". On the contrary, it is well known in information theory
 that 'increased order' decreases the information content of a message.
Here we differ significantly. This is going to get you into lots of trouble at
some point Hubert! You haven't followed our discussions on the point, so I'll
recap here briefly. If it doesn't make sense, just ask questions (on the
net). I take information to be the decrease of uncertainty in a physical
system. Uncertainty is a state function defined:
H = -\sum_i Pi log_2 Pi (bits per state)
so information is:
R = Hbefore - Hafter.
Subtracting Hafter corresponds to Shannon's subtraction of the uncertainty due
to noise. In human communications, such as in a good phone connection, Hafter
is often so small that we tend to neglect it. That is, Hafter = 0. But then
we have a SPECIAL CASE:
R = Hbefore (only when Hafter is zero!)
The caveat is often ignored. The consequence is confusion. Consider that H
corresponds to the entropy S under the special condition that the probabilities
Pi refer to the states of a molecular machine. That is,
S = -k \sum_i Pi ln Pi (joules per degree kelvin)
So when working with molecules,
S = k ln(2) H.
That is, uncertainty corresponds to entropy, an idea that has been around for a
while. If one calls H the entropy then one is using mixed terminology - note
the different units!
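
A minimal sketch of the definitions above, in Python. The function names and
the example probabilities are illustrative assumptions, not taken from the
post; only the formulas for H, R and S come from the text.

```python
import math

BOLTZMANN_K = 1.380649e-23  # joules per kelvin


def uncertainty_bits(probs):
    """H = -sum_i Pi log_2 Pi, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_bits(before, after):
    """R = Hbefore - Hafter."""
    return uncertainty_bits(before) - uncertainty_bits(after)


def entropy_joules_per_kelvin(probs):
    """S = k ln(2) H; meaningful only if probs are states of a physical system."""
    return BOLTZMANN_K * math.log(2) * uncertainty_bits(probs)


# Hypothetical example: four equally likely states before, and a mostly
# (but not completely) resolved distribution after.
before = [0.25, 0.25, 0.25, 0.25]
after = [0.97, 0.01, 0.01, 0.01]

print("Hbefore (bits):", uncertainty_bits(before))
print("Hafter  (bits):", uncertainty_bits(after))
print("R       (bits):", information_bits(before, after))
print("Sbefore (J/K): ", entropy_joules_per_kelvin(before))
```

With these made-up numbers Hbefore is 2 bits and R is about 1.76 bits. R
equals Hbefore only in the special case where Hafter is exactly zero.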
Here's the fast track to confusion:
If H corresponds to S and Hbefore corresponds to R, then R corresponds to S.
But wait! The second law of thermo says that closed systems tend to have
increasing disorder, S rises. So information R corresponds to disorder S.
Lower information corresponds to order. Or in your words:
 ... 'increased order' decreases the information content of a message.
It is completely confusing to say that the more information there is in a
newspaper the less order there is in it! Clearly a newspaper carries a large
amount of information (in bits) to a reader and has low disorder. Burning the
newspaper destroys the ability of the reader to lower their own uncertainty.
All this confusion can be avoided by always treating information as a measure
of a state change between two states. Then it becomes clear that 'increased
order' corresponds to increased information and that increased order is a
decrease in entropy.
If you do not follow this course, then you will be completely confused by the
case of binding sites in DNA. (I was very confused for about 6 months when I
started using information theory to work on binding sites because of the
question of how to do the calculation and the confounding problem of small
sample size. But if we work with large sample sizes that question can be set
aside.)
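
The state-change view can be sketched for a single position in a set of
aligned binding sites. This is an illustration only: the base frequencies are
made up, the four bases are assumed equally likely before binding, the sample
is assumed large enough that no small-sample correction is needed, and the
names are hypothetical.

```python
import math


def uncertainty_bits(probs):
    """H = -sum_i Pi log_2 Pi, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Before binding: any of A, C, G, T is assumed equally likely (2 bits).
H_BEFORE = uncertainty_bits([0.25, 0.25, 0.25, 0.25])


def position_information(base_freqs):
    """R = Hbefore - Hafter for one aligned position."""
    return H_BEFORE - uncertainty_bits(base_freqs)


# Hypothetical base frequencies (A, C, G, T) at three positions.
positions = {
    "unconstrained": [0.25, 0.25, 0.25, 0.25],
    "two bases allowed": [0.5, 0.5, 0.0, 0.0],
    "fully conserved": [1.0, 0.0, 0.0, 0.0],
}

for name, freqs in positions.items():
    print(name, position_information(freqs), "bits")
```

The more ordered (conserved) the position, the lower Hafter and the larger R:
0, 1 and 2 bits for the three cases.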
 Anyone who is computer literate knows that, in the context of computer
 technology, the word information does not mean knowledge. Along with many
 other authors, Eigen makes a play on words by using information in the
 sense of knowledge, meaning and specificity. For example, in
 Naturwissenschaften (1971) volume 58, 465-523 (in English) he states with
 reference to sequences in DNA that: "Such sequences cannot yet contain
 any appreciable amount of information." He means knowledge or specificity.
 Eigen uses the word 'information' in two different senses in one
 sentence: "Information theory as we understand it today is more a
 communication theory. It deals with problems of processing information
 rather than of "generating" information."
Yes, that is the problem with his work, you have stated it precisely.
 On the other hand, Elitzur, who is a professor in the Department of
 Chemical Physics at The Weizmann Institute, in Rehovot, Israel, is alarmed
 by my remark (page 313): "... it is easy to see that thermodynamics has
 nothing to do with Darwin's theory of evolution. Upon reading this
 uncompromising statement the bewildered reader may recall several
 discussions he or she has previously read concerning the apparent conflict
 between the Second Law and biological order-growth." Elitzur calls the
 Second Law of Thermodynamics an explanation of evolution.
I agree here with Elitzur. The basis is the confusion that information
is the same as entropy. They are not.
 The context in which this remark was made is in Section 12.1 where I
 discussed the assertion made by creationists today and by critics of
 Darwin in the nineteenth century, that there is a conflict between
 evolution and the second law. Creationists say that, since the second law
 cannot be challenged, Darwin's theory of evolution must be abandoned in
 favor of special creation. Even a scientist as eminent as Eddington
 believed there was such a conflict. Had Elitzur overcome his bewilderment
 and read the next paragraph on page 313 he would have found the
 explanation: "In fact, evolution requires an increase in
 Kolmogorov-Chaitin algorithmic entropy of the genome in order to generate
 the complexity necessary for higher organisms".
Shouldn't that be the Kolmogorov-Chaitin algorithmic information?
 Elitzur is dismayed by my finding (Yockey, 1974, 1977) that there is no
 relation between Shannon entropy in information theory and
 Maxwell-Boltzmann-Gibbs entropy in statistical mechanics.
And so am I!
 Let us see why
 that is so. According to modern theory of probability one cannot speak of
 a probability without first establishing a probability space and setting
 up a probability distribution of the random variables, appropriate to the
 problem. The axioms of probability theory must be satisfied in order to
 avoid a "Dutch book" and to be sure that we are not using knowledge we do
 not have (Yockey, 1992, p20-33).

 The probability space in statistical mechanics, called phase space by
 theoretical physicists, is six dimensional and is defined by the position
 and momentum vectors of the ensemble of particles. The values of these
 position and momentum vectors are random variables and the pi form
 probability vectors referring to particle i. The function for entropy in
 statistical mechanics, S, has the dimensions of the Boltzmann constant k
 and has to do with energy, not information. Shannon entropy has no thermal
 or mechanical dimensions.
No. Shannon's work was based rather solidly on thermal and mechanical
considerations. Look at his neglected but beautiful 1949 paper. He uses
Nyquist's equation (which is N = WKT, note the K!!) to apply the temperature to
the electrical circuit. He recognized that, although (as you say) large parts
of the theory are without connection to the physical world, the communications
systems that one builds are in the physical world and these must deal with
thermal noise. Indeed, if there were no thermal noise, the channel capacity
would go to infinity and there would be no problem to communicate perfectly
over any channel.
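
The dependence on thermal noise can be sketched numerically. This assumes the
standard form of Shannon's capacity for a band-limited channel with white
thermal noise, C = W log_2(1 + P/N), with N = kTW as above; the bandwidth and
signal power are made-up values and the function names are hypothetical.

```python
import math

BOLTZMANN_K = 1.380649e-23  # joules per kelvin


def thermal_noise_power(bandwidth_hz, temperature_k):
    """Nyquist thermal noise power N = k T W, in watts."""
    return BOLTZMANN_K * temperature_k * bandwidth_hz


def channel_capacity(bandwidth_hz, signal_power_w, temperature_k):
    """C = W log_2(1 + P/N), in bits per second."""
    noise = thermal_noise_power(bandwidth_hz, temperature_k)
    return bandwidth_hz * math.log2(1.0 + signal_power_w / noise)


# Hypothetical channel: 3 kHz bandwidth, 1 picowatt of received signal.
bandwidth = 3000.0
signal = 1e-12

for temperature in (300.0, 3.0, 0.03, 0.0003):
    capacity = channel_capacity(bandwidth, signal, temperature)
    print(f"T = {temperature:g} K  ->  C = {capacity:.0f} bits/s")
```

As the temperature is lowered the noise power shrinks and the capacity keeps
growing, without bound in the limit of zero thermal noise.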
 Information theory is concerned with messages expressed in sequences of
 letters selected from a finite alphabet by a Markov process. The
 probability space is defined by the letters of the alphabet under
 consideration, which are random variables and the pi form probability
 vectors accordingly.
That is only a portion of the theory.
 To illustrate this point further, one may consider the probability space
 of a dice game that consists of the numbers 2 through 12 and calculate the
 corresponding entropy (Yockey, see exercise on page 88). Clearly, the
 entropy of a dice game has nothing to do with statistical mechanics and
 thermodynamics.
No. If we always consider differences between states of systems, then the
microstates drop out. Rather than repeat it here, see the discussion in
Schneider.edmm.
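
The uncertainty of the dice game in the quoted text is easy to compute, and
the same function gives a difference between two states. A sketch, assuming
two fair six-sided dice; the "sum is at least 10" observation is a made-up
example, and this does not reproduce the argument in Schneider.edmm.

```python
import math


def uncertainty_bits(probs):
    """H = -sum_i Pi log_2 Pi, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def two_dice_sum_distribution():
    """Probability of each sum 2..12 for two fair six-sided dice."""
    counts = {}
    for a in range(1, 7):
        for b in range(1, 7):
            counts[a + b] = counts.get(a + b, 0) + 1
    return {total: n / 36 for total, n in sorted(counts.items())}


dist = two_dice_sum_distribution()
h_before = uncertainty_bits(dist.values())

# Hypothetical observation: we learn only that the sum is at least 10.
kept = {total: p for total, p in dist.items() if total >= 10}
norm = sum(kept.values())
h_after = uncertainty_bits(p / norm for p in kept.values())

print("Hbefore (bits):", round(h_before, 4))
print("Hafter  (bits):", round(h_after, 4))
print("R       (bits):", round(h_before - h_after, 4))
```

Hbefore comes out near 3.27 bits. The information gained from the
observation is the difference Hbefore - Hafter, not Hbefore itself.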
 It may have something to do with information theory since
 a sequence of letters selected from the alphabet is generated as a Markov
 process by a series of tosses of the dice. Such a sequence of letters
 forms a message in which some gamblers find meaning. For these reasons
 entropy in statistical mechanics and entropy in information theory are
 different concepts that have no relationship that enables us to make an
 equivalence of one to the other.
See above.
 Elitzur gets off a number of bloopers: "Information theory, according to
 Yockey, 'shows that it is fundamentally undecidable whether a given
 sequence has been generated by a stochastic process or by a highly
 organized process' (p82). This must be an amazing statement for anyone
 familiar with the basic concepts of information theory where information
 is defined as the very opposite of randomness."

 'Information' is, of course, not the very opposite of randomness. Elitzur
 is using the word 'information' in the semantic sense as synonym for
 knowledge or meaning. Everyone knows that a random sequence, that is, one
 chosen without intersymbol restrictions or influence, carries the most
 information in the sense used by Shannon and in computer technology. Note:
UNCERTAINTY
 For a brief explanation of randomness, complexity, order and information
 see Yockey Nature 344 p823 (1990).
Here you have made the mistake of setting Hafter to zero. So a random sequence
going into a receiver does not decrease the uncertainty of the receiver and so
no information is received. But a message does allow for the decrease. Even
the same signal can be information to one receiver and noise to another,
depending on the receiver!
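
One way to sketch the receiver dependence is to take Hbefore as the
receiver's uncertainty about what was sent, H(X), and Hafter as the
uncertainty that remains once the receiver has responded, H(X|Y). Both
"receivers" below are hypothetical joint distributions, chosen only to show
the two extremes for the same one-bit source.

```python
import math


def uncertainty_bits(probs):
    """H = -sum_i Pi log_2 Pi, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_received(joint):
    """R = Hbefore - Hafter = H(X) - H(X|Y), with joint[x][y] = P(x sent, y received)."""
    p_sent = [sum(row) for row in joint]
    p_received = [sum(col) for col in zip(*joint)]
    h_before = uncertainty_bits(p_sent)
    h_joint = uncertainty_bits(p for row in joint for p in row)
    h_after = h_joint - uncertainty_bits(p_received)  # H(X|Y) = H(X,Y) - H(Y)
    return h_before - h_after


# Same source (two equally likely symbols), two hypothetical receivers.
tuned_receiver = [[0.5, 0.0],
                  [0.0, 0.5]]      # response tracks the signal exactly
untuned_receiver = [[0.25, 0.25],
                    [0.25, 0.25]]  # response is independent of the signal

print("tuned:  ", information_received(tuned_receiver), "bits")
print("untuned:", information_received(untuned_receiver), "bits")
```

For the tuned receiver Hafter is zero and the full 1 bit is received. For the
untuned receiver Hafter equals Hbefore, so the same signal is just noise and
R is 0.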
 Give me your opinion but after you read the book!
I will write more at that point ...
Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
==================================================================
========================
Brian Harper 
Associate Professor  "It is not certain that all is uncertain,
Applied Mechanics  to the glory of skepticism"  Pascal
Ohio State University 
========================