Delila Program: diffribl

Documentation for the diffribl program is below, with links to related programs in the "see also" section.

{   version = 1.24; (* of diffribl.p 2005 Jun 3}

(* begin module describe.diffribl *)
(*
name
diffribl: calculate the difference between two ribls

synopsis
diffribl(ribla: in, riblb: in, diffriblp: in,
posdiff: out, drxyin: out, output: out)

files

ribla:  The ribl output of the Ri program for the first of 2
compared ribls

riblb:  The ribl output of the Ri program for the second of 2
compared ribls

posdiff: An output file which can be used with the xyplo program. This
file lists all the difference or distance values for each
position. The columns are as follows:

1. This coordinate is the relative position of ribl A.

2. This value is the sum of differences or distances at
that position.

This file is only useful for the non-scrolling calculations.

diffriblp:  parameters to control the program.  The file must contain the
following parameters, one per line:

parameterversion: The version number of the program.  This allows the
user to be warned if an old parameter file is used.

range of calculation (char OR char with integers):
Sets the range of matrix positions used in the calculation.  Use 'r'
to take the range from the ribl itself, or 'u' followed by the
desired from and to coordinates to specify a user range.
The user range must be smaller than the range of the ribl.

scrolling function (char OR char with integers):
riblb can be scrolled across ribla by using 'v' followed by the
range of the scroll.  To not use the scrolling function, use 'n'.

calctype: type of calculation (char):
The user can use one of several types of calculations.
In many cases, the units reported are in bits.

(e) The first, specified by "e" is a measurement of the Euclidean
distance between two positions in two matrices.  This is done
with the following equation:

Positional distance = square root( (A1 - A2)^2 + (C1 - C2)^2 +
(G1 - G2)^2 + (T1 - T2)^2 )

This positional distance is then summed for all positions giving
the total sum of positional distances.

When "e" is used with the scrolling function, it calculates only
for the overlapping part of the matrices.  This feature can be
used with both symmetric and asymmetric models.
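
The "e" calculation above can be sketched in Python as follows.  This is
only an illustration of the equation, not the program's own code: the
function names and the representation of a matrix as a list of
(A, C, G, T) rows are assumptions, and the example weights are made up.

```python
import math

def positional_distance(row1, row2):
    """Euclidean distance between two (A, C, G, T) rows at one position."""
    return math.sqrt(sum((v1 - v2) ** 2 for v1, v2 in zip(row1, row2)))

def total_distance(matrix1, matrix2):
    """Sum of positional distances over two matrices of the same size."""
    if len(matrix1) != len(matrix2):
        raise ValueError("matrices must cover the same positions")
    return sum(positional_distance(r1, r2)
               for r1, r2 in zip(matrix1, matrix2))

# Hypothetical weights in bits for two positions, ordered (A, C, G, T).
ribl_a = [(2.0, 0.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
ribl_b = [(0.0, 0.0, 2.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
print(total_distance(ribl_a, ribl_b))
```

Only the first position differs in this example, so the total equals
the positional distance at that position, sqrt(8).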

(o) The second, specified by "o", is a measurement of the Euclidean
distance between two matrices.   As opposed to the calculation
done in "e", this treats each matrix as a point in 4^(l)
dimensional space.

Since there is only one point in this case, there are no positional
differences, and so the values in posdiff are reported the
same as for the "e" option.

(d) The third, specified by "d", is a measurement of difference
between the two matrices.  This is done with the following
equation:

Positional difference = (A1 - A2) + (C1 - C2) +
(G1 - G2) + (T1 - T2)

This positional difference is then summed for all positions
giving the total sum of positional differences.

When "d" is used with the scrolling function, it calculates only
for the overlapping part of the matrices.  This feature cannot be
used with an asymmetric model.
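
A minimal Python sketch of the "d" equation follows.  The function
names and the list-of-rows matrix layout are assumptions made for the
illustration.  Note that the differences are signed, so a gain at one
base can cancel a loss at another base.

```python
def positional_difference(row1, row2):
    """Signed sum of (A1-A2) + (C1-C2) + (G1-G2) + (T1-T2)."""
    return sum(v1 - v2 for v1, v2 in zip(row1, row2))

def total_difference(matrix1, matrix2):
    """Sum of positional differences over all positions."""
    return sum(positional_difference(r1, r2)
               for r1, r2 in zip(matrix1, matrix2))

# Hypothetical one-position matrices, ordered (A, C, G, T).
cancelling = total_difference([(1.5, -0.5, 0.0, -1.0)],
                              [(0.5, 0.5, 0.0, -1.0)])
one_sided = total_difference([(2.0, 0.0, 0.0, 0.0)],
                             [(1.0, 0.0, 0.0, 0.0)])
print(cancelling, one_sided)
```

In the first case the A and C differences are +1 and -1, so the
positional difference is zero even though the rows are not equal.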

(s) The fourth, specified by "s", is a measurement of the
average response the ribla should make to passing across
sites in riblb.  This is computed as:
$\sum_l \sum_b f_b(b,l-offset)*Ri_a(b,l)$

(This is LaTeX typesetting notation, \sum is sum;
"_" means subscript.)

NOTE: the frequency f is computed from the number of bases
at the given position.
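
The double sum above can be sketched in Python as below.  This is a
hedged illustration only: the dictionary layout (position -> base ->
value), the function names, and the rule of skipping positions where
riblb has no data at l - offset are all assumptions, and the numbers
are made up.

```python
BASES = "acgt"

def frequencies(counts):
    """Convert the base counts at one position to frequencies."""
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}

def average_response(ri_a, counts_b, offset):
    """Sum over l and b of f_b(b, l - offset) times Ri_a(b, l)."""
    total = 0.0
    for l, weights in ri_a.items():
        if l - offset not in counts_b:
            continue  # assumed boundary rule: no overlap, no contribution
        f = frequencies(counts_b[l - offset])
        total += sum(f[b] * weights[b] for b in BASES)
    return total

# Hypothetical weights (bits) for ribla and base counts for riblb.
ri_a = {0: {"a": 2.0, "c": -1.0, "g": -1.0, "t": -1.0},
        1: {"a": -1.0, "c": 2.0, "g": -1.0, "t": -1.0}}
counts_b = {0: {"a": 8, "c": 0, "g": 0, "t": 0},
            1: {"a": 0, "c": 4, "g": 4, "t": 0}}
print(average_response(ri_a, counts_b, 0))
print(average_response(ri_a, counts_b, 1))
```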

(z) The fifth, specified by "z", is a measurement of the three
dimensional distance between the two matrices,
following Zhang.Zhang1991a and Zhang.Zhang1991b.

Base frequencies are computed from the ribl data file.
Then each set of frequencies, a, c, g, and t, for which,
by definition,

a + c + g + t = 1                                      (1)

can be represented in three dimensions as:

x = (a+g) - (t+c) = 2(a+g) - 1                         (2)
y = (a+c) - (t+g) = 2(a+c) - 1
z = (a+t) - (g+c) = 2(a+t) - 1

These are three independent variables defined by Zhang.
They map into a tetrahedron in three dimensions.

Zhang considers the above to be a 'reduced'
coordinate system.  The non-reduced system is:

X = [sqrt(3)/4] x                                      (3)
Y = [sqrt(3)/4] y
Z = [sqrt(3)/4] z

The positional distance is calculated as:

positional distance = sqrt (   (X2-X1)^2               (4)
+ (Y2-Y1)^2
+ (Z2-Z1)^2 )

where sqrt is the square root.

From Zhang.Zhang1991a (page 46 and 47), this simplifies
to:

positional distance = [sqrt(3)/2]                      (5)
* sqrt( (a2-a1)^2 + (g2-g1)^2 +
(c2-c1)^2 + (t2-t1)^2 )

where (a1, g1, c1, t1) and (a2, g2, c2, t2) are the base
probabilities of the two matrices at one position.

This positional distance is then summed for all positions giving
the total sum of positional distances.

Because frequencies sum to 1, there really are only three
independent degrees of freedom and therefore only three
dimensions.  So equations for three and four dimensions
give the same results.
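
The agreement between equations (4) and (5) can be checked numerically
with a short Python sketch.  The function names and the example
frequencies are assumptions made for the illustration; frequencies are
passed as (a, c, g, t) tuples that sum to 1.

```python
import math

SCALE = math.sqrt(3) / 4  # factor from equation (3)

def zhang_xyz(freqs):
    """Map frequencies (a, c, g, t) to non-reduced (X, Y, Z), eqs (2)-(3)."""
    a, c, g, t = freqs
    x = (a + g) - (t + c)
    y = (a + c) - (t + g)
    z = (a + t) - (g + c)
    return (SCALE * x, SCALE * y, SCALE * z)

def distance_3d(f1, f2):
    """Equation (4): Euclidean distance between the mapped points."""
    return math.dist(zhang_xyz(f1), zhang_xyz(f2))

def distance_4d(f1, f2):
    """Equation (5): scaled distance between the frequency vectors."""
    return (math.sqrt(3) / 2) * math.dist(f1, f2)

# Hypothetical frequencies at one position in each matrix.
f1 = (0.7, 0.1, 0.1, 0.1)
f2 = (0.25, 0.25, 0.25, 0.25)
print(distance_3d(f1, f2), distance_4d(f1, f2))
```

The two printed values agree to within floating point rounding, as
they should whenever both frequency sets sum to 1.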

(y) The sixth, specified by "y", is computed as "z" and then
it is normalized by the maximum possible distance between
points in the tetrahedron.

With frequencies, the largest distance in the Zhang
tetrahedron is sqrt(3/2), along the edge.  For L
positions, the largest possible distance is therefore
sqrt(3/2)L.  All values are divided by this maximum.

The following shows that the maximum distance in the
non-reduced coordinate system is sqrt(3/2).  Using
equations (2) and (3), for the case of all G, the point is
at

(X, Y, Z) = (sqrt(3)/4, -sqrt(3)/4, -sqrt(3)/4)

while for all C, the point is

(X, Y, Z) = (-sqrt(3)/4, sqrt(3)/4, -sqrt(3)/4)

The distance between these points is sqrt(3/2).
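
A small Python sketch of the normalization follows.  It uses the
simplified distance of equation (5); the function names and the
list-of-tuples layout, ordered (a, c, g, t), are assumptions made for
the illustration.

```python
import math

MAX_DISTANCE = math.sqrt(3 / 2)  # edge length of the tetrahedron

def zhang_distance(f1, f2):
    """Positional distance from equation (5)."""
    return (math.sqrt(3) / 2) * math.dist(f1, f2)

def normalized_total(freqs1, freqs2):
    """Total Zhang distance divided by the maximum for L positions."""
    total = sum(zhang_distance(p, q) for p, q in zip(freqs1, freqs2))
    return total / (MAX_DISTANCE * len(freqs1))

ALL_G = (0.0, 0.0, 1.0, 0.0)
ALL_C = (0.0, 1.0, 0.0, 0.0)

# All G against all C reproduces the maximum distance sqrt(3/2).
print(zhang_distance(ALL_G, ALL_C), MAX_DISTANCE)
# Two positions, one maximally different and one identical.
print(normalized_total([ALL_G, ALL_G], [ALL_C, ALL_G]))
```

With one of two positions at the maximum distance and the other
identical, the normalized value is one half.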

drxyin:  This gives the total sum of distance or difference, depending on
which calculation function is being used.  When the scrolling
function is being used, it will report the total sum value along
with the position of the scroll.  The position of the scroll is
the distance between the zero coordinates of the two matrices.
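
The scrolling report can be sketched in Python as below, using the
Euclidean ("e") distance over the overlapping positions.  This is an
assumption-laden illustration, not the program's code: the dictionary
layout (coordinate -> (A, C, G, T) row), the function name, and the
sign convention pairing position l of ribl A with position l - offset
of ribl B are all hypothetical.

```python
import math

def scroll_distances(ribl_a, ribl_b, first, last):
    """Return (offset, total distance) pairs for offsets first..last."""
    results = []
    for offset in range(first, last + 1):
        total = 0.0
        for l, row_a in ribl_a.items():
            row_b = ribl_b.get(l - offset)
            if row_b is not None:  # only the overlapping part is used
                total += math.dist(row_a, row_b)
        results.append((offset, total))
    return results

# Hypothetical two-position matrix compared against itself.
ribl = {0: (2.0, 0.0, 0.0, 0.0), 1: (0.0, 2.0, 0.0, 0.0)}
for offset, total in scroll_distances(ribl, ribl, -1, 1):
    print(offset, total)
```

At offset 0 the matrix lines up with itself and the total is zero; at
offsets -1 and +1 only one position overlaps.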

output: messages to the user

description

This program looks at the differences in two individual information
weight matrices (ribls) by finding the difference in information
at each position, for each base, and then summing the differences.
Then all of the differences at each position are summed to express
a diffribl value.

The program now also has a number of other ways of comparing
the ribls, depending on a user parameter.

examples

examples of diffriblp

1.24        version of diffribl that this parameter file is designed for.
r u -10 +10 r use the from/to coords from ribl, u means use user specified
n v -21 +21 v and coords=move riblB across riblA for the range, n=no
eodszy      e:Euclid, o:Euclid4d, d:difference, s:scan, z:Zhang, y:znorm

documentation

@article{Shultzaberger.Schneider2001,
author = "R. K. Shultzaberger
and R. E. Bucheimer
and K. E. Rudd
and T. D. Schneider",
title = "{Anatomy of \emph{Escherichia coli}
Ribosome Binding Sites}",
journal = "J. Mol. Biol.",
volume = "313",
pages = "215-228",
comment = "Shultzaberger.Schneider.flexrbs",
note = "http://www.lecb.ncifcrf.gov/\~{}toms/paper/flexrbs/",
year = "2001"}

see also

example parameter file: diffriblp

Description of use is in Shultzaberger.Schneider2001:
http://www.lecb.ncifcrf.gov/~toms/paper/flexrbs/

program that generates ribls:  ri.p
program that uses ribls to find sites: scan.p
graphics program for xyin: xyplo.p
source of program modules: lister.p

author

Ryan Shultzaberger
Thomas D. Schneider
Zehua Chen

bugs

There is a problem with comparing different sized ribls.  I (Ryan?)
need to fix this.  For now, only use this program with same sized
ribls.  The result will be wrong if done otherwise.

Comparisons in 4 dimensional space are not appropriate because the
4 probabilities are not independent.  To avoid this, one can
replace the 4 dimensional space with a 3 dimensional one according
to Zhang's methods:

@article{Zhang.Zhang1991a,
author = "C.-T. Zhang
and R. Zhang",
title = "Diagrammatic representation of the distribution of {DNA}
bases and its applications",
journal = "Int. J. Biol. Macromol.",
volume = "13",
pages = "45-49",
note = "tetrahedron method",
year = "1991"}

@article{Zhang.Zhang1991b,
author = "C.-T. Zhang
and R. Zhang",
title = "Analysis of distribution of bases in the coding sequences
by a diagrammatic technique",
journal = "Nucleic Acids Res.",
volume = "19",
pages = "6313-6317",
note = "tetrahedron method",
year = "1991"}

@article{Zhang1997,
author = "C.-T. Zhang",
title = "A Symmetrical Theory of {DNA} Sequences and Its Applications",
journal = "J. Theor. Biol.",
volume = "187",
pages = "297-306",
year = "1997"}

To do this, the Ri can be converted to probabilities according to
Ri = 2 + log2(Pi).  Then the probabilities are converted to the
Zhang XYZ space.  Distances are then measured in that XYZ space.
However it is better to use the Pi directly from the ribl file.
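
The conversion Ri = 2 + log2(Pi) and its inverse can be sketched in
Python as follows.  The function names are assumptions made for the
illustration.  A probability of zero has no finite weight under this
formula, which is one reason to prefer reading the Pi directly.

```python
import math

def ri_to_probability(ri):
    """Invert Ri = 2 + log2(Pi) to get Pi = 2^(Ri - 2)."""
    return 2.0 ** (ri - 2.0)

def probability_to_ri(p):
    """Ri = 2 + log2(Pi); p must be greater than zero."""
    return 2.0 + math.log2(p)

print(ri_to_probability(2.0))   # weight of 2 bits: probability 1.0
print(ri_to_probability(0.0))   # weight of 0 bits: probability 0.25
print(probability_to_ri(0.5))   # probability 0.5: weight of 1.0 bit
```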

technical notes

*)
(* end module describe.diffribl *)
{This manual page was created by makman 1.44}