Delila Program: palinf

# palinf program

## Pascal source code: palinf.p

### Documentation for the palinf program is below, with links to related programs in the "see also" section.

```
{   version = 2.37; (* of palinf.p 2013 Jul 25}

(* begin module describe.palinf *)
(*
name
palinf: find palindromes, based on information theory

synopsis
palinf(book: in, palinfp: in,
fout: out, palinfeatures: out, output: out)

files

book: a book from the Delila system

palinfp: parameters to control palinf, one per line

1. The minimum rsequence (in bits) of the palindrome to detect.
Alternatively, if the number is negative, it is the
desired significance of the detected peaks, given in
standard deviations.

2. (Optional) size (integer).  The largest palindrome allowed, in
base pairs across both halves of the site.  If omitted, the
entire sequence is used (which may be very expensive).
If this number is even, the next higher odd number will be used.

3. (Optional) If the first character of this line is an 'm', then
palinf will plot palindrome size (m) versus information content
(rsequence).  A sharply rising curve indicates a good palindrome.
An 'x' means to plot position (x) versus information content (rsequence).
Any other character, such as 'n', means to list
the detected palindromes.

fout: Locations of palindromes.

In the m mode, the coordinate location of significant palindromes
(i.e., ones that passed the criterion) is given, followed by a graph
that shows the structure and significance of the palindrome from
the center outward:

at position   725
1                   2                   3
m even  odd<0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8.9.0
1 -0.5 -0.5=. 1 2 3 4 5         .         .         .         .         .
2  1.0  1.0 . = 2 3 4 5         .         .         .         .         .
3  2.5  0.5 .o 1 e2  3. 4  5    .         .         .         .         .
4  4.0  0.0 o  1  2 e3. 4  5    .         .         .         .         .
5  5.5 -0.5o.   1   2 .e3   4   5         .         .         .         .
6  7.0 -1.0o.   1   2 . 3 e 4   5         .         .         .         .
7  8.5  0.5 .o   1    2    3 e  4    5    .         .         .         .
8  8.0  2.0 .   o1    2    3e   4    5    .         .         .         .
9  7.5  3.5 .    1 o  2    e    4    5    .         .         .         .
10  7.0  3.0 .    1o   2   e3    4    5    .         .         .         .
11  6.5  4.5 .     1  o. 2e    3 .   4     5         .         .         .
12  6.0  6.0 .     1   . =     3 .   4     5         .         .         .
13  7.5  7.5 .     1   . 2  =  3 .   4     5         .         .         .
14  7.0  9.0 .     1   . 2 e   o .   4     5         .         .         .
15  8.5 10.5 .      1  .   2  e  .o      4 .    5    .         .         .
16  8.0 12.0 .      1  .   2 e   .3  o   4 .    5    .         .         .
17  9.5 13.5 .      1  .   2    e.3     o4 .    5    .         .         .
18  9.0 15.0 .      1  .   2   e .3      4 o    5    .         .         .
19  8.5 16.5 .       1 .     2e  .   3     . 4o      5         .         .
20  8.0 18.0 .       1 .     e   .   3     . 4   o   5         .         .
21  7.5 19.5 .       1 .    e2   .   3     . 4      o5         .         .
22  7.0 21.0 .       1 .   e 2   .   3     . 4       5 o       .         .
23  6.5 22.5 .       1 .  e  2   .   3     . 4       5    o    .         .
24  8.0 22.0 .       1 .     e   .   3     . 4       5   o     .         .
25  7.5 21.5 .        1.    e  2 .      3  .     4   .  o 5    .         .
26  7.0 21.0 .        1.   e   2 .      3  .     4   . o  5    .         .
27  8.5 20.5 .        1.      e2 .      3  .     4   .o   5    .         .
28  8.0 20.0 .        1.     e 2 .      3  .     4   o    5    .         .
29  7.5 19.5 .        1.    e  2 .      3  .     4  o.    5    .         .
30  7.0 21.0 .        1.   e   2 .      3  .     4   . o  5    .         .
31  6.5 20.5 .         1  e      2         3         4o        5         .
32  6.0 22.0 .         1 e       2         3         4   o     5         .
33  7.5 23.5 .         1    e    2         3         4      o  5         .
34  7.0 25.0 .         1   e     2         3         4         o         .
35  6.5 24.5 .         1  e      2         3         4        o5         .
at   725           25.0 bits

The horizontal axis is in bits; the vertical axis is in bases.  The
digits 1 to 5 mark standard deviations.  With this chart one can determine
the significance of each palindrome.  Clearly there is a strong (nearly
5 standard deviations) odd palindrome at coordinate 725.

In the x mode, the sequence is given:

1                   2
x bp even  odd<0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8.9.0.1.2.3.4.5.6.7.8
2 a -0.5  1.5e. 1o2 3 4 5         .         .         .         .
3 c -0.5 -0.5=.  1  2  3. 4  5    .         .         .         .
4 a -0.5  3.0e.  1  o  3. 4  5    .         .         .         .
5 g -0.5  1.5e.  o1   2 . 3   4   5         .         .         .
6 t -0.5  1.5e.  o1   2 . 3   4   5         .         .         .
7 a  1.5  1.5 .  = 1    2    3    4    5    .         .         .
8 a -0.5  3.5e.    1 o  2    3    4    5    .         .         .
9 g  0.5 -0.5o.e   1    2    3    4    5    .         .         .
10 a -0.5  1.5e.  o 1    2    3    4    5    .         .         .
11 c -0.5  1.5e.  o  1   . 2     3 .   4     5         .         .
12 g  3.0  1.5 .  o  e   . 2     3 .   4     5         .         .
13 g  6.5 -0.5o.     1   . 2e    3 .   4     5         .         .

Here the horizontal axis is again in bits, but the vertical
axis is the location on the sequence (which is why the bp column
shows the bases).

In the n mode, only a summary of the palindrome locations is provided:

even      odd  palindromes
at   537           21.0 bits
at   547 24.5 bits
at   707           22.5 bits
at   725           25.0 bits
at  1101 21.0 bits
at  1180 24.0 bits
at  1279           21.0 bits
at  1322 24.5 bits

palinfeatures: The locations of palindromes in the features format that
the lister program uses.  Pass these to lister and the palindromes will
be drawn on your sequence listing.

The features are listed in this format:

define "odd60.K00042" "-" "(((|)))" "(((|)))" -3 -2 -1 0 1 2 3
@ K00042 60 +1 "odd60.K00042" " 4.5 bits"

define "even547.K00042" "-" "((()))" "((()))" -3 -2 -1 0 1 2
@ K00042 547 +1 "even547.K00042" " 4.5 bits"

output: messages to the user.

description

Each piece of the book is searched for imperfect palindromes with
significance determined by the first parameter in palinfp.  There are
two kinds of palindrome: even and odd, referring to the size of the
palindrome in bases.  An odd palindrome will have a central base, while
an even one will not.  Method of use:  search without the 'm'
option to pick out sites of interest.  Then use 'm' under 'stringent
conditions' or on a smaller fragment to see the structure of the
palindrome.  The final r value will be the maximum of r values for all
smaller palindromes.  Note: equiprobable compositions are assumed for
e(hnb).

Theory:
When there is a large number of sequences, the information needed to
choose one of the 4 bases is log2(4) = 2 bits.  In contrast, for only
two sequences (n = 2), the information measure is severely biased.
This reflects the statistical likelihood of finding matches.  One
quarter of the time two randomly chosen bases will match.  In
information theory terms, this means that a match counts as only 0.75
bits (see reference Schneider1986 appendix figure A2).  So, for
example, the restriction site for EcoRI, GAATTC, is 6 x 2 = 12 bits when
taken from many examples of the site (as when EcoRI binds).  However,
as a single sequence, it counts as only 6 x 0.75 = 4.5 bits.  This
effect prevents one from identifying spurious palindromes, but it is,
unfortunately, not intuitive.

To avoid duplicate definitions as much as possible, the names now
include the piece name in which the palindrome is found.

examples

The parameters

21 positive: bits minimum to find; negative: st.dev out to find
71 largest size palindrome to find (measured from center to edge in bases)
m  n=indicate detected palindromes; x=show sequence; m=show palindromes
palinfp: parameters for palinf

will locate the E. coli lac operator uniquely in the 401 bases
surrounding the start of the lacZ transcript.

The inverted repeats of pSC101 in GenBank K00042 are located with the
same [13/35/m] parameters at coordinates 707 and 725.  (Other things
are found as well; they have been ignored in the literature because
they don't match the inverted repeats.)

The parameters [4.5/6/n] will locate 6-base palindromes.

documentation

Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
The information content of binding sites on nucleotide sequences.
J. Mol. Biol. 188: 415-431.

Example parameter file: palinfp

Program to display the palindrome features: lister.p

author

Thomas D. Schneider and Karen A. Lewis

bugs

If parameter 2 is very large, spurious sites will be found.

technical notes

Limiting the size of the palindrome will increase the search speed.

*)
(* end module describe.palinf *)
{This manual page was created by makman 1.44}
```
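The palinfp rules in the files section can be sketched as a small parser. This is a minimal illustration in Python, not palinf's Pascal code. The function name `parse_palinfp`, the returned dictionary keys, and the choice to treat a missing third line like 'n' are assumptions made for the example.

```python
def parse_palinfp(text):
    """Interpret up to three palinfp lines, following the manual page."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("palinfp needs at least the threshold line")

    # Line 1: positive = minimum bits, negative = significance in standard deviations.
    threshold = float(lines[0].split()[0])
    params = {
        "threshold": threshold,
        "unit": "standard deviations" if threshold < 0 else "bits",
    }

    # Line 2 (optional): largest palindrome size; even values become the next odd number.
    size = None
    if len(lines) > 1:
        size = int(lines[1].split()[0])
        if size % 2 == 0:
            size += 1
    params["size"] = size  # None stands for "use the entire sequence"

    # Line 3 (optional): the first character selects the output mode.
    # Assumption: a missing third line is treated like 'n' (list mode).
    first = lines[2].lstrip()[0] if len(lines) > 2 else "n"
    params["mode"] = first if first in ("m", "x") else "list"
    return params
```

Applied to the example parameter file in the examples section (21 / 71 / m), this returns a threshold of 21.0 bits, a size of 71, and mode 'm'. Text after the first token on each line is ignored, which is how the example file carries its comments.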
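The arithmetic in the Theory paragraph can be checked with a short Python sketch. It enumerates all 16 equiprobable two-base columns to obtain the expected uncertainty for n = 2, then scores a sequence against its own reverse complement. The function names are hypothetical. Scoring a mismatch as 0.75 - 1 = -0.25 bits per base is an assumption: it is not stated on this page, although it agrees with the +1.5 and -0.5 steps visible in the m mode chart. This is not palinf's actual search algorithm.

```python
from itertools import product
from math import log2

BASES = "acgt"
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def uncertainty(column):
    """Shannon uncertainty (bits) of the base frequencies in one column."""
    n = len(column)
    return -sum((column.count(b) / n) * log2(column.count(b) / n)
                for b in set(column))

def expected_uncertainty(n=2):
    """Average uncertainty over all equiprobable n-base columns (exact enumeration)."""
    columns = list(product(BASES, repeat=n))
    return sum(uncertainty(c) for c in columns) / len(columns)

def even_palindrome_bits(seq):
    """Score seq against its reverse complement, one two-base column per position."""
    seq = seq.lower()
    revcomp = "".join(COMPLEMENT[b] for b in reversed(seq))
    e_hnb = expected_uncertainty(2)
    # A matching column has zero uncertainty; a mismatching one has 1 bit.
    return sum(e_hnb - uncertainty((x, y)) for x, y in zip(seq, revcomp))

print(expected_uncertainty(2))         # 0.75
print(even_palindrome_bits("GAATTC"))  # 4.5
```

The first value reproduces the 0.75 bits per matching base, and the second reproduces the 6 x 0.75 = 4.5 bits quoted for EcoRI as a single sequence.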
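The two sample palinfeatures entries can be reproduced with a small formatter, which may help when checking palinf output by eye. This Python sketch only mirrors the two examples on this page. The function name `lister_feature` and its arguments are hypothetical, the meaning of each field is defined by lister rather than here, and extending the pattern to other half-widths is an assumption.

```python
def lister_feature(kind, coordinate, piece, half, bits):
    """Build the 'define' and '@' lines for one palindrome feature."""
    if kind == "odd":
        # Odd palindromes have a central base, marked '|'.
        marks = "(" * half + "|" + ")" * half
        offsets = range(-half, half + 1)
    elif kind == "even":
        # Even palindromes have no central base.
        marks = "(" * half + ")" * half
        offsets = range(-half, half)
    else:
        raise ValueError("kind must be 'odd' or 'even'")

    # The name includes the piece name, as the manual page describes.
    name = f"{kind}{coordinate}.{piece}"
    define = (f'define "{name}" "-" "{marks}" "{marks}" '
              + " ".join(str(o) for o in offsets))
    locate = f'@ {piece} {coordinate} +1 "{name}" "{bits:4.1f} bits"'
    return define, locate

for line in lister_feature("even", 547, "K00042", 3, 4.5):
    print(line)
```

Calling it with ("odd", 60, "K00042", 3, 4.5) yields the odd example in the same way.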