Delila Program: scan

# scan program

This program is part of the Individual Information theory software package. The patent is now expired and the program is available. By downloading this code you agree to the Source Code Use License (PDF). Pascal source code: scan.p.

Documentation for the scan program is below, with links to related programs in the "see also" section.

```
{   version = 3.66; (* of scan.p 2017 Aug 07}

(* begin module describe.scan *)
(*
name
scan: scan a book with an Ribl weight matrix and generate a vector

synopsis
scan(book: in, ribl: in, scanp: in,
data: out, scanfeatures: out, scaninst: out, output: out)

files

book: a book from the delila system

ribl: a weight matrix from the sites or ri programs.
Lines that start with * are notes.  The next line contains the matrix
FROM-TO coordinates; this is followed by the matrix in the order A, C, G,
T from FROM to TO.

scanp: parameters to control the program.

parameterversion: the version number of the program.  This allows the
user to be warned if an old parameter file is used.

seqs: One integer on the first line is the number of sequences to scan
to produce the vector.  0 = none, positive = that number; negative =
all.

Ri range: Two real numbers on the second line give the range of
information content to report in the data file.

Z score range: Two real numbers on the third line give the range of the
Z score to report in the data file.  A negative sign will be
converted to a positive sign so that this parameter limits the range
of acceptable sites to an interval on the real line.  Note: normally
one would want the lower number to be zero.

Probability range: Two real numbers on the fourth line give the range
of probability to report in the data file.  The probability of a
site is determined from the mean and standard deviation of the Ri
distribution.  Note: normally one would want the lower number to be
zero.

fromwanted, towanted range: Two integers that define the FROM-TO range
of the ribl matrix to use for computations.  This is independent of
the range displayed in the walker.

ways:  One integer.  2 means scan both the sequence and its
complement.  1 means scan only the sequence.  0 means let the
program decide: the ri program determines the symmetry of the
matrix.  If the matrix is symmetrical, only one strand is scanned;
if it is asymmetrical, both strands are scanned.

sitedefinition:  If the first non-blank character on the line is 'd',
then the rest of the line contains a definition of how to write out
the sites.  If no site is defined, the scanfeatures file will not be
written to.  See program lister.p for details.  The basic format for
an ASCII definition looks like this:

define "Fis" "-" "[0]" "[0]" -7  0 +7

For a walker it looks like:

define "Fis" " w" "  " "  " -7 +7

NOTE: the range for walker display (given in this site definition)
is independent of the range of the weight matrix used for
computation (given in the fromwanted and towanted parameters).

print definitions:  Any number of lines that define how to print the
"other" feature string in each feature definition.  The data that may
be printed are the same as those in the data file.  They are:

#           width
length      width
name        width
coordinate  width
orientation width
Ri          width decimal
Z           width decimal
probability width decimal
string      "quote string"
.           end of print definitions

If the first character on a line is '#', the line defines the
width for the number of the DNA piece in the book.

If the first character on a line is 'l' or 'L', the line defines the
width for the length of the DNA piece in the book.

If the first character on a line is 'n' or 'N', the line defines the
width for the name of the DNA piece in the book.

If the first character on a line is 'f' or 'F', the line defines the
width for the fullname of the DNA piece in the book.

If the first character on a line is 'c' or 'C', the line defines the
width for the coordinate of the zero base of the site.

If the first character on a line is 'o' or 'O', the line defines the
width for the orientation of the site.  If the width is 1, the
orientation is given as + or -; if the width is larger, the
orientation is given as -1 or +1.

If the first character on a line is 'r' or 'R', the line defines the
width and decimal fields for the individual information in bits.  The
word "bits" is attached to the end of the string.

If the first character on a line is 'p' or 'P', the line defines the
width and decimal fields for the probability of the site.

If the first character on a line is 'z' or 'Z', the line defines the
width and decimal fields for the Z score of the site.

If the first character is 's' or 'S' then the line defines a string to
insert.

The end of the file or a period "." ends the print definitions.

The lines may be put in any order, and this order defines the order in
which the items are printed to the "other" string.  If the first character
of a line is not recognized (for example, because a blank precedes it),
the corresponding data will not be printed.  This gives the user full
control of the "other" string contents.

The only kind of definition that may be repeated is the "string".
This allows the user to put whatever they desire between the data
items.

file output definitions:  The first three characters on the line define
which files will be output.  Uppercase characters turn the output on;
lowercase characters turn it off.  The files are data, (scan)features,
and (scan)inst, so the characters are d, f and i, respectively.  Thus
DfI turns on the data and scaninst files and leaves the scanfeatures
file off.  (Unidentified characters default to upper case.)

normalizeRi:  The first character defines how to normalize
the reported Ri values.  The Ri value at coordinate zero
is called Ri0.

n: normal: scan and report Ri

s: subtract: compute Ri(l) - Ri(0) at each position l

d: divide: compute Ri(l) / Ri(0) at each position l

The s and d modes are usually used in conjunction with
renumbering by Delila (the 'default coordinate zero' command).

instfrom, instto: range of Delila instructions produced in scaninst
if that file is created.

data: The results.  Comments are lines that begin with '*'.  The columns are
defined in comments in the file.  The matrix is searched over the
sequence and, depending on the ways parameter, its complement.  Ri is
reported along with the Z score and probability, which are based on the
mean and standard deviation of the Ri distribution.

scanfeatures: The results in the "features" format for input to the
lister program.  This consists of comment lines (beginning with "*"),
definition lines (as shown above), and features of the form:

@ K01789 229.0 -1 "dnaA" "+12.2 bits " 12.200338    -0.473212     0.318031

See program lister.p for details.

scaninst:  The results are given in the form of delila instructions:
name "dnaA"; piece K01789; get from 229 -100 to 229 +100 direction -;

output: messages to the user

description
The Ri(b,l) weight matrix is scanned across the sequences in the book to
produce a vector.

examples

Example scanp files:

3.00    version of scan that this parameter file is designed for.
-1      number of seqs to scan 0 = none, positive = that number; negative = all
0       information content at or above which to report in the data file.
100     Z score below which to report in the data file
0       probability at or above which to report in the data file.
-10 +10 desired region of the ribl weight matrix to use
0       0: program figures it out; 1: one way scan; 2: two way scan.
define "Fis" "-" "[0]" "[0]" -7  0 +7
string "data at:" string: A string listed at the feature
coordinate 5
string " Ri = "   string: A string listed at the feature
Ri 5 1  Riwidth Ridecimal: character places for reporting bits to scanfeatures
string " Z = "    string: A string listed at the feature
Z 4 1   z score
string " p = "    string: A string listed at the feature
probability 5 2
.       end of print definitions
DFI     dfi: data, features, inst: files output
n       normalizeRi: n: normal, s: Ri(l)-Ri(0), d: Ri(l)/Ri(0)
-50 +50 instfrom, instto: range to make the scaninst file (if made)
scanp: parameters to control the program.

3.00    version of scan that this parameter file is designed for.
-1      number of seqs to scan 0 = none, positive = that number; negative = all
0   100 information content at or above which to report in the data file.
0   100 Z score below which to report in the data file
0   1   probability at or above which to report in the data file.
-10 +10 desired region of the ribl weight matrix to use
1       0: program figures it out; 1: one way scan; 2: two way scan.
define "Fis" " w" "  " "  " -10 10
string "@"
coordinate 4
string "|Ri="
Ri 4 1  Riwidth Ridecimal: character places for reporting bits to scanfeatures
string " bits"
string "|Z="
Z 7 4   z score
string "|p="
probability 5 3
.  end of print definitions
dFi     dfi: data, features, inst: files output
n       normalizeRi: n: normal, s: Ri(l)-Ri(0), d: Ri(l)/Ri(0)
-50 +50 range to make the scaninst file (if made)

documentation

@article{Schneider.Ri,
author = "T. D. Schneider",
title = "Information Content of Individual Genetic Sequences",
journal = "J. Theor. Biol.",
volume = "189",
number = "4",
pages = "427-441",
note = "http://www-lecb.ncifcrf.gov/\$\sim\$toms/paper/ri/",
comment = "indiv.tex",
comment = "Submitted, April 1997",
year = "1997"}

@article{Schneider.walker,
author = "T. D. Schneider",
title = "Sequence Walkers:
a graphical method to display how binding proteins
interact with {DNA} or {RNA} sequences",
journal = "Nucl. Acids Res.",
volume = "25",
comment = "walker.tex, November 1, issue 21",
note = "http://www-lecb.ncifcrf.gov/\$\sim\$toms/paper/walker/,
erratum: NAR 26(4): 1135, 1998",
pages = "4408-4415",
year = "1997"}

see also
sites.p, ri.p, genhis.p, lister.p, dnaplot.p

author
Thomas Dana Schneider

bugs
* The quote strings in the parameter file are not recorded and so are not
reproduced in the data file comments.
* Blank characters are placed around the quote strings.

technical notes
The mean and standard deviation of the Ri distribution are stored just
after the Ri(b,l) table in the ribl file.  They are produced automatically
by the ri program.

For backward compatibility, scanp files of version 2.90 or earlier
are interpreted using the old definitions of the bounds for Ri, Z and p:

Ri cutoff: One real number on the second line is the information content
at or above which to report in the data file.

Z score cutoff: One real number on the third line is the Z score at or
below which to report in the data file.  A negative sign will be converted
to a positive sign so that this parameter limits the range of acceptable
sites to an interval on the real line.

Probability cutoff: One real number on the fourth line is the lowest
probability to report in the data file.  The probability of a
site is determined from the mean and standard deviation of the Ri
distribution.

It is not advisable to rely on this feature, as it will go away at some
point.

*)
(* end module describe.scan *)
{This manual page was created by makman 1.45}

```
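The description says only that the Ri(b,l) weight matrix is scanned across the sequences in the book. Here is a minimal sketch of that step, under several assumptions. The matrix is held as a Python dict that maps each position l from FROM to TO to its A, C, G, T weights in bits. A window's Ri is the sum of the weights of the bases it covers. The complement is scanned by reverse-complementing the sequence, which is the ways = 2 case. The function names `ri_at` and `scan`, the dict layout, and the choice to score only windows that fit entirely inside the sequence are all assumptions of this sketch, not details taken from scan.p.

```python
# Hypothetical sketch: slide an Ri(b,l) matrix along a DNA sequence.
# Assumption: `matrix` maps position l (FROM..TO) -> {base: weight in bits},
# and only windows lying entirely inside the sequence are scored.

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def ri_at(seq, matrix, zero):
    """Ri of the window whose matrix position 0 sits on seq[zero]."""
    return sum(weights[seq[zero + l]] for l, weights in matrix.items())


def scan(seq, matrix, ways=2):
    """Return (index, orientation, Ri) for every fully covered window.

    ways=1 scans the given strand only (+1); ways=2 also scans the
    complementary strand (-1), read 5' to 3' on that strand.
    """
    seq = seq.lower()
    frm, to = min(matrix), max(matrix)
    n = len(seq)
    # position 0 may sit only where zero+frm >= 0 and zero+to <= n-1
    zeros = range(-frm, n - to)
    hits = [(z, +1, ri_at(seq, matrix, z)) for z in zeros]
    if ways == 2:
        rc = "".join(COMPLEMENT[b] for b in reversed(seq))
        # index z on the reverse complement is index n-1-z on the given strand
        hits += [(n - 1 - z, -1, ri_at(rc, matrix, z)) for z in zeros]
    return hits
```

With a two-position matrix, `scan("AC", m, ways=2)` returns one hit per strand. Each hit is tagged with the index of its zero base on the given strand and with an orientation of +1 or -1.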
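The manual states only that the Z score and probability of a site are determined from the mean and standard deviation of the Ri distribution, which the ri program stores after the Ri(b,l) table. The sketch below computes Z as (Ri - mean) / sd. It assumes the probability is the one-sided Gaussian tail beyond |Z|, 1 - Phi(|Z|). For the Z of -0.473212 in the scanfeatures example line, this gives about 0.318031, the value printed there. Even so, the Gaussian-tail formula and the helper names `z_score` and `probability` are assumptions, not confirmed details of scan.p.

```python
import math


def z_score(ri, mean, sd):
    """Z score of an Ri value relative to the Ri distribution of the sites."""
    return (ri - mean) / sd


def probability(z):
    """Assumed formula: one-sided Gaussian tail beyond |z|, i.e. 1 - Phi(|z|)."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
```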
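The Ri, Z score and probability ranges in scanp decide which sites are reported in the data file. The sketch below shows one reading of the manual. All three tests must pass, and the bounds are inclusive. Following the negative-sign rule for Z, both bounds are converted to absolute values and compared against |Z|. The function names and the combined, inclusive test are assumptions made for illustration.

```python
def within(x, lo, hi):
    """Inclusive interval test (inclusive bounds are an assumption)."""
    return lo <= x <= hi


def should_report(ri, z, p, ri_range, z_range, p_range):
    """Return True if a site passes the Ri, Z and probability ranges."""
    # negative signs on the Z bounds are dropped, then compared with |z|
    zlo, zhi = sorted(abs(b) for b in z_range)
    return within(ri, *ri_range) and within(abs(z), zlo, zhi) and within(p, *p_range)
```

For example, a Z range written as `-2 0` becomes the interval 0 to 2 on |Z|, which matches the note that the lower number would normally be zero.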
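The 'o' print definition formats the orientation as + or - when its width is 1, and as -1 or +1 for larger widths. A small sketch of that rule follows. It assumes wider fields are right-justified with leading blanks, as Pascal's `write(x:width)` pads numbers. The function name `format_orientation` is hypothetical.

```python
def format_orientation(orientation, width):
    """Format a site orientation (+1 or -1) for the "other" string.

    Width 1 gives '+' or '-'; wider fields give '+1' or '-1', right-justified
    with blanks (the padding style is an assumption).
    """
    if width == 1:
        return "+" if orientation > 0 else "-"
    return ("+1" if orientation > 0 else "-1").rjust(width)
```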
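The file output definition line uses three positional characters for the data, scanfeatures and scaninst files. The sketch below reads it under one interpretation: only the matching lowercase letter turns a file off. An uppercase letter, any other character, or a missing character leaves the file on, following "unidentified characters default to upper case". The function name `output_flags` and the dictionary keys are hypothetical.

```python
# positional letters for the three output files, in the order the manual gives
FILES = (("d", "data"), ("f", "scanfeatures"), ("i", "scaninst"))


def output_flags(line):
    """Map each output file name to True (write it) or False (skip it)."""
    flags = {}
    for pos, (letter, name) in enumerate(FILES):
        ch = line[pos] if pos < len(line) else ""
        # only the matching lowercase letter turns the file off
        flags[name] = ch != letter
    return flags
```

`output_flags("DfI")` turns on data and scaninst and turns off scanfeatures, as in the manual's example. Any comment after the first three characters is ignored.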
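The normalizeRi modes transform each reported Ri relative to Ri0, the Ri at coordinate zero. The sketch below assumes that the Ri values for one piece are held in a dict keyed by Delila coordinate and that coordinate zero is present whenever the s or d mode is used. The function name `normalize_ri` is hypothetical, and the manual does not say how scan.p handles the case Ri0 = 0.

```python
def normalize_ri(ri_by_coord, mode):
    """Apply normalizeRi mode 'n', 's' or 'd' to a {coordinate: Ri} dict."""
    key = mode[0]  # only the first character of the line matters
    if key == "n":
        return dict(ri_by_coord)
    ri0 = ri_by_coord[0]  # Ri at coordinate zero
    if key == "s":
        return {c: ri - ri0 for c, ri in ri_by_coord.items()}
    if key == "d":
        # raises ZeroDivisionError if Ri0 is 0
        return {c: ri / ri0 for c, ri in ri_by_coord.items()}
    raise ValueError(f"unknown normalizeRi mode: {mode!r}")
```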
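A feature line in scanfeatures, such as the `@ K01789 ...` example above, can be split with a shell-style tokenizer so that quoted strings keep their internal blanks. The sketch below assumes the field order shown in that example: piece, coordinate, orientation, name, "other" string, and then three numbers. It reads those three numbers as Ri, Z and probability because they match the "+12.2 bits" text. The field names, and the function name `parse_feature`, are hypothetical. The authoritative format is described with program lister.p.

```python
import shlex


def parse_feature(line):
    """Split a scanfeatures '@' line into named fields (names are illustrative)."""
    fields = shlex.split(line)  # double-quoted strings keep their blanks
    if not fields or fields[0] != "@":
        raise ValueError("not a feature line")
    piece, coord, orient, name, other = fields[1:6]
    ri, z, p = (float(x) for x in fields[6:9])
    return {"piece": piece, "coordinate": float(coord), "orientation": int(orient),
            "name": name, "other": other, "Ri": ri, "Z": z, "p": p}
```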
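The scaninst file holds one Delila instruction per reported site, spanning instfrom to instto around the site's coordinate. The sketch below reproduces the format of the example instruction in the manual, which corresponds to instfrom = -100 and instto = +100. It assumes offsets are always written with an explicit sign and that the from/to order is not swapped for the minus direction; the example line supports both, but the manual does not state them. The function name `delila_instruction` is hypothetical.

```python
def delila_instruction(name, piece, coordinate, inst_from, inst_to, orientation):
    """Build one scaninst line: get inst_from..inst_to around a site."""
    direction = "+" if orientation > 0 else "-"
    # offsets are written with an explicit sign, e.g. -100 and +100
    return (f'name "{name}"; piece {piece}; '
            f"get from {coordinate} {inst_from:+d} to {coordinate} {inst_to:+d} "
            f"direction {direction};")
```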