- This program is part of the Individual Information theory software package. To obtain it, please .
- Copyright Statement for Delila Programs


{   version = 1.32; (* of biscan.p 2007 Mar 21}

(* begin module describe.biscan *)
(*
name
   biscan: multiple part scanning program

synopsis
   program biscan(ribla: in, riblb: in, scanpa: in, scanpb: in,
                  histog: in, book: in, biscanp: in,
                  scanfeatures: out, data: out, scaninst: out, output: out);

files

   book:  a book from the delila system

   ribla and riblb:  a weight matrix from sites or ri programs.  Lines
      that start with * are notes.  The next line contains the matrix
      FROM-TO coordinates; this is followed by the matrix in the order
      A, C, G, T from FROM to TO.  Ribla is the ribl for model A and
      riblb is the ribl for model B.

   scanpa and scanpb:  parameters to control the program.

      parameterversion: the version number of the program.  This allows
         the user to be warned if an old parameter file is used.

      seqs: One integer on the first line is the number of sequences to
         scan to produce the vector.
         0 = none, positive = that number; negative = all.

      Ri range: Two real numbers on the second line give the range of
         information content to report in the data file.

      Z score range: Two real numbers on the third line give the range
         of the Z score to report in the data file.  A negative sign
         will be converted to a positive sign so that this parameter
         limits the range of acceptable sites to an interval on the
         real line.  Note: normally one would want the lower number to
         be zero.

      Probability range: Two real numbers on the fourth line give the
         range of probability to report in the data file.  The
         probability of a site is determined from the mean and standard
         deviation of the Ri distribution.  Note: normally one would
         want the lower number to be zero.

      fromwanted towanted range: two integers that define the FROM-TO
         range of the ribl matrix to use for computations.  This is
         independent of the range displayed in the walker.

      ways: One integer.
         2 means scan both the sequence and its complement.
         1 means simply scan the sequence.
         0 means to let the program figure it out.
         The Ri program determines the symmetry of the matrix.  If it
         is symmetrical, it will only scan one way.  If it is
         asymmetrical, both scans are done.

      sitedefinition: If the first non-blank character on the line is
         'd', then the rest of the line contains a definition of how to
         write out the sites.  If no site is defined, the scanfeatures
         file will not be written to.  See program lister.p for
         details.  The basic format for an ASCII definition looks like
         this:

            define "Fis" "-" "[0]" "[0]" -7 0 +7

         For a walker it looks like:

            define "Fis" " w" " " " " -7 +7

         NOTE: the range for walker display (given in this site
         definition) is independent of the range of the weight matrix
         used for computation (given in the fromwanted and towanted
         parameters).

      print definitions: Any number of lines that define how to print
         the "other" feature string in each feature definition.  The
         data that may be printed are the same as those in the data
         file.  They are:

            # width
            length width
            name width
            coordinate width
            orientation width
            Ri width decimal
            Z width decimal
            probability width decimal
            string "quote string"
            . end of print definitions

         If the first character on a line is '#', the line defines the
         width for the coordinate of the number of the DNA piece from
         the book.

         If the first character on a line is 'l' or 'L', the line
         defines the width for the length of the DNA piece in the book.

         If the first character on a line is 'n' or 'N', the line
         defines the width for the name of the DNA piece in the book.

         If the first character on a line is 'c' or 'C', the line
         defines the width for the coordinate of the zero base of the
         site.

         If the first character on a line is 'o' or 'O', the line
         defines the width for the orientation of the site.  If the
         width is 1, the orientation is given as + or -; if the width
         is larger, the orientation is given as -1 or +1.

         If the first character on a line is 'r' or 'R', the line
         defines the width and decimal fields for the individual
         information in bits.
         The word "bits" is attached to the end of the string.

         If the first character on a line is 'p' or 'P', the line
         defines the width and decimal fields for the probability of
         the site.

         If the first character on a line is 'z' or 'Z', the line
         defines the width and decimal fields for the Z score of the
         site.

         If the first character is 's' or 'S', then the line defines a
         string to insert.

         The end of the file or a period "." ends the print
         definitions.

         The lines may be put in any order, and this defines the order
         in which they will be printed to the "other" string.  If the
         first character is not found (as, for example, by having a
         blank in front of it), the corresponding data will not be
         printed.  This gives the user full control of the "other"
         string contents.  The only kind of definition that may be
         repeated is the "string".  This allows the user to put
         whatever they desire between the data items.

      file output definitions: The first three characters on the line
         define which files will be output.  Capital characters turn on
         the output.  Small characters turn it off.  The files are
         data, (scan)features, and (scan)inst, so the characters are d,
         f and i, respectively.  Thus DfI turns on the data and
         scaninst files and leaves the scanfeatures off.
         (Unidentified characters default to upper case.)

      normalizeRi: The first character defines how to normalize the
         reported Ri values.  The Ri value at coordinate zero is called
         Ri0.

            n: normal: scan and report Ri
            s: subtract: compute Ri(l) - Ri(0) at each position l
            d: divide: compute Ri(l) / Ri(0) at each position l

         The s and d modes are usually to be used in conjunction with
         renumbering by Delila (the 'default coordinate zero' command).

      instfrom, instto: range of Delila instructions produced in
         scaninst if that file is created.

   histog:  output of the genhis program.  It is the distribution used
      to compute the uncertainty due to various distances between the
      two models.  Histog must be in increments of 1 over the range.
      To create the data file for genhis, start with a Delila
      instruction file.  Then use the malign program to get improved
      alignments.  Use malin to extract one of the alignments as a
      second Delila instruction file.  Then use the diffinst program to
      make the data file.  Finally, run genhis to get the histog.

   biscanp:  parameters to control the program.  The file must contain
      the following parameters, one per line:

      parameter version number: needs to be compatible with the current
         program version

      lowhistog, highhistog (two integers): range of the histog
         distribution to use, in the order lowest value to highest

      ricutoff: total Ri cutoff.  This value is compared to
         the individual info of site 1
         + the individual info of site 2
         + the sample correction
         - the -log2 of the distance probability.

      singleA (character): If singleA is 'f' then filter out any
         feature pairs that contain the same A coordinate that have
         lower information than another pair.
         Note: singleA and singleB cannot be both 'f' at the same time.

   data:  The results.  Comments are lines that begin with '*'.  The
      columns are defined in comments in the file.  The matrix is
      searched over both the sequence and its complement.  Ri is
      reported, as is the Z and probability based on the mean and
      st.dev.

   scanfeatures:  The results in the "features" format for input to the
      lister program.  This consists of comment lines (beginning with
      "*"), definition lines (as shown above), and features of the
      form:

         @ K01789 229.0 -1 "dnaA" "+12.2 bits " 12.200338 -0.473212 0.318031

      See program lister.p for details.

   scaninst:  The results are given in the form of delila instructions:

         name "dnaA";
         piece K01789;
         get from 229 -100 to 229 +100 direction -;

   output:  messages to the user

description

   This program is used to compare the binding patterns of 2 different
   binding site models.  It selects sites that are within a certain
   range of each other, adds their individual information together, and
   subtracts a distance based distribution probability value to
   determine the new total information.

   The theory is that the distances have a distribution.  So one can
   assign probabilities to each distance.  One can compute the
   uncertainty of such a distribution, so one can also compute the
   individual information (surprisal) of each gap - it's just
   -log2(gap distance frequency) + small sample correction.  Ryan's
   discan program does this.

   What I like about this is it combines the information of the parts
   smoothly with the information about gap distance!  There are *no*
   arbitrary gap penalties or other arbitrary parameters!  Of course,
   if the model fails we are in big trouble ...

   To make the distribution file (histog) and know which model should
   be A and which should be B, use whichever model was subtracted as A
   and the model subtracted from as B.  For example, if your
   distribution is negative, you probably subtracted the model further
   downstream from the model upstream, so the downstream model would be
   A and the upstream model would be B.  With ribosome binding sites,
   sd always came before atg.  Aug was subtracted from sd, giving a
   negative distribution, so aug was assigned as the A model and sd was
   assigned as the B model.

examples

   An example scanp file:

-1          number of seqs to scan 0 = none, positive = that number; negative = all
0           information content at or above which to report in the data file.
100         Z score below which to report in the data file
0           probability at or above which to report in the data file.
-10 +10     desired region of the ribl weight matrix to use
0           0: program figures it out; 1: one way scan; 2: two way scan.
define "Fis" "-" "[0]" "[0]" -7 0 +7
string "data at:" string: A string listed at the feature
coordinate 5
string " Ri = " string: A string listed at the feature
Ri 5 1 Riwidth Ridecimal: character places for reporting bits to scanfeatures
string " Z = " string: A string listed at the feature
Z 4 1 z score
string " p = " string: A string listed at the feature
probability 5 2
. end of print definitions
DFI         dfi: data, features, inst: files output
n           normalizeRi: n: normal, s: Ri(l)-Ri(0), d: Ri(l)/Ri(0)
-50 +50     instfrom, instto: range to make the scaninst file (if made)

   scanp: parameters to control the program.

   An example biscanp file:

1.00        version of discan that this parameter file is designed for.
-18 -4      range
4           total Ri cutoff
f           f means filter out duplicates of the features

documentation

@article{Schneider.Ri,
author = "T. D. Schneider",
title = "Information Content of Individual Genetic Sequences",
journal = "J. Theor. Biol.",
volume = "189",
number = "4",
pages = "427-441",
note = "http://www.lecb.ncifcrf.gov/$\sim$toms/paper/ri/",
comment = "indiv.tex",
comment = "Submitted, April 1997",
year = "1997"}

@article{Schneider.walker,
author = "T. D. Schneider",
title = "Sequence Walkers: a graphical method to display
how binding proteins interact with {DNA} or {RNA} sequences",
journal = "Nucl. Acids Res.",
volume = "25",
comment = "walker.tex, November 1, issue 21",
note = "http://www.lecb.ncifcrf.gov/$\sim$toms/paper/walker/,
erratum: NAR 26(4): 1135, 1998",
pages = "4408-4415",
year = "1997"}

see also
   sites.p, ri.p, genhis.p, lister.p, dnaplot.p

author
   Ryan Shultzaberger stealing Thomas Schneider's scan.p code

bugs

   * The quote strings in the parameter file are not recorded and so
   are not reproduced in the data file comments.

   * Blank characters are placed around the quote strings.

   * Complementary scans should work, but I haven't tested them
   completely.  The A model can be scanned both ways, but B is fixed.
   It is not clear that this works.

   * If sites have an even symmetry then there is a problem.
   (need to use the riblarray^.symmetry parameter to fix the problem)

technical notes

   The mean and standard deviation of the Ri distribution are stored
   just after the Ri(b,l) table in the ribl file.  They are produced
   automatically by the ri program.

   To provide upward compatibility, scanp files of version 2.90 or less
   will be interpreted by the old definitions for the bounds of Ri, Z
   and p:

      Ri cutoff: One real on the second line is the information content
         at or above which to report in the data file.

      Z score cutoff: One real on the third line is the Z score at or
         below which to report in the data file.  A negative sign will
         be converted to a positive sign so that this parameter limits
         the range of acceptable sites to an interval on the real line.

      Probability cutoff: One real on the fourth line is the lowest
         probability to report in the data file.  The probability of a
         site is determined from the mean and standard deviation of the
         Ri distribution.

   It is not advisable to rely on this feature, as it will go away at
   some point.

*)
(* end module describe.biscan *)
{This manual page was created by makman 1.44}
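
The pair scoring described under ricutoff and singleA can be illustrated with a short Python sketch. This is not the biscan.p source; it is a minimal reading of the manual text under stated assumptions. The names `gap_probabilities`, `score_pairs` and `filter_single_a` are hypothetical, as are the coordinates, Ri values and histogram counts. The sketch assumes that the distance is the B coordinate minus the A coordinate (A being the model that was subtracted), that the small sample correction is supplied as a single number, and that a pair is kept when its total is at or above ricutoff.

```python
import math
from typing import NamedTuple


class Pair(NamedTuple):
    a_coord: int      # zero coordinate of the model A site
    b_coord: int      # zero coordinate of the model B site
    total_ri: float   # combined information in bits


def gap_probabilities(histog, low, high):
    """Convert histogram counts {distance: count} to frequencies over low..high."""
    counts = {d: histog.get(d, 0) for d in range(low, high + 1)}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no histogram counts in the requested range")
    # Distances never observed get no entry, so they can never form a pair.
    return {d: c / total for d, c in counts.items() if c > 0}


def score_pairs(sites_a, sites_b, probs, ricutoff, correction=0.0):
    """Score each (A site, B site) pair whose spacing appears in probs.

    sites_a and sites_b are lists of (coordinate, Ri in bits).
    Assumption: distance = B coordinate - A coordinate.
    """
    pairs = []
    for a_coord, ri_a in sites_a:
        for b_coord, ri_b in sites_b:
            p = probs.get(b_coord - a_coord)
            if p is None:
                continue
            gap_surprisal = -math.log2(p)
            total = ri_a + ri_b + correction - gap_surprisal
            if total >= ricutoff:  # assumption: "at or above" the cutoff
                pairs.append(Pair(a_coord, b_coord, total))
    return pairs


def filter_single_a(pairs):
    """For each A coordinate keep only the pair with the highest total."""
    best = {}
    for pair in pairs:
        current = best.get(pair.a_coord)
        if current is None or pair.total_ri > current.total_ri:
            best[pair.a_coord] = pair
    return sorted(best.values())


# Hypothetical data: one A site, two candidate B sites upstream of it.
histog = {-10: 6, -9: 2}
probs = gap_probabilities(histog, -18, -4)   # lowhistog, highhistog
sites_a = [(100, 8.0)]
sites_b = [(90, 5.0), (91, 6.0)]
pairs = score_pairs(sites_a, sites_b, probs, ricutoff=4.0)
kept = filter_single_a(pairs)
for pair in kept:
    print(pair.a_coord, pair.b_coord, round(pair.total_ri, 3))
```

With these made-up numbers the B site at 90 is kept even though the B site at 91 has the higher Ri on its own, because a spacing of -9 is rarer in the histogram and so costs more bits. This is the sense in which the gap distance is combined with the site information without a separate gap penalty.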
