Delila Program: discan

discan program

Documentation for the discan program is below, with links to related programs in the "see also" section.

{   version = 1.30; (* of discan.p 2005 May 23}

(* begin module describe.discan *)
   discan: combine two feature files into one model

   discan(scanfeaturesa: in, scanfeaturesb: in, histog: in, discanp: in,
          discanfeatures: out, dsout: out, output)

   scanfeaturesa:  the scanfeatures file for the first model,
      from program scan.

   scanfeaturesb:  the scanfeatures file for the second model,
      from program scan.

   histog:  output of the genhis program.  It is the distribution used to
      compute the uncertainty due to the various distances between the two
      models.  Histog must be in increments of 1 over the range.

      To create the data file for genhis, start with a Delila instruction
      file.  Then use the malign program to get improved alignments.  Use
      malin to extract one of the alignments as a second Delila instruction
      file.  Then use the diffinst program to make the data file.  Finally,
      run genhis to get the histog.

   discanp:  parameters to control the program.  The file must contain the
      following parameters, one per line:

      parameter version number, needs to be compatible with current program

      lowhistog, highhistog (two integers): range of the histog distribution
         to use, in the order lowest value to highest

      ricutoff: total Ri cutoff.  This value is compared to the sum of the
         individual info of site 1 + the individual info of site 2 + the
         sample correction - the -log2 of the distance probability.

      singleA (character):  If singleA is 'f' then, among feature pairs that
         contain the same A coordinate, filter out any pair that has lower
         information than another pair.  Note: singleA and singleB cannot
         both be 'f' at the same time.

      singleB (character): If singleB is 'f' then, among feature pairs that
         contain the same B coordinate, filter out any pair that has lower
         information than another pair.  Note: singleA and singleB cannot
         both be 'f' at the same time.

   discanfeatures: new features file to use with lister.

   dsout:  Table of the first location, second location, distribution and
      total information.  This can be the input (xyin) to the xyplo program.

   output: messages to the user


   This program is used to compare the binding patterns of 2 different
   binding site models.  It selects sites that are within a certain range of
   each other, adds their individual information together, and subtracts a
   distance-based distribution probability value to determine the new total
   information.

   The theory is that the distances have a distribution.  So one can assign
   probabilities to each distance.  One can compute the uncertainty of such a
   distribution, so one can also compute the individual information
   (surprisal) of each gap - it's just -log2(gap distance frequency) + small
   sample correction.  Ryan's discan program does this.
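
   The arithmetic described above can be sketched in Python.  This is only
   an illustration, not the code in discan.p: the function names, the
   dictionary form of the histogram, the default correction of 0.0 (the
   small sample correction formula is not given here) and the use of ">="
   for the ricutoff test are all assumptions.

```python
import math

def gap_probabilities(counts):
    """Turn {distance: count} into {distance: relative frequency}."""
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items() if n > 0}

def total_information(ri_a, ri_b, distance, probs, correction=0.0):
    """Ri(A) + Ri(B) + correction - (-log2 p(distance)).

    Returns None when the distance was never observed in the histogram.
    The correction value is a placeholder (assumption).
    """
    p = probs.get(distance)
    if p is None:
        return None
    return ri_a + ri_b + correction - (-math.log2(p))

# Hypothetical histogram of gap distances (not real data).
counts = {-9: 2, -8: 4, -7: 2}
probs = gap_probabilities(counts)

# A 6 bit site and a 5 bit site at distance -8, where p = 0.5 (1 bit).
total = total_information(6.0, 5.0, -8, probs)

ricutoff = 0.0
passes = total is not None and total >= ricutoff  # ">=" is an assumption
```

   With these made-up counts the gap at -8 costs 1 bit, so the pair is
   worth 10 bits in total.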

   What I like about this is it combines the information of the parts
   smoothly with the information about gap distance!  There are *no*
   arbitrary gap penalties or other arbitrary parameters!  Of course, if the
   model fails we are in big trouble ...

   To make the distribution file (histog) and know which model should be A
   and which should be B, use whichever model was subtracted as A and the
   model it was subtracted from as B.  For example, if your distribution is
   negative, you probably subtracted the model further downstream from the
   model upstream, so the downstream model would be A and the upstream model
   would be B.  With ribosome binding sites, sd always came before atg, so
   atg was subtracted from sd, giving a negative distribution; atg was then
   assigned as the A model and sd was assigned as the B model.
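
   The sign convention in the ribosome example can be checked with a small
   Python sketch.  The function name and the coordinates are hypothetical,
   and the sketch assumes the distance is taken as the B coordinate minus
   the A coordinate, which is how the example above reads.

```python
def gap_distance(a_coordinate, b_coordinate):
    """Distance with model A as the subtracted model: B minus A (assumed)."""
    return b_coordinate - a_coordinate

# Hypothetical coordinates: sd (model B) lies upstream of atg (model A).
sd, atg = 100, 109
d = gap_distance(atg, sd)

# lowhistog and highhistog as in the example discanp file.
low, high = -12, -6
in_range = low <= d <= high
```

   Here sd - atg is 100 - 109 = -9, a negative distance that falls inside
   the range -12 to -6.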

   One can filter out cases having a common A or common B feature using
   parameters singleA and singleB.  However, the program will not allow one
   to remove both at the same time.  Consider this situation:

   feature   5'         3' along sequence
      1           B---A  5 bits
      2        B------A  4 bits
      3        B---A     3 bits

   Given feature 1, if feature 2 is found next and removed by the A filter,
   then feature 3 would incorrectly survive the B filter.

   The B filter says to keep 2, remove 3.
   The A filter says to keep 1, remove 2.

   This is contradictory, since depending on the order of processing
   different things can happen.  For the order given above, 2 and 3 would
   compete and 3 would lose.  Then 2 would lose to 1.  HOWEVER in a scan of
   the complementary strand 1 would beat 2 and then 3 would pass through.  So
   the results would depend on the direction of the scans.

   To avoid this inconsistency, one is not allowed to do both filterings at
   the same time.
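
   The order dependence can be reproduced with a short Python sketch.  It
   is not how discan.p is written: keep_best is a hypothetical helper that
   applies each filter to the whole set of pairs at once, and the
   coordinates are made up to match the drawing above.

```python
def keep_best(pairs, key):
    """Among pairs sharing the same coordinate under `key` ('a' or 'b'),
    keep only the pair with the most information."""
    best = {}
    for p in pairs:
        k = p[key]
        if k not in best or p["bits"] > best[k]["bits"]:
            best[k] = p
    return [p for p in pairs if best[p[key]] is p]

# The three features from the drawing, with made-up coordinates.
pairs = [
    {"id": 1, "b": 3, "a": 7, "bits": 5.0},
    {"id": 2, "b": 0, "a": 7, "bits": 4.0},
    {"id": 3, "b": 0, "a": 4, "bits": 3.0},
]

# A filter first, then B filter: 2 loses to 1, so 3 has no rival left.
a_then_b = [p["id"] for p in keep_best(keep_best(pairs, "a"), "b")]

# B filter first, then A filter: 3 loses to 2, then 2 loses to 1.
b_then_a = [p["id"] for p in keep_best(keep_best(pairs, "b"), "a")]
```

   Applying A then B leaves features 1 and 3, while applying B then A
   leaves only feature 1, so the two orders disagree.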


example discanp file:

1.02    version of discan that this parameter file is designed for.
-12 -6  range
0       total Ri cutoff
f       f means filter out duplicates of the a features
n       f means filter out duplicates of the b features
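
   As an illustration of the layout, the example file above can be read
   with a few lines of Python.  This is a sketch only: parse_discanp and
   the dictionary keys are hypothetical, and it assumes that everything
   after the leading value(s) on a line is a comment and that blank lines
   may be skipped.  The real program is written in Pascal and may read the
   file differently.

```python
def parse_discanp(text):
    """Parse the five discanp parameter lines into a dictionary."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 5:
        raise ValueError("discanp needs five parameter lines")
    low, high = (int(tok) for tok in lines[1].split()[:2])
    params = {
        "version": float(lines[0].split()[0]),
        "lowhistog": low,
        "highhistog": high,
        "ricutoff": float(lines[2].split()[0]),
        "singleA": lines[3].split()[0],
        "singleB": lines[4].split()[0],
    }
    if low > high:
        raise ValueError("range must run from lowest value to highest")
    if params["singleA"] == "f" and params["singleB"] == "f":
        raise ValueError("singleA and singleB cannot both be 'f'")
    return params

EXAMPLE = """\
1.02    version of discan that this parameter file is designed for.
-12 -6  range
0       total Ri cutoff
f       f means filter out duplicates of the a features
n       f means filter out duplicates of the b features
"""

params = parse_discanp(EXAMPLE)
```

   The two checks at the end mirror the rules stated for the range order
   and for singleA and singleB.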


see also
   scan.p, genhis.p, lister.p, discanp, xyplo.p, diffinst.p, malign.p

author
   Ryan Kent Shultzaberger and Tom Schneider

bugs
   Discan does not know how to deal with circular segments, but since this
   reads in at the features level, it could be fixed if scan used internal
   coordinates.  However, scan can't produce internal coordinates because the
   features might be used for a different book, as in "forced" features.

technical notes

(* end module describe.discan *)
{This manual page was created by makman 1.44}
{created by htmlink 1.55}