Delila Program: comp


{version = 5.27; (* of comp.p 1999 Oct 13}

(* begin module describe.comp *)
(*
name
      comp: determine the composition of a book.

synopsis
      comp(book: in, cmp: out, compp: in, output: out)

files
      book: the sequences;
      cmp: the composition, determined for mononucleotides up to
            oligonucleotides of length "compmax", see file compp;
      compp: parameter file used to set the length of the oligonucleotides for
            which the composition is to be determined ("compmax");  that number
            must be the first thing in the file; if the file is empty
            compmax is set by default to the constant "defcompmax";
      output: for messages to the user.

description
      Comp counts the occurrences of each oligonucleotide (from length 1 to
      compmax) in the book and prints the counts to file "cmp".  The output
      is printed in order of increasing length of oligonucleotide (i.e.,
      first the monos, then the dis, ...).  If there are no occurrences of
      an oligonucleotide, but its one-shorter parent did occur, it will be
      given a zero.  None of the descendants of such a zero-count
      oligonucleotide will be printed in the composition file.
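
      The counting and zero-printing rule described above can be sketched
      in Python.  This is a minimal illustration, not the comp.p source:
      the names count_oligos and composition_rows are hypothetical, the
      "parent" of an oligonucleotide is assumed to be the oligonucleotide
      with its last base removed, and oligonucleotides are assumed not to
      span two sequences of the book.

```python
from collections import Counter


def count_oligos(sequences, compmax):
    """Count every oligo of length 1..compmax in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.lower()
        for length in range(1, compmax + 1):
            # Slide a window of the current length along the sequence.
            for start in range(len(seq) - length + 1):
                counts[seq[start:start + length]] += 1
    return counts


def composition_rows(counts, compmax, alphabet="acgt"):
    """List (length, oligo, count) by increasing length.

    An oligo with a count of zero is listed only if its parent
    (assumed here to be the oligo minus its last base) occurred;
    descendants of a zero-count oligo are left out.
    """
    rows = []
    parents = [""]  # the empty oligo stands for the whole book
    for length in range(1, compmax + 1):
        next_parents = []
        for parent in parents:
            for base in alphabet:
                oligo = parent + base
                n = counts.get(oligo, 0)
                rows.append((length, oligo, n))
                if n > 0:
                    next_parents.append(oligo)
        parents = next_parents
    return rows


rows = composition_rows(count_oligos(["aaa"], 3), 3)
for length, oligo, n in rows:
    print(length, oligo, n)
```

      For the single sequence "aaa" this lists four rows per length: "c",
      "g" and "t" get a zero at length 1, and none of their descendants
      (such as "ca" or "gtt") appear at lengths 2 and 3.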

examples

   As an example of the output format, the composition to depth 3 of E. coli
   (U00096, 16-OCT-1997) is:

 comp 5.27: composition of
 * 1999/05/04 14:41:13, 1999/05/04 14:38:08, dbbk 3.33
 3 is the longest oligo counted

 *
 0-long oligos (the total number of bases)
          4639221
 *
 1-long oligos
      a   1142136      c   1179433      g   1176775      t   1140877
 *
 2-long oligos
      aa   337835      ac   256658      ag   237851      at   309792
      ca   325118      cc   271649      cg   346636      ct   236029
      ga   267234      gc   383865      gg   270083      gt   255593
      ta   211948      tc   267261      tg   322205      tt   339463
 *
 3-long oligos
      aaa  108901      aac   82578      aag   63364      aat   82992
      aca   58633      acc   74899      acg   73263      act   49863
      aga   56618      agc   80848      agg   50611      agt   49774
      ata   63692      atc   86476      atg   76229      att   83395
      caa   76607      cac   66752      cag  104785      cat   76974
      cca   86442      ccc   47764      ccg   87031      cct   50412
      cga   70934      cgc  115673      cgg   86870      cgt   73159
      cta   26762      ctc   42714      ctg  102900      ctt   63653
      gaa   83490      gac   54737      gag   42460      gat   86547
      gca   96010      gcc   92961      gcg  114609      gct   80285
      gga   56199      ggc   92123      ggg   47470      ggt   74291
      gta   52670      gtc   54225      gtg   66108      gtt   82590
      taa   68837      tac   52591      tag   27241      tat   63279
      tca   84033      tcc   56025      tcg   71733      tct   55469
      tga   83483      tgc   95221      tgg   85132      tgt   58369
      tta   68824      ttc   83846      ttg   76968      ttt  109825
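
      A listing laid out like the one above can be read back with a few
      lines of Python.  The sketch below is an illustration only: the
      function parse_cmp is hypothetical and is not part of Delila, and it
      assumes exactly the layout shown in this example ("*" separator
      lines, "N-long oligos" headings, then oligo/count pairs).

```python
import re


def parse_cmp(text):
    """Return (total_bases, counts) from text laid out like the example."""
    total = None
    counts = dict()
    length = None  # stays None until the first "N-long oligos" heading
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # blank lines and "*" lines carry no counts
        heading = re.match(r"(\d+)-long oligos", line)
        if heading:
            length = int(heading.group(1))
        elif length == 0:
            total = int(line)  # the 0-long entry is the number of bases
        elif length is not None:
            tokens = line.split()
            # Tokens alternate: oligo, count, oligo, count, ...
            for oligo, n in zip(tokens[0::2], tokens[1::2]):
                counts[oligo] = int(n)
    return total, counts


# First part of the E. coli example (3-long table omitted for brevity).
SAMPLE = """\
 comp 5.27: composition of
 * 1999/05/04 14:41:13, 1999/05/04 14:38:08, dbbk 3.33
 3 is the longest oligo counted

 *
 0-long oligos (the total number of bases)
          4639221
 *
 1-long oligos
      a   1142136      c   1179433      g   1176775      t   1140877
 *
 2-long oligos
      aa   337835      ac   256658      ag   237851      at   309792
      ca   325118      cc   271649      cg   346636      ct   236029
      ga   267234      gc   383865      gg   270083      gt   255593
      ta   211948      tc   267261      tg   322205      tt   339463
"""

total, counts = parse_cmp(SAMPLE)
monos = sum(n for oligo, n in counts.items() if len(oligo) == 1)
dis = sum(n for oligo, n in counts.items() if len(oligo) == 2)
print(total, monos, dis)
```

      On this data the mononucleotide counts add up to the 0-long total
      (4639221), and the dinucleotide counts add up to one less
      (4639220), which is what one would expect for a single sequence
      counted with overlapping windows and no wrap-around.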

see also
      compan.p, histan.p, markov.p

authors
      Gary Stormo and Tom Schneider

bugs
      none known

technical note
      The algorithm is an interesting application of linked lists.  The
      composition is stored as a tree, and a number of "spiders" climb the
      tree during its construction.
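
      One plausible reading of this note, sketched in Python: each tree
      node holds the count for one oligonucleotide, and each "spider" is a
      pointer that starts at the root and steps one level down for every
      base read.  This is an assumption about the idea only, not a
      transcription of comp.p; the names Node, build_tree and count_of are
      hypothetical, and a dictionary of children stands in for the linked
      lists of the original.

```python
class Node:
    """One tree node: the count for an oligo, plus one child per next base."""

    def __init__(self):
        self.count = 0
        self.children = dict()


def build_tree(sequence, compmax):
    """Build the composition tree of one sequence to depth compmax."""
    root = Node()
    spiders = []  # (node, depth) pairs still climbing the tree
    for base in sequence.lower():
        root.count += 1            # the root holds the 0-long total
        spiders.append((root, 0))  # a new spider starts at every base
        advanced = []
        for node, depth in spiders:
            child = node.children.get(base)
            if child is None:
                child = Node()
                node.children[base] = child
            child.count += 1
            # A spider retires once it has reached depth compmax.
            if depth + 1 < compmax:
                advanced.append((child, depth + 1))
        spiders = advanced
    return root


def count_of(root, oligo):
    """Look up the count stored for an oligo (0 if it has no node)."""
    node = root
    for base in oligo:
        node = node.children.get(base)
        if node is None:
            return 0
    return node.count


tree = build_tree("acgta", 2)
print(tree.count, count_of(tree, "a"), count_of(tree, "ta"))
```

      With this arrangement each base is read once and at most compmax
      spiders are active at a time, so every oligonucleotide length is
      counted in a single pass over the sequence.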

*)
(* end module describe.comp *)
{This manual page was created by makman 1.44}
{created by htmlink 1.55}