Delila Program: dalvec


Documentation for the dalvec program is below, with links to related programs in the "see also" section.

{version = 2.20; (* of dalvec.p 1995 June 24}

(* begin module describe.dalvec *)
(*
name
   dalvec: converts Rseq rsdata file to symvec format

synopsis
   dalvec(rsdata: in, dalvecp: in, symvec: out, output: out)

files
   rsdata: data file from rseq program

   dalvecp: parameters to control dalvec
      If empty, then the normal sequence logo will be produced.
      If the first character of the first line is a 'c', then a chi-logo
      is produced.  The height of this logo is the information.  The
      heights of the individual letters are, however, not the frequencies,
      but rather their partial chi-square values.  The expected value
      is 1/4 of the number of characters.  This is compared to the observed
      value by:
        partial chi-square = (observed - expected)^2 / expected
      These partial values are normalized and placed in symvec in place of
      the relative frequencies.  Thus the significance of each letter is
      used.  When the observed is less than expected, the reported value
      is made negative.  Makelogo prints these characters upside down.
   symvec: reformatting of the rsdata file for input to the makelogo program.
      A series of header lines beginning with an asterisk ("*") are produced.
      The next line contains one integer which is the number of symbols
      in the vector (4 for DNA or RNA, 20 for proteins).
      After this, the format of the file is a series of entries.  Each entry
      has two parts.  The first part is on one line and contains
         position total information
         position: the position number
         total:  the sum of the values in the vector
         information: the information content of the vector.
      The remaining parameters on the line are from the rsdata file:
         rs: sum of rsl
         varhnb: variance of rsl
         sumvar: sum of varhnb
         ehnb: 2-e(n)

      The second part consists of a series of symbol lines.  The
      number of these matches the number of symbols (4 in the case of DNA),
      representing the numbers of bases or amino acids at the position in
      an aligned set of sequences.  Each line begins with the character of the
      symbol, followed by the number of that symbol.
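
The symvec layout described above can be read back with a short parser.  The
following is a minimal sketch in Python, not part of dalvec itself.  It
assumes that fields are separated by whitespace and that blank lines can be
skipped; neither is stated in this page.  The function name parse_symvec and
the sample data are hypothetical and serve only as illustration.

```python
def parse_symvec(text):
    """Parse symvec-style text into (headers, nsymbols, entries).

    Assumptions (not confirmed by the dalvec documentation):
    fields are whitespace-separated and blank lines carry no meaning.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0

    # Header lines begin with an asterisk.
    headers = []
    while i < len(lines) and lines[i].startswith("*"):
        headers.append(lines[i])
        i += 1

    # Next line: one integer, the number of symbols in each vector.
    nsymbols = int(lines[i].split()[0])
    i += 1

    entries = []
    while i < len(lines):
        # First part: position, total, information, then the values
        # carried over from rsdata (rs, varhnb, sumvar, ehnb).
        fields = lines[i].split()
        entry = {
            "position": int(fields[0]),
            "total": float(fields[1]),
            "information": float(fields[2]),
            "rsdata_fields": [float(x) for x in fields[3:]],
            "symbols": {},
        }
        i += 1
        # Second part: one line per symbol, the symbol character first,
        # then its number (negative in a chi-logo symvec).
        for _ in range(nsymbols):
            line = lines[i]
            entry["symbols"][line[0]] = float(line[1:].split()[0])
            i += 1
        entries.append(entry)
    return headers, nsymbols, entries


# Invented sample data, for illustration only.
SAMPLE = """\
* hypothetical header line
4
-1 16 0.5 0.5 0.01 0.01 0.1
a 10
c 2
g 2
t 2
"""

headers, nsymbols, entries = parse_symvec(SAMPLE)
```

With the invented sample, entries[0]["symbols"] maps each base to its count,
and the counts add up to the total field of the entry.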

   output: messages to the user

description
   Convert the rsdata file from rseq into a format that the makelogo program
   can use.  The format is a 'symbol vector'.

   ChiLogos: If you leave the parameter file empty, then the standard sequence
   logo will be created.  However, if the first letter of the file is a 'c',
   then a new kind of logo will emerge from makelogo: the chi-logo.  The total
   height is as it was before.  The height of the individual letters is
   different: instead of being proportional to the frequency of the letter, it is
   proportional to the significance of the letter, based on the chi-square
   test.  That is, the expected number of each letter is the number of letters at
   that position, n(l), divided by 4 (for simplicity!).  The observed number
   comes from the rsdata file.  The partial chi-square is
   (observed-expected)^2/expected.  Note that the sum of the partials is the
   normal chi-square.  So bases that contribute strongly get big.  Also, bases
   that are underrepresented are printed UPSIDE DOWN, so you can (usually)
   tell you have a chilogo at a glance.  The chilogo allows one to see the
   importance of the infrequent letters.  The technical mechanism for making a
   letter upside down is to have its number negative in the symvec file.
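
The chi-logo arithmetic described above can be sketched in a few lines of
Python.  This is an illustration, not the code of dalvec.p.  The function
names are hypothetical, and the normalization step is an assumption: this
page says only that the partial values "are normalized", so the sketch
divides each signed partial by the sum of the magnitudes.

```python
def signed_partial_chi_squares(counts):
    """Return the signed partial chi-square for each symbol at one position.

    counts maps each symbol to its observed count.  The expected count is
    n(l) divided by the number of symbols (4 for DNA), as in the text.
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no symbols at this position")
    expected = n / len(counts)
    partials = {}
    for symbol, observed in counts.items():
        partial = (observed - expected) ** 2 / expected
        # Underrepresented symbols get a negative value, which makelogo
        # prints upside down.
        partials[symbol] = -partial if observed < expected else partial
    return partials


def normalize(partials):
    """ASSUMED normalization: scale so the magnitudes sum to 1."""
    total = sum(abs(v) for v in partials.values())
    if total == 0:
        return dict(partials)
    return {symbol: v / total for symbol, v in partials.items()}


# Invented counts for one position: 16 sequences, expected 4 of each base.
counts = {"a": 10, "c": 2, "g": 2, "t": 2}
partials = signed_partial_chi_squares(counts)
chi_square = sum(abs(v) for v in partials.values())
scaled = normalize(partials)
```

For these counts the partials are 9 for a and -1 for each of c, g and t.
Their magnitudes sum to 12, the ordinary chi-square for the position, and
after the assumed normalization a takes 0.75 of the stack.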

examples

see also
   rseq.p, makelogo.p

author
   Thomas D. Schneider

bugs

   The program originally only created a vector that contained the characters
   of the alphabet, so the output was called an 'alvec'.  To reflect the use of
   symbols, the name of the output file was changed to symvec, but I like
   'dalvec', and 'dsymvec' is so awkward that I decided to keep the name
   dalvec.

*)
(* end module describe.dalvec *)
{This manual page was created by makman 1.45}

{created by htmlink 1.62}