Delila Program: zipf

zipf program

Documentation for the zipf program is below, with links to related programs in the "see also" section.

{version = 1.32; (* of zipf.p 1993 January 26}

(* begin module describe.zipf *)
(*
name
   zipf: Monte Carlo simulation for Peter Shenkin's problem

synopsis
   zipf(zipfp: in, data: out, xyin: out, output: out)

files
   zipfp:  parameters to control the program
      first line: integer, number of correlation coefficients to create
      second line: integer, number of symbols for each correlation coefficient.
         eg, 20 means amino acids.
      third line: character.  't' means use Tom's method, 'p' means use Peter's.
      fourth line: character.  'g' means to graph the simplex.
   data:  a list of correlation coefficients.  This is to be input
      to the genhis program.
   xyin:  data for graphing the simplex.  The graph is generated with the
      xyplo program.
   output: messages to the user
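
   As an illustration of the zipfp layout described above, the following
   Python sketch reads the four parameters.  It is not part of zipf.p.  The
   function name parse_zipfp is hypothetical, and the sketch assumes that
   only the first item on each line is a parameter and that the rest of the
   line is commentary.

```python
def parse_zipfp(text):
    """Read the four zipfp parameters, taking the first token of each line."""
    lines = text.splitlines()
    count = int(lines[0].split()[0])       # number of correlation coefficients
    symbols = int(lines[1].split()[0])     # number of symbols, eg 20 for amino acids
    method = lines[2].split()[0][0]        # 't' for Tom's method, 'p' for Peter's
    graph = lines[3].split()[0][0] == "g"  # 'g' requests the simplex graph
    return count, symbols, method, graph


# Made-up parameter text, used only to exercise the function.
example = "1000 number of coefficients\n20 symbols\nt method\ng graph\n"
params = parse_zipfp(example)
```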

description

   1992 Jan 13  Returned call to Stephen Altschul 496-2475.  He suggested that
   Peter Shenkin's results of rank versus log of probability are due to random
   effects.  This is easy to test with a Monte Carlo simulation:

   Tom's method
      choose s (eg 20) random numbers
      find their sum
      divide each number by the sum to produce s random numbers which
         sum to 1.
      sort the numbers
      take the log versus the rank
      determine the correlation coefficient
      repeat to get distribution of correlation coefficients.
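
   The steps of Tom's method can be sketched in Python as follows.  This is
   an illustration only, not the zipf.p source.  The function names, the
   seed, the use of natural logs and the descending sort (rank 1 = largest)
   are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tom_probabilities(s, rng):
    """Choose s random numbers and divide each by their sum."""
    # 1.0 - rng.random() lies in (0, 1], so no value is zero and log is safe.
    raw = [1.0 - rng.random() for _ in range(s)]
    total = sum(raw)
    return [value / total for value in raw]


def rank_log_correlation(probabilities):
    """Correlate rank (1 = largest) with the log of the sorted probabilities."""
    ordered = sorted(probabilities, reverse=True)
    ranks = list(range(1, len(ordered) + 1))
    logs = [math.log(p) for p in ordered]
    return pearson(ranks, logs)


# Repeat to get a distribution of correlation coefficients.
rng = random.Random(1)
coefficients = [rank_log_correlation(tom_probabilities(20, rng)) for _ in range(1000)]
```

   With the descending sort assumed here, every coefficient is negative,
   because the log of the probability falls as the rank rises.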

   Peter's method
      choose s-1 random numbers between 0 and 1
      sort the numbers
      take the differences (including the end points 0 and 1) to produce
         s numbers that sum to 1
      re-sort the numbers
      take the log versus the rank
      determine the correlation coefficient
      repeat to get distribution of correlation coefficients.
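
   The step that distinguishes Peter's method, breaking the interval from 0
   to 1 at random points, can be sketched in Python as follows.  This is an
   illustration only, not the zipf.p source.  The function name, the seed
   and the descending sort (rank 1 = largest) are assumptions.

```python
import random


def peter_probabilities(s, rng):
    """Break the unit interval at s-1 random points and return the s pieces."""
    cuts = sorted(rng.random() for _ in range(s - 1))
    edges = [0.0] + cuts + [1.0]
    # Differences between neighbouring edges: s non-negative numbers summing to 1.
    pieces = [right - left for left, right in zip(edges, edges[1:])]
    # Re-sort so that rank 1 is the largest piece (descending order is assumed).
    return sorted(pieces, reverse=True)


rng = random.Random(1)
pieces = peter_probabilities(20, rng)
```

   The log versus rank and correlation coefficient steps then proceed as
   listed.  A piece of length exactly zero (two equal cut points) would have
   no log, so a complete program would need to handle that case.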

   Graph of simplex.  The numbers all add to 1 for either method, so each set
   of s numbers is a point in an s dimensional space.  Since the coordinates
   sum to 1 and none is negative, the points are confined to a region of an
   s-1 dimensional hyperplane, called a simplex.  The distribution of the
   points can be visualized by projecting onto a plane and graphing with the
   xyplo program.  The projection is done by using polar coordinates.  There
   is a vector, P, from the center of the simplex to each point to graph.
   There is a vector, A, from the center of the simplex to the point where the
   first coordinate has value 1 and all others are zero.  The magnitude of P
   gives the radius, and the angle between P and A gives the polar angle.
   These polar coordinates are converted to rectangular coordinates and
   written to the xyin file.  If s = 3, then the simplex is a simple plane
   reaching between the three points A=(1,0,0), B=(0,1,0) and C=(0,0,1).  The
   projection takes this equilateral triangle onto the xy plane.  In higher
   dimensions, the points are collapsed to the xy plane, so high dimensional
   effects are expected.  This means that the center should tend to become
   empty, and the distribution will become spherical.
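
   A literal reading of the projection described above can be sketched in
   Python as follows.  This is an illustration only, not the zipf.p source,
   and the function name project is hypothetical.  The sketch assumes the
   unsigned angle between P and A (from 0 to pi), so all points land in the
   upper half plane and the vertices B and C of the s = 3 case coincide.  The
   program itself may resolve the sign of the angle differently.

```python
import math


def project(point):
    """Project a point on the simplex to (x, y) via magnitude and angle to A."""
    s = len(point)
    center = 1.0 / s
    # P: vector from the center of the simplex to the point.
    p = [value - center for value in point]
    # A: vector from the center to the vertex (1, 0, ..., 0).
    a = [1.0 - center] + [-center] * (s - 1)
    p_length = math.sqrt(sum(v * v for v in p))
    a_length = math.sqrt(sum(v * v for v in a))
    if p_length == 0.0:
        return (0.0, 0.0)  # the center itself has no defined angle
    cosine = sum(u * v for u, v in zip(p, a)) / (p_length * a_length)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    angle = math.acos(cosine)
    # Polar (p_length, angle) to rectangular coordinates.
    return (p_length * math.cos(angle), p_length * math.sin(angle))


vertex_a = project([1.0, 0.0, 0.0])  # lies on the positive x axis
vertex_b = project([0.0, 1.0, 0.0])  # at 120 degrees from the x axis
```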

examples

zipfp file:
***********************************************************
10000 10000      1000 Number of correlation coefficients to print out
3 16            Number of symbols being simulated
p             t= tom's, else peter's
g             g = graph the simplex, otherwise not

zipfp:  parameters to control the zipf program.
***********************************************************

genhisp file for use with genhis
***********************************************************
x n 50
r -1 -0.5
***********************************************************

xyplop file for use with xyplo
***********************************************************
2 2       zerox zeroy         graph coordinate center
x -1 1 zx 0 25    zx min max (character, real, real) if zx='x' then set xaxis
y -1 1 zy 0 250   zy min max (character, real, real) if zy='y' then set yaxis
10 10     xinterval yinterval number of intervals on axes to plot
6 6       xwidth    ywidth    width of numbers in characters
1 1       xdecimal  ydecimal  number of decimal places
16.5 22.0 xsize     ysize     size of axes in cm
x
y
c         zc                  'c' crosshairs, axXyYnN
n 2       zxl base            if zxl='l' then make x axis log to the given base
n 2       zyl base            if zyl='l' then make y axis log to the given base
          *********************************************************************
1 2       xcolumn   ycolumn   columns of xyin that determine plot location
0         symbol column       the xyin column to read symbols from
0  0      xscolumn  yscolumn  columns of xyin that determine the symbol size
0 0 0     hue saturation brightness   columns for color manipulation
          *********************************************************************
p         symbol-to-plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
0         symbol-flag         character in xyin that indicates that this symbol
0.05      symbol sizex        side in inches on the x axis of the symbol.
0.05      symbol sizey        as for the x axis, get size from yscolumn
nl 0.05   no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          *********************************************************************
.
          *********************************************************************
***********************************************************

documentation

see also
   genhis.p, xyplo.p

author
   Thomas Dana Schneider

bugs

technical notes
   A non-standard random number generator (rand) is used.
   It could be replaced by a portable one, at the risk that the
   portable generator does not give good results.

*)
(* end module describe.zipf *)
{This manual page was created by makman 1.44}
{created by htmlink 1.55}