Delila Program: zipf

# zipf program

## Pascal source code: zipf.p

### Documentation for the zipf program is below; related programs are listed in its "documentation" section.

```
{version = 1.32; (* of zipf.p 1993 January 26}

(* begin module describe.zipf *)
(*
name
zipf: Monte Carlo simulation for Peter Shenkin's problem

synopsis
zipf(zipfp: in, data: out, xyin: out, output: out)

files
zipfp:  parameters to control the program
first line: integer, number of correlation coefficients to create
second line: integer, number of symbols for each correlation coefficient.
e.g., 20 for the amino acids.
third line: character.  't' means use Tom's method; any other character
(e.g., 'p') means use Peter's.
fourth line: character.  'g' means to graph the simplex.
data:  a list of correlation coefficients.  This is to be input
to the genhis program.
xyin:  data for graphing the simplex.  The graph is generated with the
xyplo program.
output: messages to the user

description

1992 Jan 13  Returned call to Stephen Altschul 496-2475.  He suggested that
Peter Shenkin's results of rank versus log of probability are due to random
effects.  This is easy to test with a Monte Carlo simulation:

Tom's method
choose s (e.g., 20) random numbers
find their sum
divide each number by the sum to produce s random numbers which
sum to 1
sort the numbers
take the log of each number versus its rank
determine the correlation coefficient
repeat to get the distribution of correlation coefficients.

Peter's method
choose s-1 random numbers between 0 and 1
sort the numbers
take the differences between successive numbers, with 0 added at the
start and 1 at the end, to produce s numbers that sum to 1
sort the differences
take the log of each number versus its rank
determine the correlation coefficient
repeat to get the distribution of correlation coefficients.

Graph of simplex.  The numbers add to 1 for either method, so each set of
s numbers is a point in an s dimensional space.  Because the numbers are
non-negative and sum to 1, the points lie in an (s-1) dimensional region of
a hyperplane, called a simplex.  The distribution of the points can be
visualized by projecting them onto a plane and graphing with the xyplo
program.  The projection uses polar coordinates.  There is a vector, P,
from the center of the simplex to each point to graph.  There is a vector,
A, from the center of the simplex to the vertex where the first coordinate
has value 1 and all others are zero.  The magnitude of P gives the radius,
and the angle between P and A gives the polar angle.  These polar
coordinates are converted to rectangular coordinates in the xyin file.  If
s = 3, the simplex is the equilateral triangle with vertices (1,0,0),
(0,1,0) and (0,0,1), and the projection takes this triangle onto the xy
plane.  In higher dimensions the points are collapsed onto the xy plane, so
high dimensional effects are expected: the center should tend to become
empty, and the distribution will become spherical.

examples

zipfp file:
***********************************************************
10000 10000      1000 Number of correlation coefficients to print out
3 16            Number of symbols being simulated
p             t = Tom's, else Peter's
g             g = graph the simplex, otherwise not

zipfp:  parameters to control the zipf program.
***********************************************************

genhisp file for use with genhis
***********************************************************
x n 50
r -1 -0.5
***********************************************************

xyplop file for use with xyplo
***********************************************************
2 2       zerox zeroy         graph coordinate center
x -1 1 zx 0 25    zx min max (character, real, real) if zx='x' then set xaxis
y -1 1 zy 0 250   zy min max (character, real, real) if zy='y' then set yaxis
10 10     xinterval yinterval number of intervals on axes to plot
6 6       xwidth    ywidth    width of numbers in characters
1 1       xdecimal  ydecimal  number of decimal places
16.5 22.0 xsize     ysize     size of axes in cm
x
y
c         zc                  'c' crosshairs, axXyYnN
n 2       zxl base            if zxl='l' then make x axis log to the given base
n 2       zyl base            if zyl='l' then make y axis log to the given base
*********************************************************************
1 2       xcolumn   ycolumn   columns of xyin that determine plot location
0         symbol column       the xyin column to read symbols from
0  0      xscolumn  yscolumn  columns of xyin that determine the symbol size
0 0 0     hue saturation brightness   columns for color manipulation
*********************************************************************
p         symbol-to-plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
0         symbol-flag         character in xyin that indicates that this symbol
0.05      symbol sizex        side in inches on the x axis of the symbol.
0.05      symbol sizey        as for the x axis, get size from yscolumn
nl 0.05   no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
*********************************************************************
.
*********************************************************************
***********************************************************

documentation

genhis.p, xyplo.p

author
Thomas Dana Schneider

bugs

technical notes
The program uses a non-standard random number generator (rand).
This could be replaced by a portable one, but with the risk that
the replacement might not give good results.

*)
(* end module describe.zipf *)
{This manual page was created by makman 1.44}
```
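Tom's and Peter's methods from the manual page can be compared with the minimal Python sketch below. It is an illustration, not the program's Pascal code. The function names (`pearson`, `toms_method`, `peters_method`, `rank_log_correlation`, `simulate`) are hypothetical, and Python's `random.Random` stands in for the non-standard `rand`. The sketch also makes three assumptions that the manual page does not state. First, "correlation coefficient" is taken to mean the Pearson coefficient. Second, the numbers are sorted largest first, so rank 1 is the most probable symbol. This ordering is suggested by the negative range (-1 to -0.5) in the example genhisp file. Third, s must be at least 2.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def toms_method(s, rng):
    """Tom's method: s random numbers, each divided by their sum."""
    # 1.0 - rng.random() lies in (0, 1], so no value is zero and log() is safe.
    values = [1.0 - rng.random() for _ in range(s)]
    total = sum(values)
    return [v / total for v in values]


def peters_method(s, rng):
    """Peter's method: cut [0, 1] at s-1 random points; the s gaps sum to 1."""
    while True:
        cuts = sorted(rng.random() for _ in range(s - 1))
        edges = [0.0] + cuts + [1.0]
        gaps = [b - a for a, b in zip(edges, edges[1:])]
        # Redraw in the (very rare) case of a zero-width gap, since log(0) fails.
        if min(gaps) > 0.0:
            return gaps


def rank_log_correlation(probs):
    """Sort largest first (an assumption), then correlate rank with log(value)."""
    ordered = sorted(probs, reverse=True)
    ranks = list(range(1, len(ordered) + 1))
    return pearson(ranks, [math.log(p) for p in ordered])


def simulate(n, s, method, seed=None):
    """Return n correlation coefficients for s symbols.

    method 't' selects Tom's method; any other value selects Peter's,
    mirroring the third line of the zipfp file.
    """
    if s < 2:
        raise ValueError("need at least 2 symbols to correlate rank with log")
    rng = random.Random(seed)
    sample = toms_method if method == "t" else peters_method
    return [rank_log_correlation(sample(s, rng)) for _ in range(n)]
```

For example, `simulate(1000, 20, "p", seed=1)` returns 1000 coefficients for 20 symbols using Peter's method. Because the values are sorted largest first, every coefficient comes out negative, the same sign as the range in the example genhisp file. Writing them one per line gives a list that can be histogrammed. This sketch does not claim to match genhis's exact input format.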
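The polar-coordinate projection in the "Graph of simplex" paragraph of the manual page can be sketched as below. This is an illustration in Python, not the program's Pascal code, and `project` is a hypothetical name. The manual page does not say how the direction (sign) of the angle between P and A is chosen. This sketch assumes the plain unsigned angle from `math.acos`, which places every point in the upper half of the plane. As a result, for s = 3 the vertices (0,1,0) and (0,0,1) land on the same spot rather than forming the full triangle.

```python
import math


def project(point):
    """Project a point on the simplex (coordinates sum to 1) onto the xy plane.

    P runs from the centre of the simplex to the point, and A runs from the
    centre to the vertex (1, 0, ..., 0).  The radius is |P| and the polar angle
    is the angle between P and A; these are then converted to (x, y).
    """
    s = len(point)
    centre = 1.0 / s
    p = [v - centre for v in point]
    a = [(1.0 if i == 0 else 0.0) - centre for i in range(s)]
    r = math.sqrt(sum(v * v for v in p))
    if r == 0.0:
        return (0.0, 0.0)  # the centre itself has no defined angle
    norm_a = math.sqrt(sum(v * v for v in a))
    cos_theta = sum(x * y for x, y in zip(p, a)) / (r * norm_a)
    # Clamp rounding error before acos; acos gives an unsigned angle in [0, pi],
    # so every projected point has y >= 0 (an assumption, see the lead-in).
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return (r * math.cos(theta), r * math.sin(theta))
```

A caller could write each (x, y) pair as one line of the xyin file. The example xyplop file reads x from column 1 and y from column 2.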