By downloading this code you agree to the Source Code Use License (PDF).
{ version = 1.33; (* of zipf.p 2016 January 26}
(* begin module describe.zipf *)
(*
name
   zipf: Monte Carlo simulation for Peter Shenkin's problem

synopsis
   zipf(zipfp: in, data: out, xyin: out, output: out)

files
   zipfp:  parameters to control the program
      first line: integer, number of correlation coefficients to create
      second line: integer, number of symbols for each correlation
         coefficient, e.g. 20 for amino acids.
      third line: character.  't' means use Tom's method, 'p' means
         use Peter's.
      fourth line: character.  'g' means to graph the simplex.

   data:  a list of correlation coefficients.  This is to be input to the
      genhis program.

   xyin:  data for graphing the simplex.  The graph is generated with the
      xyplo program.

   output:  messages to the user

description

   1992 Jan 13
   Returned call to Stephen Altschul 496-2475.  He suggested that
   Peter Shenkin's results of rank versus log of probability are due
   to random effects.  This is easy to test with a Monte Carlo simulation:

   Tom's method
      choose s (e.g. 20) random numbers
      find their sum
      divide each number by the sum to produce s random numbers
         which sum to 1
      sort the numbers
      take the log of each number versus its rank
      determine the correlation coefficient
      repeat to get the distribution of correlation coefficients.

   Peter's method
      choose s-1 random numbers between 0 and 1
      sort the numbers
      take the differences between successive values (with 0 and 1 as
         the end points) to produce s numbers (e.g. 20) that sum to 1
      re-sort the numbers
      take the log of each number versus its rank
      determine the correlation coefficient
      repeat to get the distribution of correlation coefficients.

   Graph of simplex.  The numbers all add to 1 for either method.  They
   are points in an s dimensional space.  Since they sum to 1, the region
   they fit into lies in a hyperplane of s-1 dimensions and is called a
   simplex.  The distribution of the points can be visualized by
   projecting onto a plane and graphing with the xyplo program.  The
   projection is done by using polar coordinates.  There is a vector P
   from the center of the simplex to each point to graph.
   There is a vector, A, from the center of the simplex to the point
   where the first coordinate has value 1 and all others are zero.  The
   magnitude of P gives the radius, and the angle between P and A gives
   the polar angle.  These polar coordinates are converted to rectangular
   coordinates in the xyin file.

   If s = 3, then the simplex is a flat triangle spanning the three
   points A=(1,0,0), B=(0,1,0) and C=(0,0,1).  The projection takes this
   equilateral triangle onto the xy plane.  In higher dimensions, the
   points are collapsed to the xy plane, so high dimensional effects are
   expected.  This means that the center should tend to become empty,
   and the distribution will become spherical.

examples

zipfp file:
***********************************************************
10000 10000 1000 Number of correlation coefficients to print out
3 16  Number of symbols being simulated
p     t = Tom's, else Peter's
g     g = graph the simplex, otherwise not
zipfp: parameters to control the zipf program.
***********************************************************

genhisp file for use with genhis
***********************************************************
x n 50 r -1 -0.5
***********************************************************

xyplop file for use with xyplo
***********************************************************
2 2        zerox zeroy  graph coordinate center
x -1 1     zx 0 25   zx min max (character, real, real) if zx='x' then set xaxis
y -1 1     zy 0 250  zy min max (character, real, real) if zy='y' then set yaxis
10 10      xinterval yinterval  number of intervals on axes to plot
6 6        xwidth    ywidth     width of numbers in characters
1 1        xdecimal  ydecimal   number of decimal places
16.5 22.0  xsize     ysize      size of axes in cm
x
y
c          zc  'c' crosshairs, axXyYnN
n 2        zxl base  if zxl='l' then make x axis log to the given base
n 2        zyl base  if zyl='l' then make y axis log to the given base
*********************************************************************
1 2        xcolumn ycolumn    columns of xyin that determine plot location
0          symbol column      the xyin column to read symbols from
0 0        xscolumn yscolumn  columns of xyin that determine the symbol size
0 0 0      hue saturation brightness columns for color manipulation
*********************************************************************
p          symbol-to-plot  c(circle)bd(dotted box)x+Ifgpr(rectangle)
0          symbol-flag     character in xyin that indicates that this symbol
0.05       symbol sizex    side in inches on the x axis of the symbol.
0.05       symbol sizey    as for the x axis, get size from yscolumn
nl 0.05    no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05    linetype size   linetype l.-in and size of dashes or dots
*********************************************************************
.
*********************************************************************
***********************************************************

documentation

see also
   genhis.p, xyplo.p

author
   Thomas Dana Schneider

bugs
   The revision of 2016 Jan 26 replaced the non-standard random(0) with
   a standard procedure.  Because the initial seed is fixed, every run
   gives the same results.  For actual use, add parameters and the
   timeseed function to base the initial seed on the date and time.

technical notes
   The non-standard random number generator was replaced by a portable
   one, but with the danger that the portable generator does not give
   good results.

*) (* end module describe.zipf *)
{This manual page was created by makman 1.45}{created by htmlink 1.62}
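The two sampling methods in the description can be sketched in Python as below. This is an illustration only, not the zipf.p source: the function names (toms_method, peters_method, rank_log_correlation, simulate) are hypothetical, Python's random module stands in for the program's own generator, and the sketch assumes that rank 1 is the largest number, that the correlation is the Pearson coefficient of log(number) against rank, and that Peter's differences use 0 and 1 as end points.

```python
import math
import random


def toms_method(s, rng):
    """Tom's method: draw s uniform numbers and divide each by their sum."""
    draws = [rng.random() for _ in range(s)]
    total = sum(draws)
    return [d / total for d in draws]


def peters_method(s, rng):
    """Peter's method: sort s-1 uniform numbers and take the gaps.

    Assumption: 0 and 1 are added as end points, so the s-1 cut points
    give s gaps that sum to 1.
    """
    cuts = [0.0] + sorted(rng.random() for _ in range(s - 1)) + [1.0]
    return [b - a for a, b in zip(cuts, cuts[1:])]


def rank_log_correlation(probs):
    """Pearson correlation between rank and log of the sorted numbers.

    Assumption: rank 1 is the largest number.  Needs at least two
    numbers, all greater than zero (a zero draw is possible in
    principle but extremely unlikely).
    """
    ordered = sorted(probs, reverse=True)
    n = len(ordered)
    ranks = list(range(1, n + 1))
    logs = [math.log(p) for p in ordered]
    mean_rank = sum(ranks) / n
    mean_log = sum(logs) / n
    cov = sum((r - mean_rank) * (lg - mean_log) for r, lg in zip(ranks, logs))
    var_rank = sum((r - mean_rank) ** 2 for r in ranks)
    var_log = sum((lg - mean_log) ** 2 for lg in logs)
    return cov / math.sqrt(var_rank * var_log)


def simulate(count, s, method, seed=1):
    """Return count correlation coefficients; 't' = Tom's, else Peter's."""
    rng = random.Random(seed)  # fixed seed: every run repeats exactly
    sampler = toms_method if method == "t" else peters_method
    return [rank_log_correlation(sampler(s, rng)) for _ in range(count)]


if __name__ == "__main__":
    for method in ("t", "p"):
        coefficients = simulate(1000, 20, method)
        print(method, sum(coefficients) / len(coefficients))
```

The list returned by simulate plays the role of the data file, one coefficient per repetition, ready for a histogram. With the numbers sorted from largest to smallest the coefficients are negative, which matches the -1 to -0.5 range given in the genhisp example.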