Delila Manual, Hypertext version


version

version = 5.06 of delman1    2014 Mar 06








ddddddd   eeeeeeee  ll        m      m     aa     n     nn
dd    dd  ee        ll        mm    mm    aaaa    nn    nn
dd    dd  ee        ll        mmm  mmm   aa  aa   nnn   nn
dd    dd  eeeeeee   ll        mmmmmmmm  aa    aa  nnnn  nn
dd    dd  ee        ll        mm mm mm  aa    aa  nn nn nn
dd    dd  ee        ll        mm    mm  aaaaaaaa  nn  nnnn
dd    dd  ee        ll        mm    mm  aa    aa  nn   nnn
dd    dd  ee        ll        mm    mm  aa    aa  nn    nn
ddddddd   eeeeeeee  llllllll  mm    mm  aa    aa  nn    nn


                       11
                      111
                       11
                       11
                       11
                       11
                       11
                       11
                    11111111


                         THE DELILA SYSTEM MANUAL

                         THOMAS D. SCHNEIDER
                         COPYRIGHT (C) 1993




1. Don't Panic!  You don't have to absorb this all at once!
2. There is an index at the end of any printed copy of Delman!
3. To create Delman2, see file aa.p

(end of version)

delman.intro





















 IIIIIIII  N     NN  TTTTTTTT  RRRRRRR    OOOOO
    II     NN    NN     TT     RR    RR  OO    OO
    II     NNN   NN     TT     RR    RR  OO    OO
    II     NNNN  NN     TT     RR    RR  OO    OO
    II     NN NN NN     TT     RR    RR  OO    OO
    II     NN  NNNN     TT     RRRRRRR   OO    OO
    II     NN   NNN     TT     RR  RR    OO    OO
    II     NN    NN     TT     RR   RR   OO    OO
 IIIIIIII  NN    NN     TT     RR    RR    OOOOO























(end of delman.intro)

delman.intro.outline.1


 DELILA SYSTEM MANUAL OUTLINE

 INTRO: Introduction To The Delila System
      OUTLINE: Outline For The Delila Manual
      DESCRIPTION: What Is The Delila System?
      ORGANIZATION: Organization Of The Manual
      POLICY: Our Policies, A Disclaimer, Obtaining The Delila System,
              Our Address And Acknowledgements

 TRANSPORT: Transportation Of The Delila System
      REQUIREMENTS: What You Will Need To Get The Delila System Running
      TAPE.FORMATS: Tape Data Formats

 ASSEMBLY: Assembly Of The Delila System Programs
      INTRO: What We Mean By Assembly
      CHACHA: Changing Characters And Getting The First Program Running
      REMBLA: Removing Excess Blanks From Files
      WORCHA: The Reserved Word Problem
      MODULE: Module Libraries - What They Are And How To Use Them
      EXAMPLE: An Example Of Constructing A Delila System Program
      PROBLEMS: Problems That May Arise During Assembly

 GUIDE: Hello, Computer - A Guide To The New User
      INTRO: Introduction To The Guide And Your Computer
      ADVICE: Advice And Tips To The New User
      DELILA: How To Use The Delila System On Your Computer

 PROGRAM: System Independent Notes On Programming
      ESSAY: Suggestions On How To Learn And Do Programming
      FABLE: A Fairy Tale For Programmers

(end of delman.intro.outline.1)

delman.intro.outline.2


 USE: Uses Of The Delila System
      INTRO: Introduction
      STRUCTURE: Library Structure: Trees, Nested And Named Objects
      LANGUAGE: Delila - The Language
      AUXILIARY.PROGRAMS: Lister And Search
      DATA.FLOW: Data Flow And Data Loops
      COORDINATES: The Coordinate System Of A PIECE
      CONTROL: How To Control The Responses Of Delila
      COMPARISON: Ways To Compare Sequences
      ALIGNED.BOOKS: How To Make And Use Aligned Books
      PERCEPTRON: Use Of The Pattern Programs
      ENCODE: Use Of The Fabulous And Powerful Encode Program
      DBPULL: Using The Data Base Extraction Programs
      SEARCH: Using The Search Program

 CONSTRUCTION: Constructing Your Own Libraries
      INTRO: Introduction
      STRUCTURE: More On Library Structure - Logical Vs Physical Structure
      CATAL: Making New Libraries - The Catalogue Program
      EXAMPLE: An Example Of Constructing Delila Libraries
      DATA.ENTRY: Using Your Own Data
      LIBRARY.DESIGN: Making A Delila Data Base
      [FORM...]: The Forms For Library Module Entry

 DESCRIBE: Program And Data Descriptions
      CONVENTIONS: Notation For Naming, Writing And Running Programs
      SHORT.CLUSTER: Short Clustered Descriptions Of Delila System Files
      DOCUMENTATION: How Programs Are Documented
         The format for documentation in the Delila System is in
         file aa.p at the start of the Delman2 manual.

 INDEX
      An Alphabetical Listing Of The Pages In The Manual.
      (See The Page Named DELMAN.INTRO.ORGANIZATION
       For How To Generate The Index.)

(end of delman.intro.outline.2)

delman.intro.description


      WHAT IS THE DELILA SYSTEM?

      The Delila System is a collection of Pascal programs and data originally
 written at the University of Colorado, Boulder that allows one to manipulate
 and study sets of nucleic-acid sequences.  A set of sequences is called a
 library.  There is a librarian, and "her" name is Delila.  One gives Delila a
 list of instructions that name desired fragments.  Delila then searches the
 library, collects all the sequences together and produces a "book".  The book
 may then be searched for patterns, listed with translation to amino acids, or
 studied in various ways using programs other than Delila ("auxiliary"
 programs).  Since books may be small, these analyses can be efficient.

      Books have the same form as libraries.  In other words, libraries have a
 particular structure so that Delila can work with them.  Books have that same
 structure.  For example, given a Master DNA sequence library one can use
 Delila to make a subset such as a transcript library, containing sequences of
 mRNA.  From the transcript library subsets for gene initiation regions can be
 made and these are guaranteed to be sequences from mRNA.  During all these
 manipulations the numbering of the sequences remains consistent so that one
 can refer back to the original library or the literature.  (The technical
 differences between libraries and books will be discussed later.)

      Any auxiliary program that searches a library will know about the
 structure of the library.  Using this structure and the search results, the
 program can write Delila instructions that specify the locations of the found
 objects.  Once again, using Delila, one can loop back and create a book of
 these objects.  Also, the instructions (instead of the sequences) can be
 manipulated by various programs.

      A NOTE FOR PROGRAMMERS
      Each auxiliary program that reads a book or library knows about the
 library structure.  To make programming easy, a set of routines was written as
 an interface between the actual database (kept in a file) and the program
 calls and variables.  These "book reading routines" are kept together in what
 we call a Module Library, containing many chunks of Pascal code.  Each module
 performs certain kinds of tasks.  The modules are transferred from the module
 library into the source code of each auxiliary program by using the Module
 program.  In this way all changes to the interface packages can be made once
 in the Module Library, followed by a series of transfers.  We may send the
 Delila System with modules removed because there is no reason to send
 duplicate code.  After transportation you would assemble the programs.

      We hope that this section gave you a rough overview of what the Delila
 System can do.  Many more details and examples can be found in the sections
 that follow.

(end of delman.intro.description)

delman.intro.references


    libdef - the definition of the Delila Library System (a file)
    moddef - the definition of the Module Transfer System (a file)
    doodle.info - describes Pascal graphics portable under UNIX

Some of the Delila programs and the method of moving modules around
are described in these papers:

    Schneider, T.D., G.D. Stormo, J.S. Haemer and L. Gold. (1982)
    A design for computer nucleic-acid sequence storage, retrieval and
    manipulation.
    Nucleic Acids Research, 10: 3013-3024.

    Schneider, T.D., G.D. Stormo, M.A. Yarus, and L. Gold (1984)
    Delila system tools.
    Nucleic Acids Research, 12: 129-140.

Some related papers are:
    Stormo, G.D., T.D. Schneider and L.M. Gold (1982)
    Characterization of translational initiation sites in E. coli.
    Nucleic Acids Research, 10: 2971-2996.

    Stormo, G.D., T.D. Schneider, L. Gold and A. Ehrenfeucht (1982)
    Use of the 'Perceptron' algorithm to distinguish translational
    initiation sites in E. coli.
    Nucleic Acids Research, 10: 2997-3011.

    Clift, B., D. Haussler, R. McConnell, T. D. Schneider and G. D. Stormo
    (1986)
    Sequence Landscapes.
    Nucleic Acids Research, 14: 141-158.

    Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
    The information content of binding sites on nucleotide sequences.
    J. Mol. Biol. 188: 415-431.

    Stormo, G.D., T.D. Schneider and L. Gold (1986)
    Quantitative analysis of the relationship between nucleotide
    sequence and functional activity.
    Nucleic Acids Research, 14: 6661-6679.

    Schneider, T.D. (1988)
    Information and entropy of patterns in genetic switches.
    In G.J. Erickson and C.R. Smith, editors,
    Maximum-Entropy and Bayesian Methods in Science and Engineering,
    volume 2, pages 147-154.
    Kluwer Academic Publishers, Dordrecht, The Netherlands.

    Schneider, T.D. and G.D. Stormo (1989)
    Excess information at bacteriophage T7 genomic promoters detected
    by a random cloning technique.
    Nucleic Acids Research, 17: 659-674.

Reference for Dotmat, Helix, Matrix and Keymat:
    J. V. Maizel, Jr. and R. P. Lenk
    PNAS 78: 7665-7669 (1981)

A reference for Index:
    L. J. Korn, C. L. Queen and M. N. Wegman
    PNAS 74: 4401-4405 (1977)


(end of delman.intro.references)

delman.intro.organization


      ORGANIZATION OF THE MANUAL

      The Delila Manual is broken into several somewhat independent sections.
 When Delman is paged by program PBREAK (see Technical notes below) you will
 find an index at the end.  We anticipate at least two kinds of reader:

 1) The builder who wants to get a Delila System running on a local computer.
 The section on transportation will help you get the data into your computer.
 The section on assembly will guide you through the difficult task of getting
 the programs running.  At that point the Delila Libraries will still not be
 ready to use:  you must construct catalogues as described in the section on
 CONSTRUCTING YOUR OWN LIBRARIES (DELMAN.CONSTRUCTION).  Finally you will be
 able to use the Delila System.  We suggest that you first look over the entire
 manual and associated documents.  Then begin the transport.  Good luck!

 2) The user who wants to use a Delila System that is already running on a
 local computer.  You may be interested in looking over the sections on
 transportation and assembly of the system, but this is not necessary.  If you
 don't know anything about using the computer you should start at
 DELMAN.GUIDE.  In any case, read the section on USE OF THE DELILA SYSTEM
 (DELMAN.USE).

 Each program is described in a separate manual, Delman2.


 TECHNICAL NOTES (These will not be useful to people just starting.)
      1. The section DELMAN.GUIDE must be rewritten after transportation
 to a new computer system.

      2. DELMAN is physically broken into a set of modules.  Each module
 is a page of the manual.  The individual pages can be extracted (or
 transferred and rearranged) by using the program MODULE, as described
 in the document MODDEF and DESCRIBE.MODULE.  The pages may be looked
 at on-line with the SHOW program (DESCRIBE.SHOW).  The manual or
 extracted modules may be broken into pages for output to a lineprinter by
 using the PBREAK program with a parameter file containing:
    (* begin module
    1
  There is no closing "*)" in the trigger because many different
  module names may follow the trigger, so the trigger is for the common
  part of the module beginnings.
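
  As an illustration of how such a trigger can be used, here is a minimal
  sketch in Python (not part of the Delila System, which is written in
  Pascal).  It lists the module names that follow the trigger.  The exact
  layout of a module line, including the closing "*)" and the "end module"
  line in the sample, is an assumption made for the example; MODDEF is the
  authority on the real format.

```python
TRIGGER = "(* begin module"   # the common part of every module beginning


def module_names(lines):
    """Return the names that follow the trigger on lines that start with it."""
    names = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(TRIGGER):
            # Assumed layout: "(* begin module NAME *)"
            rest = stripped[len(TRIGGER):].replace("*)", "")
            names.append(rest.strip())
    return names


# Hypothetical sample text: two manual pages stored as modules.
sample = [
    "(* begin module delman.intro *)",
    "some text of the page",
    "(* end module delman.intro *)",
    "(* begin module delman.guide *)",
]
print(module_names(sample))
```

  Because only the beginning of each line is compared with the trigger, any
  module name may follow it.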

      You can generate another index of the contents of this manual in
 the List file of program Module if you use Delman as the Modlib and a copy
 of Delman as Sin.  (See MODDEF for the definitions of these files.)

(end of delman.intro.organization)

delman.intro.policy


      OBTAINING THE DELILA SYSTEM BY FTP
      The Delila system is available by anonymous ftp in the archive at
 ncifcrf.gov in the directory pub/delila.

      OBTAINING THE DELILA SYSTEM BY TAPE
      We prefer not to have to write tapes or disks, but we will send the
 Delila System by tape as a single package if you do not have ftp access.
 Under most circumstances we cannot send parts of the system or subsets of the
 data.  Please send us a tape as described in delman.transport.tape.formats,
 and we will write out the entire current version and send it back to you.
 There is no fee.  You may redistribute the system.  If you receive a copy of
 the system from someone else, you may want to check back with us to see if
 there have been any major changes or corrections.  Referring to the version
 number of the program or documentation will help us know if there were any
 changes.

      DISCLAIMER
      No claim or guarantee is made that Delila System programs and data are
 free of error.  Although we send source code, we cannot guarantee that this
 code will compile and run on all computers.  We believe that our code is
 reasonably efficient, but we cannot be responsible for any costs due to using
 the Delila System.  We do not offer programming support, though we are willing
 to answer questions about the Delila System.

      We would appreciate a detailed description of any program errors (bugs)
 or data errors that you encounter.


      OUR ADDRESS

  Thomas D. Schneider, Ph.D.
  Senior Investigator
  National Institutes of Health
  National Cancer Institute
  Center for Cancer Research
  Gene Regulation and Chromosome Biology Laboratory
  Molecular Information Theory Group
  Frederick, Maryland  21702-1201
  schneidt@mail.nih.gov
  http://schneider.ncifcrf.gov (current link)
  http://alum.mit.edu/www/toms (permanent link)

      ACKNOWLEDGEMENTS

      Jeff Haemer, Mike Aden and Gary Stormo were instrumental in the
original design of the Delila system.

      Many people have helped us by reading and commenting on this
 manual.  We would like to thank:  Ginny Fonte, Larry Gold, Jeff
 Haemer, John Hoffhines, Jane Hessler (VA), Brent Hughes, Billie
 Lemmon, Melissa Mockensturm, Sandy Parkinson (UT), Pat Roche, Herb
 Schneider, Susan Scolman, Sidney Shinedling, Britta Singer, Rosemary
 Sweeney, and Mike Yarus.

      Computer time and resources were generously provided by the
 University of Colorado at Boulder, and the Frederick Biomedical
 Supercomputing Center.

      Funds for this project were provided through grants NIH 1 R01 GM28755,
 NIH 5 R01 GM19963 and ACS NP-178D.

(end of delman.intro.policy)

delman.intro.comments


Please use this page to write comments you have about the manual
and the Delila system.  Our address is on page delman.intro.policy.  Thank you.

Name:                                    Date:

(end of delman.intro.comments)

delman.transport




















tttttttt  rrrrrrr      aa     n     nn   ssssss
   tt     rr    rr    aaaa    nn    nn  ss    ss
   tt     rr    rr   aa  aa   nnn   nn  ss
   tt     rr    rr  aa    aa  nnnn  nn   ssssss
   tt     rr    rr  aa    aa  nn nn nn        ss
   tt     rrrrrrr   aaaaaaaa  nn  nnnn        ss  --------
   tt     rr  rr    aa    aa  nn   nnn        ss
   tt     rr   rr   aa    aa  nn    nn  ss    ss
   tt     rr    rr  aa    aa  nn    nn   ssssss


ppppppp    oooooo   rrrrrrr   tttttttt
pp    pp  oo    oo  rr    rr     tt
pp    pp  oo    oo  rr    rr     tt
pp    pp  oo    oo  rr    rr     tt
pp    pp  oo    oo  rr    rr     tt
ppppppp   oo    oo  rrrrrrr      tt
pp        oo    oo  rr  rr       tt
pp        oo    oo  rr   rr      tt
pp         oooooo   rr    rr     tt













(end of delman.transport)

delman.transport.requirements


      TRANSPORTATION - WHAT YOU WILL NEED

      If you have obtained the Delila System by computer tape, you will need
 some way of moving the data on the tape into your computer.  We suggest that
 you find someone who has already dealt with tapes.

      All Delila System programs are written in the language Pascal.  There
 are many books available on this language, but the definition of
 the language is in:
      K. Jensen and N. Wirth
      Pascal User Manual and Report
      Springer-Verlag, New York 1978

      Some of the Delila programs have been automatically translated to C.
 See the README file for further details.

      To run Pascal programs you will need a Pascal compiler on your computer,
 and enough memory to use it.  It is impossible to make an accurate estimate
 of the memory requirements, because this depends on the computer system.
 However, we once set up an older version of the entire system on two computers:
      CDC Cyber/KRONOS 5000 pru x 640 char/pru = 3,200,000 characters
      DIGITAL VAX/VMS  7000 blocks x 512 char/block = 3,584,000 characters
 Since then more programs have been added, and we find roughly:
      4,300,000 characters of source code and files
      5,300,000 bytes of compiled code on a Pyramid 90x computer running UNIX.
 Because these estimates include object code, whose size depends on the
 compiler, the amount you require may be larger or smaller.  The estimates
 do not include memory required for running the system.
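
      The two totals for the older version are simply the number of storage
 units multiplied by the characters held in each unit.  A small Python
 sketch (an illustration only, not part of the Delila System) reproduces
 the arithmetic so that you can substitute the unit size of your own
 computer:

```python
def storage_chars(units, chars_per_unit):
    """Return the total number of characters held in `units` storage units."""
    return units * chars_per_unit


cyber = storage_chars(5000, 640)   # CDC Cyber/KRONOS: 5000 pru of 640 characters
vax = storage_chars(7000, 512)     # DIGITAL VAX/VMS: 7000 blocks of 512 characters

print(f"Cyber: {cyber:,} characters")
print(f"VAX:   {vax:,} characters")
```
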
      Since transportation of programs from one computer to another is
 still a tricky business, we recommend that either you learn about
 tapes, your computer, and Pascal, or that you find local people who
 know about these things and are willing to give you help.

      The first Delila system file on the tape is called AAA (the name
 guarantees that it will be first).  It lists the names of
 all the Delila files on the tape, in the order that they were taped.
 Following AAA the other files are in alphabetical order.
 Files are described in the manual section DELMAN.DESCRIBE.

      If you keep notes on difficulties that you encounter and
 how each was solved, transportation of future versions of the
 Delila System will be easier.

(end of delman.transport.requirements)

delman.transport.tape.formats


      TAPE DATA FORMATS

      We send the Delila System (programs and data) out on tape.
      Send us a standard 2400 foot tape.  We will send you back the tape with
      the format:

         9 track
         1600 bits per inch
         Unlabeled
         Standard ASCII character set
         80 characters per record
         10 records per block

      We can also send UNIX tar tapes.
      The first file on the tape lists the names of all the files on the tape.
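
      With 80 characters per record and 10 records per block, each block
      holds 800 characters.  The following Python sketch (an illustration,
      not a Delila program) shows one way to turn such fixed-length records
      back into text lines.  It assumes that the tape has already been
      copied to disk as a raw image without line terminators, and that
      short lines were padded on the right with blanks; both are
      assumptions about your tape utility, not statements about the tape.

```python
RECORD_LENGTH = 80      # characters per record, from the format above
RECORDS_PER_BLOCK = 10  # records per block, from the format above
BLOCK_LENGTH = RECORD_LENGTH * RECORDS_PER_BLOCK  # 800 characters per block


def unblock(raw):
    """Split a raw tape image (bytes) into text lines, one per record.

    Assumes plain ASCII with no line terminators and blank padding on
    the right of short lines.
    """
    text = raw.decode("ascii")
    lines = []
    for start in range(0, len(text), RECORD_LENGTH):
        record = text[start:start + RECORD_LENGTH]
        lines.append(record.rstrip(" "))
    return lines


# Usage: one block holding two real lines and eight blank padding records.
image = ("program demo;".ljust(RECORD_LENGTH)
         + "begin end.".ljust(RECORD_LENGTH)
         + " " * (RECORD_LENGTH * 8)).encode("ascii")
lines = unblock(image)
print(len(image) == BLOCK_LENGTH, lines[:2])
```

      The blank records that fill out the last block become empty lines;
      see DELMAN.ASSEMBLY.REMBLA for removing them.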

(end of delman.transport.tape.formats)

delman.assembly




















    AA      SSSSSS    SSSSSS   EEEEEEEE  M      M  BBBBBBB   LL        YY    YY
   AAAA    SS    SS  SS    SS  EE        MM    MM  BB    BB  LL         YY  YY
  AA  AA   SS        SS        EE        MMM  MMM  BB    BB  LL          YYYY
 AA    AA   SSSSSS    SSSSSS   EEEE      MMMMMMMM  BBBBBBB   LL           YY
 AA    AA        SS        SS  EE        MM MM MM  BB    BB  LL           YY
 AAAAAAAA        SS        SS  EE        MM    MM  BB    BB  LL           YY
 AA    AA        SS        SS  EE        MM    MM  BB    BB  LL           YY
 AA    AA  SS    SS  SS    SS  EE        MM    MM  BB    BB  LL           YY
 AA    AA   SSSSSS    SSSSSS   EEEEEEEE  MM    MM  BBBBBBB   LLLLLLLL     YY
























(end of delman.assembly)

delman.assembly.intro


      ASSEMBLY OF THE DELILA SYSTEM PROGRAMS

      At this point we will assume that all the programs and data are in
 files on your computer.  Be sure to read the section of PROGRAM AND
 DATA DESCRIPTIONS (DELMAN.DESCRIBE.CONVENTIONS) that discusses our file
 naming and running conventions.

      This section will guide you in the construction of the Delila System
 programs.  There are several stages to this process:
      changing characters - making sure that all the characters are correct
      removing blanks - blank characters at the end of lines can be removed
         to speed processing and save memory.
      changing words - changing the words that your compiler thinks
         are reserved words in Pascal (but aren't in standard Pascal...)
      module corrections - making sure that modular chunks of code function
         correctly on your computer.
      module transfers - inserting chunks of code into programs
      compilation and debugging - making the programs and finding out why
         things don't work ("If something can go wrong, it will." - Murphy)

      We have written some tools to aid you in this process - but to use the
 tools you must first get some of them running - so the first steps must
 be done by hand.

      Remember to take dated notes about your problems and how they were
 solved.


      USE OF COMMAND FILES
      Most computer systems allow one to put commands in a file and execute
 them.  If you can do this, it will speed up assembly enormously.  One
 such "command" file could contain instructions to remove blanks,
 change characters, change words, transfer modules and perhaps even try
 to compile.  However, it would be better to have several command files,
 each of which did a small part, giving you more flexibility.

(end of delman.assembly.intro)

delman.assembly.chacha


      CHANGING CHARACTERS

      When characters are written to tape they are encoded as binary strings.
 When your computer reads the tape, the characters are decoded for
 storage on your computer.  If the decoding does not exactly reverse
 the encoding, then the characters you receive will not be the same as
 the ones that we send.  For example, you may have a pound sign for each
 exclamation mark that we sent.  Your first task is to find out what
 changes occurred (if any).  To aid you, we provided a list of
 characters with English descriptions in the file 'chars'.
 Look at this file and write down the changes required.
      Use the editor on your computer to correct the characters in the file
 CHACHAS.  Now try to compile CHACHAS.  Determine the reasons for any
 errors.  (For example, you may have to switch double and single quotes
 to satisfy the compiler or you may have to remove the non-standard linelimit
 call.)
      The CHACHA program will now assist you in converting characters in
 the files from the tape.  You should try it out on chars, remembering
 not to destroy the original file.  NOTE: Some Pascal compilers may
 not allow programs that read "nonstandard" characters.  (Example:  small
 characters.)  You may be able to get around this by setting compiler defaults.
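
      The idea behind character changing can be sketched in a few lines of
 Python.  This is an illustration only: it is not the CHACHA program, and
 the function name and the way changes are supplied are invented for the
 example.  Each received character is mapped back to the character that
 was sent, and all changes are applied in a single pass.

```python
def change_characters(text, changes):
    """Apply all single-character changes in one pass.

    `changes` maps each received character to the character that was
    originally sent.  Because the changes are applied simultaneously,
    a pair of exchanged characters is handled correctly.
    """
    table = str.maketrans(changes)
    return text.translate(table)


# Example from the text: a pound sign arrived for each exclamation mark sent.
received = "Don't Panic#"
print(change_characters(received, {"#": "!"}))
```
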

(end of delman.assembly.chacha)

delman.assembly.rembla


      REMOVING EXCESS BLANKS FROM FILES

      The files that you get off the tape may have extra blanks (spaces) at
 the ends of lines.  This may be due to transportation itself, or the source
 computer may add extra blanks to lines.  Although these blanks will not
 affect the function of most programs, they will slow down program
 execution and use up extra memory.
      Transportation can also add blank lines to the end of the file.  Some
 programs will object to this.  Catal is one example.
      The program Rembla (remove blanks) will remove all blanks from the ends
 of lines in a file, and any extra blank lines at the end.  We recommend that
 you include this as a step during assembly of programs.  It should
 also be done for data files, especially the libraries.
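
      What blank removal does can be shown with a short Python sketch.  It
 is an illustration of the two operations described above, not the Rembla
 program itself, and the function name is invented for the example.

```python
def remove_blanks(lines):
    """Strip trailing blanks from each line and drop blank lines at the end."""
    cleaned = [line.rstrip(" ") for line in lines]
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    return cleaned


before = ["program demo;   ", "", "begin end.  ", "   ", ""]
after = remove_blanks(before)
print(after)
```

      Note that a blank line in the middle of the file is kept; only the
 blank lines at the very end are removed.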

(end of delman.assembly.rembla)

delman.assembly.worcha



      THE RESERVED WORD PROBLEM

      The language Pascal defines certain words (such as PROGRAM, VAR,
 BEGIN and END) to be reserved words.  These words cannot be used as
 variable names.  This in itself presents no difficulties for
 portability.  However, your Pascal compiler (like ours) may reserve more
 words than just the standard set.  If one of the Delila System programs
 uses a non-standard reserved word of your compiler, then the program will
 not compile.  You will not have to change all these names by hand because
 we have sent a program to do it automatically.
      Non-standard reserved words should be listed somewhere in the manual for
 your Pascal compiler.  Use this list and the program WORCHA to remove all
 the reserved names.  We suggest using new names that are not likely
 to appear in a program.  Example: MODULE could be converted to
 ZMODULE without loss of meaning.  ZMODULE is not likely to be already used in
 a program.
      Worcha will not alter literals or comments, so the program's
 operation will not be affected by this change.  If one makes the
 changes with a standard editor, then the program may not act as
 described in this manual.
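
      The reason a word changer must understand comments and literals can
 be seen in the following Python sketch.  It is a simplified illustration,
 not the WORCHA program: the function name, the form of the change list and
 the treatment of Pascal text (brace and (* *) comments, quoted literals,
 identifiers made of letters and digits) are assumptions made for the
 example.

```python
import re

# Alternatives are tried left to right at each position, so comments and
# string literals are recognized first and copied through unchanged.
TOKEN = re.compile(
    r"""
      (?P<comment>\{.*?\}|\(\*.*?\*\))   # brace or (* *) comments
    | (?P<literal>'(?:[^']|'')*')        # string literal; '' is an embedded quote
    | (?P<word>[A-Za-z][A-Za-z0-9]*)     # identifiers and reserved words
    """,
    re.DOTALL | re.VERBOSE,
)


def change_words(source, changes):
    """Rename identifiers per `changes`, leaving comments and literals alone.

    `changes` maps lower-case old names to new names.  Pascal identifiers
    are case-insensitive, so the lookup ignores case.
    """
    def replace(match):
        word = match.group("word")
        if word is None:                  # a comment or a string literal
            return match.group(0)
        return changes.get(word.lower(), word)

    return TOKEN.sub(replace, source)


source = ("var module: integer; { module counter } "
          "begin writeln('module'); module := 1 end.")
print(change_words(source, {"module": "zmodule"}))
```

      Only the two uses of the variable are renamed.  A plain editor
 substitution would also have changed the comment and the printed text.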

      (We hope that those people who design compilers will consider this
 problem in the future.)

(end of delman.assembly.worcha)

delman.assembly.module.1


      ASSEMBLY USING MODULES

      First, familiarize yourself with DELMAN.DESCRIBE.CONVENTIONS.

      You are now ready to assemble a Delila auxiliary program.  The
 raw source LISTERR cannot be compiled as it now stands because
 it is missing a set of replaceable chunks of code (called modules) to read
 books (the book reading interface modules).  These are to be
 found in DELMODS, as stated in the first few lines of LISTERR.  Notice that
 DELMODS is a program - compile and run it.  This will almost certainly
 fail.  Correct those modules that cause problems.  See the section on
 assembly problems.
      Modules can be moved around using the MODULE program.  The details
 of this process are described in MODDEF, which you should study now.


--------------------------- READ MODDEF NOW --------------------------------







(end of delman.assembly.module.1)

delman.assembly.module.2


      Prepare to do the module transfers by compiling MODULES.
      All programs should be tested on small inputs at first.
 Test the Module program with the example module source and library:
      MODULE(EXSIN,EXMODLI,EXSOUT,EXCT,LIST,OUTPUT)
 Exsout should be identical to the sout example in ModDef.
 Examine list and exsout.

 Now try:
      MODULE(LISTERR, DELMODS, LISTERS, DELCAT, OUTPUT)
 The OUTPUT file will tell you the progress MODULE makes during the
 transfer.  Modules in DELMODS will be copied into the right places of LISTERR
 and the result will be LISTERS (LISTER with inserts - source code).
 It will be useful to save DELCAT for further transfers from DELMODS.

 Compile LISTERS.  Run LISTER (using the default parameters):
      LISTER(EX0BK, EX0LIT)
      The file EX0LIT is a listing of the example book EX0BK.  It should be
 identical to EX0LI.  The possible exception is the begin-page character:
 some computers use a 1 to indicate jump to the next page, while others
 use control-L.

      We would now like to know that LISTER works correctly.  To do
 this requires a comparison program.  MERGE will do.  However, to
 construct MERGE requires modules from PRGMODS.  Compiling PRGMODS and
 running it will test interactive i/o.  The procedures in PRGMODS
 that may need modification are PROMPT, READCHAR and READLINE, in
 decreasing order of system dependence.  You should modify LINELIMIT
 and HALT by transferring the corrected modules from DELMODS into
 PRGMODS.  Prepare PRGMODS and run it.

      Prepare MERGE and use it to prove that EX0LIT = EX0LI.

      You may now construct the rest of the programs.  Note that some
 of them use several module libraries.  For the next stage of setting
 up the Delila System compile CATALS, LOOCATS and DELILAS.  You must
 now construct the libraries: skip to CONSTRUCTING YOUR OWN LIBRARIES,
 (DELMAN.CONSTRUCTION).

      NOTE FOR A SECOND TRANSPORTATION
      If you obtain a later version of the Delila System, then Delmods and
 other module libraries are likely to be altered.  You will want to replace
 modules in the new DELMODS and PRGMODS with your own (system dependent)
 versions.  If you did this directly, you would also replace corrections
 and changes to DELMODS.  To avoid this problem, simply construct a small
 module library (containing for example LINELIMIT, DATETIME modules and
 the interaction modules).  Then use this to change DELMODS and PRGMODS.

(end of delman.assembly.module.2)

delman.assembly.example


      AN EXAMPLE OF CONSTRUCTING A DELILA SYSTEM PROGRAM

      In this example we show the series of steps used to set up a Delila
 system program, given that the module libraries are ready (that is,
 they compile and run).  The example is for Patser, which requires both
 Delmods and Auxmods.  We assume that the tools needed to do this are
 already set up, as discussed on the previous pages.  As noted in
 DELMAN.ASSEMBLY.INTRO, it is frequently possible to automate these steps.

 1. Change Characters
      chacha(patserr,patser1,chachap)
 Chachap must contain the changes you determined earlier.

 2. Remove Blanks
      rembla(patser1,patser2)

 3. Change Words
      worcha(patser2,patser3,worchap)
 Worchap must contain a list of special reserved words and what they
 are to become.

 4. Insert Modules
      module(patser3,auxmods,patser4,auxcat)
      module(patser4,delmods,patsers,delcat)
 Auxcat and delcat will be generated by Module if they are empty.  You
 can reuse them later with their respective module libraries.  The
 module libraries needed are listed in the first few lines of each
 program.  It is not necessary to pick up the DESCRIBE module
 to compile the program.

 5. Compile
 Patsers is now complete source code, ready to compile.
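
      Because the steps follow a fixed pattern, a command file for them
 can be generated.  The Python sketch below is an illustration only: the
 function name is invented, and it assumes that every program follows the
 naming pattern of this example (raw source ending in "r", numbered
 intermediate files, final source ending in "s", parameter files chachap
 and worchap).  See DELMAN.DESCRIBE.CONVENTIONS for the actual conventions.

```python
def assembly_commands(name, module_libraries):
    """Return the commands that turn the raw source of `name` into source code.

    `module_libraries` is a list of (library, catalogue) pairs, applied
    in the order given.
    """
    commands = [
        f"chacha({name}r,{name}1,chachap)",   # 1. change characters
        f"rembla({name}1,{name}2)",           # 2. remove blanks
        f"worcha({name}2,{name}3,worchap)",   # 3. change words
    ]
    source = f"{name}3"
    # 4. insert modules, one transfer per module library
    for step, (library, catalogue) in enumerate(module_libraries, start=4):
        last = step == 3 + len(module_libraries)
        target = f"{name}s" if last else f"{name}{step}"
        commands.append(f"module({source},{library},{target},{catalogue})")
        source = target
    return commands


for command in assembly_commands(
        "patser", [("auxmods", "auxcat"), ("delmods", "delcat")]):
    print(command)
```

      For Patser with Auxmods and Delmods this produces the five commands
 shown in steps 1 to 4 above.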

(end of delman.assembly.example)

delman.assembly.problems.1



      ASSEMBLY PROBLEMS

      Transportation and assembly problems occur most often because of
 unavoidable system dependent features of particular Pascal compilers.

 INTERACTIVE INPUT
      For interactive input we wrote several modules that work on our computer
 (INTERACT in PRGMODS).  These procedures may or may not be transportable,
 so you may have to modify them.  For example, interactive input on a Cyber
 Pascal compiler requires the file name "input/" - you would have to remove the
 "/" for your compiler.  (This is no longer necessary, as the source
 code is now under UNIX which does not require this.)

 DATE AND TIME PROCEDURES
      The module for date and time calls (module PACKAGE.DATETIME in DELMODS)
 must be rewritten.  We strongly recommend that you keep the same form for
 the dates in libraries so that these routines remain interfaces.  Changing
 the form of the date would make transportation of libraries difficult because
 they would not have the same structure in different locations.
      Modules that will work on a VAX computer are in VAXMODS.  You may find
 it easier to adapt these to your computer rather than the ones that
 are in Delmods.
      If your computer does not have a clock, the simplest way to get this
 module running is to add DATE and TIME procedures in the form called
 by READDATETIME.  These dummy procedures could return either a fixed time
 or a random time made by a true random number generator.  The date
 and time are used to uniquely identify books and some data files.

 QUOTES
      CDC Cyber Pascal compilers require double quotes (") where the standard is
 the single quote (').
 SOLUTION: use CHACHA to convert:
         " to '   and   ' to "
 In some cases you will have to use two single quotes so that Pascal prints
 a single quote.  Some programs that print 5' and 3' are Lister, Helix,
 Matrix and Dotmat.  To convert, simply alter the constant called 'prime'.
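
      Exchanging two characters only works if both changes are made in
 the same pass.  The Python sketch below (an illustration; the function
 names are invented and nothing is implied about how CHACHA works
 internally) contrasts a simultaneous exchange with two changes made one
 after the other.

```python
SWAP_QUOTES = str.maketrans({'"': "'", "'": '"'})


def swap_quotes(line):
    """Exchange double and single quotes in one simultaneous pass."""
    return line.translate(SWAP_QUOTES)


def naive_swap(line):
    """Two sequential replacements, shown only to illustrate the pitfall."""
    return line.replace('"', "'").replace("'", '"')


cyber_line = 'writeln(output, "hello");'
print(swap_quotes(cyber_line))   # quotes exchanged
print(naive_swap(cyber_line))    # second change undoes the first
```

      If your editor can only make one change at a time, change one of
 the two characters to an unused third character first.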

(end of delman.assembly.problems.1)

delman.assembly.problems.2


 LINELIMIT
       In CDC Cyber Pascal compilers, output to files is limited to 1000
 lines unless the LINELIMIT procedure is called.  Your compiler may not
 require or recognize this silliness.
 SOLUTION: The calls to linelimit are isolated to the procedure
 UNLIMITLN in the module by the same name in DELMODS and PRGMODS.  Simply
 surround the call (inside the modules!!!) with comments.
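
      The following is only a sketch of what the commented-out call
 might look like.  The real text of UNLIMITLN is in DELMODS and PRGMODS;
 the parameter list and the form of the linelimit call shown here are
 assumptions made for illustration.

```pascal
program unlimitdemo(output);

procedure unlimitln(var afile: text);
(* hypothetical form: remove the line limit on afile.  the system
   dependent call is surrounded by a comment so that this compiles
   on compilers that do not have linelimit. *)
begin
   (* linelimit(afile, maxint); *)
end;

begin
   unlimitln(output);
   writeln(output, 'no line limit was set')
end.
```

 Because the rest of the program only calls unlimitln, nothing else
 has to change.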

 INTERNAL FILES (thanks to Sandy Parkinson)
       An "internal file", for the discussion here, is a file used
 by a Pascal program as a scratch pad.  It is not connected to the
 outside world.  Some computer systems and their Pascal compilers
 require that all files be connected to the outside, as they are not
 capable of creating temporary files.  At least two Delila programs
 use internal files: Module and Split.  Correction of this problem
 requires some programming.  It may not be possible to do it for Split.

 COMPARISONS OF PACKED ARRAYS
       Comparisons of packed arrays may cause you some problems.  One
 solution is to use arrays that are not packed and to write your own
 comparison procedure.
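
      A comparison procedure of this kind might look like the sketch
 below.  The type shortname and the functions are invented for the
 example and are not part of DELMODS.

```pascal
program comparedemo(output);
(* sketch: compare two arrays that are not packed, one element at a time *)
const
   namelength = 6;
type
   shortname = array[1..namelength] of char; (* not packed *)
var
   first, second: shortname;

procedure fill(var x: shortname; c: char);
(* put the character c into every position of x *)
var
   i: integer;
begin
   for i := 1 to namelength do x[i] := c
end;

function equalname(var x, y: shortname): boolean;
(* true if x and y hold the same character in every position *)
var
   i: integer;
   same: boolean;
begin
   same := true;
   for i := 1 to namelength do
      if x[i] <> y[i] then same := false;
   equalname := same
end;

begin
   fill(first, 'a');
   fill(second, 'a');
   if equalname(first, second)
      then writeln(output, 'same')
      else writeln(output, 'different');
   second[namelength] := 'g';
   if equalname(first, second)
      then writeln(output, 'same')
      else writeln(output, 'different')
end.
```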

 THINGS THAT WE HAVE NOT THOUGHT OF...
      Please tell us!  Our address is in DELMAN.INTRO.POLICY.

      For notes on the writing of transportable programs see DELMAN.PROGRAM
      and DELMAN.DESCRIBE.CONVENTIONS.WRITING.

(end of delman.assembly.problems.2)

delman.guide




















  GGGGGG   UU    UU  IIIIIIII  DDDDDDD   EEEEEEEE
 GG    GG  UU    UU     II     DD    DD  EE
 GG        UU    UU     II     DD    DD  EE
 GG        UU    UU     II     DD    DD  EEEE
 GG        UU    UU     II     DD    DD  EE
 GG  GGGG  UU    UU     II     DD    DD  EE
 GG    GG  UU    UU     II     DD    DD  EE
 GG    GG  UU    UU     II     DD    DD  EE
  GGGGGG    UUUUUU   IIIIIIII  DDDDDDD   EEEEEEEE
























(end of delman.guide)

delman.guide.intro


      HELLO COMPUTER - A GUIDE TO THE NEW USER

      ABOUT THIS SECTION:  This section is a guide to using the computer.
 Whenever you have questions about the computer, this is the place to
 look, because the rest of the manual is about the Delila System ONLY.
 That is to say, we have split this manual into several parts - and it will
 not help for you to look for the right thing in the wrong part.  The
 reason for this is that the information about the Delila System can be
 moved from one computer to another (just like the Delila System) but
 information about computers usually cannot be moved.  DELMAN.GUIDE must be
 REWRITTEN for other computers and operating systems.


      ABOUT THIS COMPUTER:  This manual section is written specifically for
 UNIX operating systems.  (UNIX is a trademark of Bell Laboratories.)


      OTHER DOCUMENTS AND RESOURCES:
      In general, ask around.

      Type
          help
      to get pointers.

      Learn how to use the UNIX manual program (man).

      The apropos program is useful for finding things.

      There are hundreds of books on UNIX.  Find one you like.  Many
      people seem to like:
         UNIX for People by P. Birns, P. Brown and J. C. C. Muster
         Prentice-Hall, Inc, 1985

      The easiest way to learn to use a computer is to use the computer!
      Obtain a login identification and plunge in.

      DO NOT REVEAL YOUR PASSWORD TO ANYONE!!!

(end of delman.guide.intro)

delman.guide.advice


      SOME ADVICE TO A NEW COMPUTER USER:

 1) YOU CAN'T HURT THE COMPUTER.  Don't hesitate to try things and
 to play around!

 2) After you learn how to get on and off the computer your best bet is to
 get a firm grip on what files are, how you can make them and how to
 manipulate them.  The easiest way to understand what is happening is to watch
 it happen.  You should use the commands that display your files after each
 file manipulation - until you have a good feeling about what is happening.
 If you do this you will quickly become confident about what you are doing.

 3) A lot of the general principles that you pick up will be similar
 on other computers.

 4) Be wary of the characters you type.  Notice that a zero (0) is NOT
 the same as the capital letter O - the computer can tell them apart.
 This is also true for a one (1) and the small l.

 5) Do not do any serious work while you learn to use the computer.  You
 are likely to destroy some of your files.  That will hurt you and not
 the computer.  Loss of good data can be terribly frustrating.

 6) If you have a problem TRY A SIMPLER CASE,
                          TRY TO ISOLATE THE PROBLEM.

 7) An experienced advisor is worth a thousand hours of computer time.



      UNCRITICAL ACCEPTANCE OF COMPUTER RESULTS

   "So useful has the computer become in all branches of statistical analysis
 that there may be some tendency to forget that even it has its limitations.
 The computer cannot work magic--not yet anyway.  It will do only what it is
 instructed to do, and the validity of the results is determined by the
 accuracy and adequacy of the data put in and the wisdom of the people
 writing the instructions.  Granted, the computer can perform a great
 many calculations much more rapidly than mere mortals can do them.
 Nevertheless, speed of computational work is not the same thing as
 infallibility in aiding with the decision-making process.  A statistical
 critic, of all people, should guard against being overawed by the news
 that certain information was turned out by a computer.  The mere fact
 that computers are being used these days even to cast horoscopes should
 be ample proof that a computer is no more immune to spewing out
 nonsense than are real flesh-and-blood people."
      -from FLAWS AND FALLACIES IN STATISTICAL THINKING
         by Stephen K. Campbell (N.J. Prentice-Hall Inc., 1974), p. 182

(end of delman.guide.advice)

delman.guide.delila


      HOW TO USE THE DELILA SYSTEM ON THIS COMPUTER

 Computer:  Cutterjohn and Sparky.

 The Delila System programs and documentation are kept in the directory
     ~toms/delila
 The binary forms (which you can run) are in
     ~toms/bin
 If you put this directory in your path, then they will simply be commands.

(end of delman.guide.delila)

delman.program


























    PPPPPPP   RRRRRRR    OOOOOO    GGGGGG   RRRRRRR      AA     M      M
    PP    PP  RR    RR  OO    OO  GG    GG  RR    RR    AAAA    MM    MM
    PP    PP  RR    RR  OO    OO  GG        RR    RR   AA  AA   MMM  MMM
    PP    PP  RR    RR  OO    OO  GG        RR    RR  AA    AA  MMMMMMMM
    PP    PP  RR    RR  OO    OO  GG        RR    RR  AA    AA  MM MM MM
    PPPPPPP   RRRRRRR   OO    OO  GG  GGGG  RRRRRRR   AAAAAAAA  MM    MM
    PP        RR  RR    OO    OO  GG    GG  RR  RR    AA    AA  MM    MM
    PP        RR   RR   OO    OO  GG    GG  RR   RR   AA    AA  MM    MM
    PP        RR    RR   OOOOOO    GGGGGG   RR    RR  AA    AA  MM    MM


















(end of delman.program)

delman.program.essay


      SUGGESTIONS ON HOW TO LEARN AND DO PROGRAMMING
      (An Essay By Tom Schneider)

 ABOUT LANGUAGES
      A computer language is the meeting ground between the absolutely
 rigid requirements of a computer (it must be told exactly what to
 do) and the ambiguous and flexible uses of human languages
 (such as "go jump in a lake", "pour me a cup" etc).

      Recently many academic institutions in the USA have allowed students
 to substitute computer languages for a knowledge of human languages.
 Although a knowledge of computers is becoming increasingly important
 in our society, this change is short sighted: no computer
 language is anywhere near as powerful or beautiful as those
 practiced by humans.  With dedication one can easily learn twenty
 computer "languages" in a few years, whereas the polyglot is rare
 indeed.  It is important to learn both kinds of language.  For one to
 substitute FORTRAN for French is preposterous cheating.

 HOW DO LANGUAGES WORK? COMPILERS
      Every kind of computer has its own internal "machine" language.
 It is difficult for a person to write or read this because it
 consists of long stretches of ones and zeros: 0100101010111010000011
 10110111101001110010100101001010...   Every "bit" (a one or a zero) must be
 exactly right or the machine will not operate correctly.  Most
 people can't deal with such immense amounts of detail.  The solution
 is to force the computer to keep track of the details and let the person
 think in word-like and sentence-like units:
      IF SUNNY THEN REJOICE
               ELSE MOPE;
 Once one has written a set of sentences in a "higher" level language,
 one must have the computer convert them to its own internal machine
 language (this is not strictly true, but we will only discuss one
 method here).  The process is called compiling.  A self-contained and
 consistent set of "sentences" and "paragraphs" is called a program.
 Obviously one also needs a program to do the compiling - that program
 is called a compiler.
      For example, one relatively modern language is called Pascal.  A
 Pascal compiler sits ("resides") in ("on" - so much for jargon)
 a particular computer.  It converts statements made in the Pascal
 language into machine zeros and ones for that computer (and only
 that computer).  In other words, it converts a SOURCE code into an
 OBJECT code.  The object code can be made to operate ("run") only
 on one kind of computer.  (Note: the word "code" means "program".  Also,
 on some computers one must convert the object code into "executable"
 code before it can be run.)
      (Here is something to puzzle over.  It is now common practice to write
 a compiler in the same language that the compiler compiles.  The
 Pascal compiler was written in Pascal.  It's like pulling oneself
 out of the mud by the bootstraps... how did it start?)

 WHY PASCAL?
      One of the first languages written was called FORTRAN.  In its day
 (the 1950's) it was a great boon because one no longer needed to write
 in machine language (or even one step up, assembly).  Since that time
 many new ideas have been incorporated into languages.  Some of them
 (such as recursion and complex data types) fall outside the range that
 FORTRAN can handle.  This evolution is to be expected.  Yet people
 still try to teach an old dog, so there has been a series of
 "improvements" to FORTRAN.  The result is a great mish-mash of
 dialects.  For these reasons (and other things like the dread
 FORMAT statement) it is difficult (although not impossible) to write good
 transportable code in FORTRAN.  ("Transportable" or "machine independent"
 means that the program will work on several different computers.)

      Pascal is a more modern language, so it includes recently developed
 concepts.  One can write excellent crystal clear code in this language.
 Unfortunately this property does not prevent one from writing poor and obscure
 code!


 TOPDOWNING: How To Write Clear Code
      There are as many ways to write code as there are people.  Yet a
 few simple principles allow one to organize one's thoughts quickly
 and efficiently.
      Writing a program is just like ... writing an outline.
 One starts at the "top" by writing the main things to be done:
   Tom's Day
      I.   Morning
      II.  Travel To Work
      III. Work
      IV.  Travel Back Home
      V.   Evening

 Then one writes the first section:
      I. Morning
         A. Get Up
         B. Shower
         C. Get Dressed
         D. Eat
         E. Put On Coat
 This is repeated for the other sections.  Eventually we get even deeper:
      I. Morning
          A. Get Up
            1. Huh?
            2. Open eyes
            3. Yawn
               ...

 In Pascal, one dispenses with the numbering of sections.  Instead,
 each section has a name.  A section is called a procedure.  Since you
 can read all about procedures, I won't go into more detail here.
      The main advantage to this method is that if one is careful, each
 procedure is isolated from all the others.  There is only one thing to
 think about at a time.
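
      The outline above can be sketched as a Pascal program.  The
 procedure names simply follow the outline; only the first section is
 worked out, just as one would do on paper.

```pascal
program tomsday(output);
(* sketch: the outline of Tom's day written as procedures *)

procedure getup;
begin
   writeln(output, '      huh?');
   writeln(output, '      open eyes');
   writeln(output, '      yawn')
end;

procedure morning;
begin
   writeln(output, 'morning');
   getup
   (* shower, get dressed, eat and put on coat would follow *)
end;

procedure traveltowork;
begin
   writeln(output, 'travel to work')
end;

procedure work;
begin
   writeln(output, 'work')
end;

procedure travelbackhome;
begin
   writeln(output, 'travel back home')
end;

procedure evening;
begin
   writeln(output, 'evening')
end;

begin (* the main program reads like the top of the outline *)
   morning;
   traveltowork;
   work;
   travelbackhome;
   evening
end.
```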

 SPAGHETTI PROGRAMMING
      Many computer languages, including Pascal, allow one to jump from one
 statement to others in the program.  These GOTO statements invariably
 lead to poor programs because one creates nests of GOTO's that jump
 all over the place.  These can be difficult to figure out.  I
 have seen a case where a professional programmer didn't know about an
 inefficient series of jumps that he had written.  Even large companies
 sell code that is a tangled mess.  Modern programmers have found that
 the solution is amazingly simple:
      DON'T USE GOTO'S
 The Delila system programs use only one GOTO, in a procedure named HALT
 which terminates the program by jumping to the end of the program. This
 is necessary because Pascal does not provide for a program abort procedure.
 (Pascal HALT is not standard.)  There are NO other circumstances when a
 GOTO is required!!
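
      The HALT procedure of the Delila programs is not reproduced here.
 The sketch below only assumes the general shape described above: one
 label at the end of the program and one GOTO inside the procedure.
 Standard Pascal allows such a jump out of a procedure, but some
 compilers need an option to permit it, and many provide their own
 nonstandard halt.

```pascal
program haltdemo(output);
(* sketch: one goto, used only to stop the program *)
label 1; (* the end of the program *)

procedure halt;
(* stop the program by jumping to the end of the main block *)
begin
   writeln(output, ' program error');
   goto 1
end;

begin
   writeln(output, 'working...');
   halt;
   writeln(output, 'this line is never reached');
1:
end.
```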

 A METHOD FOR WRITING PROGRAMS

      This is what I do when I write a program:  I have a stack of old
 computer paper (or standard size paper, not printer size).  I write
 one procedure on each sheet.  An entire procedure is "no longer than"
 one page.  In fact, any procedure longer than a page is usually
 a warning that I need more procedures.  It is not necessary at first
 to write the details of every procedure, only to define the
 procedures.  Starting from the top I work down a ways, realize that I
 need a set of primitive procedures (e.g. to manipulate text lines)
 so I define them, but the way they work can be written later.  So
 as the highest levels of the program are formed, the lower levels
 are defined.  Eventually it is time to write details of the lower
 levels.  Sometimes the higher level can be simplified as the lower
 levels become clearer.
      As you can tell from this description, one begins from the top, but
 the entire structure changes as one goes.  Don't be afraid to toss
 out a procedure that's no good - it's only one page and the paper
 can be recycled.

      The last point is important:  be flexible.  Don't keep banging your
 head against a logical dilemma.  I have often outlined a whole
 program - and then tossed it out because there was a
 better solution.  Learn when to drop.  Clues: you find yourself
 trying to do many things at once; the primitive procedures that
 you have devised are awkward to use; and you find it impossible to
 document a procedure.

      Document a procedure??


 DOCUMENTATION: The Key To Immortal Code

      Even in a high level language like Pascal, it is possible to have a
 functioning program that is not easy to understand.  To define a procedure
 I often write down the name of the procedure, the variables (pieces of
 information to be manipulated) that it uses and then a few English sentences
 that define exactly how the variables are to be used.  This is all one needs
 for the higher levels of the outline.  Those written sentences are called
 comments.  They are part of the documentation required to make the program
 easy to write and ... easy to read.

      It is impossible to overemphasize the importance of documentation
 because nobody EVER does enough (me included).
      If you don't document, within a short time (e.g. a month to half
 a year) you will have forgotten the details of the program - and it will be
 painful to figure it out again.  Worse than that - nobody else will be
 able to work with it!
      It is not hard to write out what you are trying to do in a particular
 section of code or procedure, and it has a real advantage: one is
 forced to think clearly.
      There are several places in a program that ought to have comments:

 PROGRAM STATEMENT - the program should state its purpose in life, how it
 should be used, who wrote it and the date of the latest version.  Some
 technical details can be included.

 CONSTANTS - Include a constant called VERSION and CHANGE THIS EVERY TIME
 THAT YOU CHANGE THE SOURCE CODE.   Write the version to all output from the
 program.  This will assure that all output can be unambiguously
 associated with a particular version of the program.  This will save you
 many headaches! (Note: some computers keep track of file versions.
 FILE VERSIONS WILL NOT SUBSTITUTE FOR AN INTERNAL CONSTANT because
 the program output is not affected and it is not transportable.)
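
      A minimal sketch of the VERSION constant follows.  The program
 name and the number are invented for the example.

```pascal
program versiondemo(output);
const
   version = 1.00; (* of versiondemo: change this with every source change *)
begin
   (* the version goes to the output before anything else *)
   writeln(output, 'versiondemo ', version:4:2);
   (* ... the rest of the output follows ... *)
end.
```

 Every output file then carries the number of the program version
 that made it.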

 All CONSTANTS, TYPES and VARIABLES should have a short description of
 their purpose.  DON'T USE ONE VARIABLE FOR TWO PURPOSES - you will
 be unable to document these cases properly and the code will be
 confusing.

 Each PROCEDURE or FUNCTION should have a short description that
 tells how to use it and gives the purpose of each passed variable.


 *****************************************************************************
 *    SUMMARY: programming is vastly simplified by using two simple tactics: *
 *             topdowning and documentation.                                 *
 *****************************************************************************


      A NOTE ON DATA STRUCTURES
      Higher level languages, such as Pascal (but not FORTRAN) allow one to
 describe data in forms (structures) that resemble the way one thinks
 about the problem.  To take advantage of these facilities, it pays to
 name each "variable" (a structured box into which data is put) and "type"
 (the structure of the box) carefully.  A good name will make
 operations on the variable obvious, and errors will stand out because
 they will "sound" wrong.
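
      As a sketch of careful naming, the types below are invented for
 the example (they are not the Delila data structures).  The point is
 that the last statement reads almost like a sentence.

```pascal
program namedemo(output);
(* sketch: names that say what the data are *)
type
   direction = (plus, minus);
   position = record
      coordinate: integer; (* number of the base in the published sequence *)
      strand: direction    (* which way the sequence is read *)
   end;
var
   genestart, genestop: position;
begin
   genestart.coordinate := 10;
   genestart.strand := plus;
   genestop.coordinate := 40;
   genestop.strand := plus;
   writeln(output, 'gene length: ',
           genestop.coordinate - genestart.coordinate + 1: 1)
end.
```

 An error such as genestart.coordinate - genestart.coordinate would
 "sound" wrong at once.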


 LOCATING ERRORS: Debugging
      Even with top down programming and documentation, errors are made.
 These are called "bugs".  There are several kinds:
      SYNTAX - the compiler will yell at you for things like spelling mistakes
      BOMBING - the program stops abruptly when it should not
      LOGIC - the program produces strange results
      SUBTLE - the program can't handle certain rare conditions correctly

 SYNTAX - It helps to check what you type in.  Since I put one procedure
 per hand written page, this is the easiest unit to check.  Many subtle
 bugs can also be caught this way.

 BOMBING - It is often obvious where the program died.  Work backwards through
 the logic to find the error.  Clear, top-down code makes this much easier:
 one can often tell immediately where the problem is.  Tracing also can
 help.  See below.

 LOGIC and SUBTLE - Some computer systems allow one to trace the path that
 the computer follows through a program.  So far I have not found these
 useful because they are cumbersome and they put out too much data.
 A few well placed write statements will trace the program flow quite well.
 (A "write statement" could print the value of a variable out for you and
 tell you where the computer currently is in the program.)
 In Pascal, one method is to make a global constant:
      DEBUGGING = TRUE; (* FOR DEBUGGING PURPOSES *)
 and use it this way:
      IF DEBUGGING THEN WRITELN(OUTPUT, 'BEGIN PROCEDURE CIRCLE');
 By changing the value of DEBUGGING one can turn the trace on and off.
 To turn off an individual trace point, one can "comment it out":
      (* IF DEBUGGING THEN WRITELN(OUTPUT, 'BEGIN PROCEDURE CIRCLE'); *)
 The symbols "(*" and "*)" will make Pascal ignore the contents,
 because they become comments.  The advantage of this over removing the
 statement is that it allows one to reactivate it easily.
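
      Put together in a complete program, the trace might look like the
 sketch below.  The procedure CIRCLE and what it computes are assumed
 for the example.

```pascal
program tracedemo(output);
const
   debugging = true; (* for debugging purposes *)
var
   area: real;

procedure circle(radius: real; var area: real);
(* find the area of a circle with the given radius *)
begin
   if debugging then writeln(output, 'begin procedure circle');
   area := 3.14159 * radius * radius;
   if debugging then writeln(output, 'radius = ', radius:6:2)
end;

begin
   circle(2.0, area);
   writeln(output, 'area = ', area:8:3)
end.
```

 Setting debugging to false and compiling again removes the trace
 lines from the output without touching the procedures.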

 By far, the most time saving method is to write clear, well documented code.


 TESTING CODE
      It is often worthwhile to test a program on a small set of examples that
 one has worked out by hand.  You should be aware however, that correct
 answers to tests do not prove that the program is correct.  (This may
 seem obvious, but it is an easy mistake to make.)  Sometimes one can
 prove the correctness of a program.  This is a current field of research
 in computer science.
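
      A small test of this kind can be sketched as follows.  The
 function and the hand-worked cases are invented for the example.

```pascal
program testdemo(output);
(* sketch: check a function against cases worked out by hand *)
var
   failures: integer;

function complement(b: char): char;
(* the complement of base b; other characters are returned unchanged *)
begin
   if b = 'a' then complement := 't'
   else if b = 't' then complement := 'a'
   else if b = 'c' then complement := 'g'
   else if b = 'g' then complement := 'c'
   else complement := b
end;

procedure check(b, expected: char);
(* count a failure if complement(b) is not the expected character *)
begin
   if complement(b) <> expected then begin
      writeln(output, 'failed for ', b);
      failures := failures + 1
   end
end;

begin
   failures := 0;
   check('a', 't');
   check('t', 'a');
   check('c', 'g');
   check('g', 'c');
   writeln(output, failures:1, ' failures')
end.
```

 Passing these four cases says nothing about, for example, capital
 letters, which is exactly the limit of testing described above.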


 HOW TO READ MANUALS
      Obtain your own copy of the manual and begin to read.  Get a general idea
 of how the language, editor or system works.  Don't worry about details
 yet.  As soon as you have an idea about how to do something, try it on
 the computer.  Play.  Later on, you can read through the manual seriously
 if you want.  However there is often a lot of detail that you would have
 to memorize.  It is simpler to know that something can be done (by reading
 it once lightly) and to look it up when you need to do it.


 WRITING TRANSPORTABLE PROGRAMS

      A program written for one computer may not run on another computer
 because the compilers for the two computers may not understand the
 same language.  Moving a program from one computer to another is called
 transportation.  If you are going to the trouble and effort to write a
 good program, then you may as well make it easy for other people to use
 it.  Your program would then be transportable.
      Obviously to be transportable, a program must be well written and
 documented.  That is not all.  You must avoid all the fancy "features"
 that your compiler advertises, because no one else has these.  If you
 are forced to use some feature, then isolate it to a few replaceable
 procedures.  We have provided you with a transportable(!) mechanism for
 replacing chunks of code like this - see the document MODDEF and the MODULE
 program.

 PROGRAM MAINTENANCE... SENILITY... AND DEATH.
      The most costly aspect of using computer programs is not their initial
 writing, but maintaining them once they are written.  This is well
 documented in the literature.  But why should a program need
 maintenance?  Aren't they fixed text that does not change?  In the
 simplest sense this is true.  But over time, bugs in the code are found
 and fixed, and needs and expectations change.  Programs are not
 static, they evolve.  Good programming techniques and documentation
 make maintenance easier during the lifetime of a program, but eventually the
 program becomes so hard to change that one must scrap it altogether
 and start a fresh design.  So programs have a birth, a life of use and
 maintenance and, finally, a senility before they die.

 REFERENCES
      "Pascal User Manual and Report", Second Edition, by Kathleen Jensen
      and Niklaus Wirth.  Springer-Verlag, 1978.

      "Software Tools in Pascal", Brian W. Kernighan and P. J. Plauger.
      Addison-Wesley Publishing Co. 1981.

      "Algorithms + Data Structures = Programs", Niklaus Wirth.
      Prentice-Hall, Inc., 1976.

      "Structured Programming", O. J. Dahl, E. W. Dijkstra and C.A.R. Hoare,
      Academic Press. London, 1977.

      "Selected Writings on Computing: A Personal Perspective",
      E. W. Dijkstra, Springer-Verlag, New York, 1982.

(end of delman.program.essay)

delman.program.fable



      A Fairy Tale For Programmers


      The Three Most Important Concepts
         for Writing Good Code

 1. Put comments in your code.

 2. Don't ever forget that six months from now your program
    will be useless even to you without comments.

 3. Several people who published a rather well known article on
    using computers to study sequences (and whose names shall remain unsaid
    to protect the guilty) sent their programs to us two years after they
    had published their article.  It turned out that we could not use
    their programs directly because we did not have available the language
    that they used.  It was necessary to translate each line of code into
    our language before we could use their program.  Ok, fine, we know how to
    do that.  But despite the fact that these were old programs that they had
    been working on for a long time, there were almost no comments in
    their code.  That made the translation 100 times more difficult!!
    One sees an equation in the code - what does it mean?  If they do
    something in a funny way, was it a mistake or is it important to
    do it that way?  What a headache!
    We threw out their programs and wrote our own.


          MORAL: Code that is not documented in English will
                 not survive in the long run.  Therefore:
          Put In Comments.
          Comment As You Code, NOT AFTERWARDS - Comments Are Part Of The Code.
          Change The Comments When You Change The Code, NEVER PUT THIS OFF.




      Epilogue
      Years later, out of curiosity, the program called CODE
 (COmment DEnsity) was written.  We were startled to discover that
 the frequency of characters devoted to comments in our code
 averages around 30 percent!


(end of delman.program.fable)

delman.use




















 UU    UU   SSSSSS   EEEEEEEE
 UU    UU  SS    SS  EE
 UU    UU  SS        EE
 UU    UU   SSSSSS   EEEE
 UU    UU        SS  EE
 UU    UU        SS  EE
 UU    UU        SS  EE
 UU    UU  SS    SS  EE
  UUUUUU    SSSSSS   EEEEEEEE
























(end of delman.use)

delman.use.intro


         Use Of The Delila System

      INTRODUCTION
      This section of the Delila Manual assumes that you have read the
 introduction to the manual, that a Delila System is running on your
 computer, and that you know how to get on the computer, to make
 files, to modify and correct files, and to run programs (see DELMAN.GUIDE).

      There are several sources of information that you can keep in mind:
 1) The papers in DELMAN.INTRO.REFERENCES will show you
 how we have used the Delila System.
 2) LIBDEF.  This is a technical specification of Delila and the
 libraries.  However, there is a set of detailed examples that
 can be read profitably without reading all the definitions.
 3) The section of DELMAN called Program and Data Descriptions
 (DELMAN.DESCRIBE) lists everything that is available to you.  Whenever
 you want a tool to do something, that is the place to look.

      In this section we will first discuss the structure of a Delila Library
 and how you can find your pet (pet's?) sequence in it.  Next we
 describe how to tell Delila to go and fetch your sequences.  We will
 then discuss programs that let you study the sequences.  The sequence
 analysis will bring us back to Delila.

(end of delman.use.intro)

delman.use.structure.1


      LIBRARY STRUCTURE

      Think about a tree.  The trunk spreads into a series of branches,
 sticks and twigs.  A Delila library looks something like that, except
 that there are several kinds of branch, stick and twig, much as each
 twig ends in a leaf, bud or a flower.
      We have given names to the kinds of branches and leaves in Delila
 libraries.  Near the trunk there are the ORGANISM and the
 RECOGNITION-CLASS.  An ORGANISM is a cluster of data pertaining to a
 real-world organism.  The term "organism" is somewhat ambiguous, so it
 is a matter of taste as to the classification of some creatures (is a
 virus a traveling plasmid?).  In our library T4, T7 and E. coli
 information is stored in ORGANISMs.
      A RECOGNITION-CLASS is a cluster of data about any process that
 recognizes specific nucleic-acid sequences.  These include chemical
 modification and restriction enzymes.  (At present this portion of
 the library is not fully implemented, so we will not discuss it further.)

      The library structure can be diagrammed in a schema:
         A-->>--B  means A has one or more of B.
         C--->--D  means C has one of D.

                             LIBRARY
                              :   :
                              V   V
                              V   V
                              :   :
                  ............:   :.............
                  :                            :
              ORGANISM                RECOGNITION-CLASS
                  :                            :
                  V                            V
                  V                            V
                  :                            :
              CHROMOSOME                       :
               : : : :                         :
               V V V V                         :
               V V V V                         :
               : : : :                         :
   ............: : : :.........                :
   :       ......: :....      :                :
   :       :           :      :                :
  MARKER  TRANSCRIPT  GENE   PIECE....       ENZYME
   : :     :           :     : : :   :         :
   V V     V           V     : : :   V         V
   : :     :           :.....: : :   :         :
   : :     :...................: :   :         :
   : :...........................:   :         :
   :                                 :         :
  SEQUENCE                       SEQUENCE  SEQUENCE

(end of delman.use.structure.1)

delman.use.structure.2


      In this schema you can see that ORGANISMs have one or more
 CHROMOSOME branches.  Once again, the term CHROMOSOME is intended to
 be somewhat flexible.  In Delila it means a complete biological
 unit of nucleic-acid, either DNA or RNA.  For example, we refer to both the
 CHROMOSOME ECOLI (the 5 million base one) and the CHROMOSOME PBR322 (the
 4.3kb plasmid).
      Notice that real-world chromosomes are "inside" their organism.  In the
 same way, one can think of CHROMOSOMEs as being inside their ORGANISM and
 ORGANISMs as being inside a library.  You may think of a Delila Library
 either as a tree or a series of objects, one nested inside the other.
 A little reflection will show that these are equivalent because one
 can convert from one form to the other.
      Every ORGANISM and CHROMOSOME has a name by which it can be identified.
 For example, T4 is the name of the coliphage of rII fame, while ECOLI
 is the name for Escherichia coli.  There is other information stored
 at these branch points as well.  An ORGANISM tells us the genetic map units
 used, such as centiMorgan or kilobasepair.  The CHROMOSOME goes on to
 specify the beginning and ending of the corresponding chromosome in
 the given units.
      Now we will delve inside a CHROMOSOME.  There are MARKERs,
 TRANSCRIPTs, GENEs and PIECEs.  What is going on?  So far we have
 been leaning toward a description of an ideal situation where all
 the nucleic-acid sequence information of a chromosome would be stored inside
 a single data object -- a PIECE.  Although this fits small phages such as
 PHIX174 and FD, it is nowhere near true even for ECOLI.  There are many
 disconnected fragments of E. coli sequence now known.  As sequencing progresses,
 the fragments will connect more and more until the entire sequence is known.
 So a PIECE may be either the entire sequence information in a CHROMOSOME
 or only one of many fragments.  In this way we can store sequences
 in their natural arrangement, and still accommodate data that is
 fragmented due to technical limitations.  As more sequence is obtained,
 the SEQUENCE inside a PIECE is extended or fused to neighboring PIECEs.
      Like all the other library objects, a PIECE has a name, usually related
 to its biological functions.  To keep all the fragments straight, each
 PIECE tells its location on the genetic map.  The nucleic-acid
 sequence is stored inside a SEQUENCE, written 5' to 3'.  Besides these
 data, each PIECE stores a useful set of information: a
 coordinate system.
      For the purposes of identification, every published sequence is given
 a set of consecutive integers corresponding to basepairs or bases
 along the DNA or RNA sequence.  This numbering scheme is captured
 in the coordinates of each PIECE.  Using Delila, subfragments of a
 PIECE can be easily obtained.  These are also PIECEs and every base
 in the new PIECE has the same number that its parent did.  This has
 WONDERFUL consequences:  every printout can refer to the original
 published literature.  It is also easy to compare the results from
 several analyses.
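
      The idea that a subfragment keeps the numbering of its parent can
 be sketched in Pascal.  The record below is a simplified stand-in
 invented for the example; the real structure of a PIECE is defined in
 LIBDEF and holds much more.

```pascal
program coordinatedemo(output);
(* sketch: a subfragment keeps the coordinates of its parent *)
const
   maxbases = 20;
type
   piece = record
      first, last: integer; (* coordinates of the two end bases *)
      bases: array[1..maxbases] of char
   end;
var
   parent, child: piece;
   i: integer;

procedure subpiece(var whole: piece; start, stop: integer;
                   var part: piece);
(* copy bases start to stop (in the coordinates of whole) into part,
   keeping the same coordinate for every base *)
var
   i: integer;
begin
   part.first := start;
   part.last := stop;
   for i := start to stop do
      part.bases[i - start + 1] := whole.bases[i - whole.first + 1]
end;

begin
   (* a made up parent sequence numbered 101 to 110 *)
   parent.first := 101;
   parent.last := 110;
   for i := 1 to 10 do
      if odd(i) then parent.bases[i] := 'a'
                else parent.bases[i] := 'g';

   subpiece(parent, 104, 107, child);

   (* each base is listed with the number it had in the parent *)
   for i := child.first to child.last do
      writeln(output, i:4, ' ', child.bases[i - child.first + 1])
end.
```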

(end of delman.use.structure.2)

delman.use.structure.3


      Let's move on to the GENE, one of the other data-objects inside a
 CHROMOSOME.  A GENE defines the endpoints of the genetic information
 of a protein in the SEQUENCE of a PIECE.  For example, in ORGANISM ECOLI;
 CHROMOSOME ECOLI there is a PIECE LAC.  The GENE LACI refers to this
 PIECE by pointing to the first G of the GTG and the A of the TGA.
      A TRANSCRIPT is similar to a GENE, but it defines any region
 transcribed into mRNA.  For consistency, we consider a tRNA to be a
 TRANSCRIPT and not a GENE.  GENE is reserved for the coding sequence
 of polypeptide products.
      Suppose that a mutation is known for your favorite sequence.  The
 MARKER is designed to record the change made by the mutation.
 MARKERs can also record splice junctions and other interesting
 sequence features.  In the future Delila will allow one to obtain
 both a sequence and its mutated forms using MARKERs.

      Notice that MARKERs, TRANSCRIPTs and GENEs all refer or point to
 a particular PIECE.  Each PIECE therefore has a "family" of related
 branches.  It is here that the tree-like structure of the library
 begins to break down: some of the branches are connected to one
 another in a kind of network.

      Now it is time to become practical.  Obtain a copy of HUMCAT.  This
 is a catalogue of the library, the HUMan's CATalogue.  (Delila also
 has one for herself).  Look around HUMCAT.  Notice that it is
 organized by ORGANISM, CHROMOSOME, and so forth.  Find a GENE or
 TRANSCRIPT that you are interested in.  In the next section you
 will learn how to obtain it to play with.

(end of delman.use.structure.3)

delman.use.language.1


      DELILA - THE LANGUAGE

      WHY WRITTEN INSTRUCTIONS?
      One of our major design decisions was the use of written instructions
 for the librarian.  While we realize that this is somewhat forbidding
 to a new user, it does have several advantages over direct interactive
 use.  One is that it is easier to correct mistakes in the list of
 sequences that are to go into the book than it is to change sequences by
 hand.  Corrections to instructions are done with a text editor.  Also, the
 amount of information necessary to obtain a fragment of sequence is usually
 less than the information in the sequence itself, so storing instructions
 instead of sequences is efficient.  Another advantage is that a complete
 and concise record may be kept.  As we will see later, the instructions can
 also be generated by auxiliary programs, allowing one to automate many
 complex manipulations.

      WHAT IS THE DELILA LANGUAGE?
      This section describes the use of the language Delila:
         DEoxyribonucleic-acid
           LIbrary
             LAnguage.
      The language is not as complex or comprehensive as a natural language
 such as English or French.  It was designed for a particular task:
 telling a nucleic-acid data base manager - the librarian - the set of
 fragments that one wants to collect for study.  (The name Delila is an
 anachronism that we can't bear to part with...)
      Since the library is structured like a tree, the language must allow
 one to specify individual branches.  Eventually a particular PIECE
 will be identified, and one can request one or more fragments from
 the PIECE.  Let us look at an example:

      TITLE "EX1: THE LACI GENE";
      ORGANISM ECOLI;
         CHROMOSOME ECOLI;
            GENE LACI;
            GET ALL GENE;

 (Note: this instruction set is kept in the file EX1IN, so you can
 try it.  All EXn examples are sent with the Delila System.)

      Statements in Delila end with a semicolon (;) - there are five
 statements above.  The first statement will give a title to the book.
 The next three specify a particular GENE in the library structure.
 One thinks of this as a series of steps climbing the library tree.
 Starting at the "root" of the library, we first named the ORGANISM
 ECOLI.  This moves us out to that ORGANISM.  Then the CHROMOSOME
 was chosen to be ECOLI - the main chromosome (as opposed to a
 plasmid such as PBR322).  Next, the particular gene, lacI, is
 specified by "GENE LACI;".
      As we noted in the section on structure, GENES point to the
 particular PIECE that they reside on.  GENE LACI points to the PIECE LAC.
 Although we need not know this for the request, Delila knows it
 automatically.  When the GET is performed, Delila will obtain the
 sequence of lacI from the G of the GTG through the A of the TGA.
      After Delila has read each of these statements, the information
 about the object (ORGANISM, CHROMOSOME or GENE) is put into the
 book.  The GET generates a PIECE that is also placed into the book.

(end of delman.use.language.1)

delman.use.language.2


      TRY IT OUT
      Type a file containing Delila instructions that specify the gene
 you chose at the end of the section on library structure.  For this
 discussion, we will use the name EX1IN, although you may use another
 name.  Find the entry on Delila (DESCRIBE.DELILA) in the back of this
 manual and run it:
      delila(ex1in,ex1bo,ex1dl)
 Look at the ex1dl file. This is the Delila Listing.  The first
 line will look like this:
   82/01/21 23:17:51     DELILA 1.20     PASS 1           PAGE 1
 Delila performs two passes through the instructions.  Pass 1 checks for
 spelling and syntax errors.  If you made a typing mistake, it will be noted
 in the listing and Delila will not begin Pass 2.  Should Pass 1 be
 successful, then Pass 2 begins.  Notice that there are several lines that look
 something like this:
 * 81/01/18 22:29:26, 80/11/19 22:17:46, LIBRARY 1: BACTERIOPHAGE
 * 81/01/18 22:29:26, 80/11/19 22:17:46, LIBRARY 2: E. COLI AND S. TYPHIMURIUM
 These are the full titles of the libraries from which you are pulling
 sequences.  Each title has three parts separated by commas:
      1) the instant (date and time in descending order) that the library
         was created.
      2) the instant that the PARENT of this library was created.
      3) the title of the library.
 Notice that Delila also prints the current date and time at the top
 of the listing (if your system has these functions).  The first line of a
 book or library contains its full title.  For this example, this is:
 * 82/01/21 23:17:51, 81/01/18 22:29:26, EX1: THE LACI GENE
 What is the "genealogy" of the book that you obtained?
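
      As a rough sketch, the three parts of a full title can be pulled
 apart in Python.  The function name parse_full_title is hypothetical, and
 the code assumes that the parts are separated by a comma and one space,
 as in the lines shown above; the real format is defined in LIBDEF.

```python
def parse_full_title(line):
    """Split a full title line into (created, parent_created, title)."""
    if not line.startswith("*"):
        raise ValueError("a full title line starts with '*'")
    # Split on the first two separators only, so the title may hold commas.
    created, parent_created, title = line[1:].strip().split(", ", 2)
    return created, parent_created, title


library = parse_full_title(
    "* 81/01/18 22:29:26, 80/11/19 22:17:46, "
    "LIBRARY 2: E. COLI AND S. TYPHIMURIUM")
book = parse_full_title(
    "* 82/01/21 23:17:51, 81/01/18 22:29:26, EX1: THE LACI GENE")

# The parent instant of the book equals the creation instant of the library.
is_child = book[1] == library[0]
```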
      Back to the listing, Pass 1.  The instructions that you typed are
 repeated on the listing.  To the left are two columns of numbers -
 the leftmost is the line number and the next is the statement number
 (there can be several statements on one line or one line may contain
 only part of a statement).  This information is sometimes useful.
      Now let's look at the listing, Pass 2.  Notice that the instructions
 that you typed are repeated again, but that there are extra lines
 inserted.  In Pass 1 Delila checked for typing errors, while in Pass
 2 Delila pulls out data items and places them into the book.  As
 each item is put into the book, it is given a number:
      2     2        ORGANISM ECOLI;
                                  #1
 This is useful for some auxiliary programs.  We will discuss control of
 the numbering in a later section.
      If your instructions worked then there will be two other numbers just
 below the get:
      5     5              GET ALL GENE;
                             #4
                                      ^29^1111
 These numbers show you the numbers of the beginning base (29) and
 the ending base (1111) for the PIECE put into the book.


(end of delman.use.language.2)

delman.use.language.3


         RANGE DEFAULTS
      It is quite possible that you got an error message at this point:

      4     4           GENE LACZ;
      5     5              GET ALL GENE;
                             #4
                                      ^1234^100000
 ---ERROR(S)---------------------------^206^203
 203: OUT OF RANGE AND DEFAULT RANGE = HALT
 206: WE DO NOT KNOW THIS LIMIT (A WARNING)

 This indicates that only part of the gene you are interested in
 exists in the library.  Delila detects the fact that one end of
 the GENE goes off the end of its PIECE, and says that this limit (the
 end of the gene) is unknown.  (This is indicated by the 100000.)  Normally
 Delila will HALT when this situation is discovered.  You can change this by
 using the instruction:
      DEFAULT OUT-OF-RANGE REDUCE-RANGE;
 anywhere before the problem but after the TITLE.  This resets the default
 response to an out of range situation.
      In REDUCE-RANGE mode, Delila will attempt to find the closest edge
 of the PIECE and use that.  The listing will show a record of what
 Delila does:
      6     6              GET ALL GENE;
                             #4
                                      ^1234^100000^1419
 ---ERROR(S)---------------------------^206^208

 206: WE DO NOT KNOW THIS LIMIT (A WARNING)
 208: OUT OF RANGE AND DEFAULT RANGE = REDUCE (A WARNING)
 In this case the PIECE in the book begins at 1234 and ends at 1419.
      To cause Delila to continue without putting any PIECE down in the book
 one would use:
      DEFAULT OUT-OF-RANGE CONTINUE;
 You may use several default statements to affect how Delila responds.
 To reset the default to halting, use HALT instead of CONTINUE or
 REDUCE-RANGE.  (See DELMAN.USE.CONTROL)
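
      The three responses can be summarized by a small Python sketch.  It
 is a simplification and not Delila's actual code: the function name is
 hypothetical, the PIECE is assumed to be linear and read in the +
 direction, and the PIECE is assumed to begin at base 1 (the text above
 gives only its end, 1419).  Requests that miss the PIECE entirely are
 not treated specially here.

```python
def apply_range_default(piece_begin, piece_end, start, stop, default="HALT"):
    """Return the (start, stop) to put in the book, or None for no PIECE."""
    def inside(n):
        return piece_begin <= n <= piece_end

    def clamp(n):
        # Move an out-of-range position to the nearest edge of the PIECE.
        return max(piece_begin, min(piece_end, n))

    if inside(start) and inside(stop):
        return start, stop
    if default == "HALT":
        raise RuntimeError("203: OUT OF RANGE AND DEFAULT RANGE = HALT")
    if default == "CONTINUE":
        return None
    if default == "REDUCE-RANGE":
        return clamp(start), clamp(stop)
    raise ValueError("unknown default: " + default)


reduced = apply_range_default(1, 1419, 1234, 100000, "REDUCE-RANGE")
skipped = apply_range_default(1, 1419, 1234, 100000, "CONTINUE")
```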


      Use the programs COUNT and LISTER to look at your book.

(end of delman.use.language.3)

delman.use.language.4


      MORE ON INSTRUCTIONS
      There are several ways to obtain sequences in a book.  For example
 one could use:

      TITLE "EX2: AN ABSOLUTE GET";
      (* FIRST WE WILL SPECIFY THE LAC PIECE: *)
      ORGANISM ECOLI; CHROMOSOME ECOLI; PIECE LAC;
      (* NEXT WE WILL REQUEST A PARTICULAR FRAGMENT OF THAT PIECE: *)
      GET
         FROM 29  (* THE BEGINNING ABSOLUTE POSITION *)
         TO 1111; (* THE ENDING ABSOLUTE POSITION *)

 There are several things to note about these instructions.  First, there
 are five instructions and four comments.  A comment is the text between
 a (* and a *).  You should use comments freely to document what you
 are doing.  This is made easy by the fact that comments can extend over
 several lines.  Delila ignores comments.
      Several instructions can be put on one line (the specifications, above)
 and one instruction can be spread over several lines (the request).
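
      To make the layout rules concrete, here is a minimal Python sketch
 that removes comments and splits the EX2 text into instructions.  It is
 not Delila's parser.  The function name split_statements is
 hypothetical, and the sketch assumes that no semicolon or comment
 bracket appears inside a quoted title.

```python
import re

COMMENT = re.compile(r"\(\*.*?\*\)", flags=re.DOTALL)  # may span lines


def split_statements(text):
    """Return (statements, comments) for a block of instruction text."""
    comments = COMMENT.findall(text)
    stripped = COMMENT.sub(" ", text)
    # Each statement ends with ';'; line breaks inside it are just spacing.
    statements = [" ".join(part.split())
                  for part in stripped.split(";") if part.strip()]
    return statements, comments


EX2 = '''
TITLE "EX2: AN ABSOLUTE GET";
(* FIRST WE WILL SPECIFY THE LAC PIECE: *)
ORGANISM ECOLI; CHROMOSOME ECOLI; PIECE LAC;
(* NEXT WE WILL REQUEST A PARTICULAR FRAGMENT OF THAT PIECE: *)
GET
   FROM 29  (* THE BEGINNING ABSOLUTE POSITION *)
   TO 1111; (* THE ENDING ABSOLUTE POSITION *)
'''
statements, comments = split_statements(EX2)
```

 Here statements holds five entries and comments holds four; the last
 statement is reassembled from three lines as GET FROM 29 TO 1111.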
      The GET above defines two basepairs in the LAC sequence.  The sequence
 between (and including) these bases is put into the book.  Delila always
 puts sequence in the book 5' to 3'.  Thus, to get the complementary strand
 of the fragment requested above, one simply uses:
      GET FROM 1111 TO 29;
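
      The effect of swapping FROM and TO can be sketched in Python.  The
 function get and the ten-base toy sequence are assumptions made for this
 example; only a linear PIECE is handled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def get(sequence, first_number, start, stop):
    """Return bases start..stop of a linear PIECE, written 5' to 3'.

    sequence is the stored strand and first_number is the coordinate of
    its first base.  When start > stop the opposite strand is returned.
    """
    lo, hi = min(start, stop), max(start, stop)
    fragment = sequence[lo - first_number:hi - first_number + 1]
    if start <= stop:
        return fragment
    # Reverse the order and complement each base: still 5' to 3'.
    return "".join(COMPLEMENT[base] for base in reversed(fragment))


toy = "GTGAAACCAG"               # hypothetical bases numbered 29..38
forward = get(toy, 29, 29, 38)   # the stored strand
reverse = get(toy, 29, 38, 29)   # the complementary strand
```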



      RELATIVE VERSUS ABSOLUTE REQUESTS
      In contrast to EX2 we could write:

      TITLE "EX3: A RELATIVE GET";
      ORGANISM ECOLI; CHROMOSOME ECOLI; GENE LACI;
      GET FROM GENE BEGINNING
            TO GENE ENDING;

 In this case we did not state absolute numbers to define our book.
 Yet in all three examples (EX1, EX2, and EX3) the same PIECE will be
 generated in the book.
      There are two ways to define a base in a sequence.  One is to give
 its exact coordinate as in EX2.  That is called an ABSOLUTE reference.
 The other way is to define the distance from a fixed point, as in
 EX3: a RELATIVE reference.
      Both absolute and relative referencing have advantages and disadvantages.
 Using absolute coordinates allows us to pinpoint particular bases.  However,
 Delila libraries evolve over time, and when two previously separate
 PIECEs are fused, only one coordinate system is kept.  An absolute
 reference will not last.  On the other hand, a relative reference
 will last because the GENE BEGINNING will always be the start of the
 gene no matter what happens to the actual coordinate system.

(end of delman.use.language.4)

delman.use.language.5


      FORMS OF REQUESTS

      By now you may have noticed that there are two kinds of GET:
      GET ALL ... ;
      GET FROM ... TO ... ;
 The two positions of the FROM-TO form are independent as long as
 one refers to locations on the same PIECE.  In absolute terms one
 can say
      GET FROM -22 TO 56; (* ABSOLUTE *)
 or one can make it relative to a gene beginning:
      GET FROM GENE BEGINNING - 10
            TO GENE BEGINNING +  5;
 One can even write instructions relative to an absolute location:
      GET FROM 56 - 10 TO 56 + 5;
 This is to be pronounced "get from fifty-six minus ten to fifty-six plus
 five".  We will come back to this form later.

      MARKERs, GENEs, TRANSCRIPTs and PIECEs all have a BEGINNING and an
 ENDING that you can use.  For example,

      TITLE "EX4: NON-CODING LAC LEADER";
      ORGANISM ECOLI; CHROMOSOME ECOLI;
      GENE LACZ; (* NOW DELILA KNOWS THE PIECE *)
      TRANSCRIPT LACZ;
      GET FROM TRANSCRIPT BEGINNING
            TO GENE BEGINNING -1;

 Notice that both a GENE and a TRANSCRIPT can be specified at the
 same time.



      AMBIGUOUS DIRECTIONS
      Consider the circular genome of ORGANISM G4.  The numbering of the
 PIECE is from 1 to 5577.  Suppose that you asked for:
      TITLE "G4 COORDINATE PUZZLE";
      ORGANISM G4; CHROMOSOME G4; PIECE G4;
      GET FROM 1 TO 10;
 This is ambiguous!  There are TWO PIECES that run from 1 to 10:
 one clockwise and the other counterclockwise.  In this case Delila
 will supply you with the clockwise fragment.  However, to be more
 specific in one's request, one would write:
      GET FROM 1 TO 10 DIRECTION +;
 or
      GET FROM 1 TO 10 DIRECTION -;
 But there are still two other possibilities!
      GET FROM 10 TO 1 DIRECTION +;
      GET FROM 10 TO 1 DIRECTION -;
 Delila is capable of handling most requests like these.  (Certain
 of the most complex cases remain to be solved.)
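
      To see how different the four requests are, one can walk around the
 circle in Python.  This sketch is an illustration only.  It assumes that
 DIRECTION + means the numbers ascend and DIRECTION - means they descend,
 wrapping between 1 and 5577; the function name circular_walk is
 hypothetical.

```python
def circular_walk(start, stop, direction, first=1, last=5577):
    """List the coordinates visited from start to stop on a circle.

    direction is +1 (ascending numbers) or -1 (descending numbers).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if not (first <= start <= last and first <= stop <= last):
        raise ValueError("start and stop must lie on the circle")
    size = last - first + 1
    coords = [start]
    while coords[-1] != stop:
        # Step one base, wrapping around at the ends of the numbering.
        coords.append((coords[-1] - first + direction) % size + first)
    return coords


lengths = {
    "FROM 1 TO 10 DIRECTION +": len(circular_walk(1, 10, +1)),
    "FROM 1 TO 10 DIRECTION -": len(circular_walk(1, 10, -1)),
    "FROM 10 TO 1 DIRECTION +": len(circular_walk(10, 1, +1)),
    "FROM 10 TO 1 DIRECTION -": len(circular_walk(10, 1, -1)),
}
```

 Two of the requests cover 10 bases and the other two cover 5569 bases,
 that is, nearly the whole genome.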

(end of delman.use.language.5)

delman.use.language.6



      RESPECIFICATION
      What if one wanted to specify more than one "leaf" (GENE, TRANSCRIPT,
 or MARKER) at one time?  Then one would use:

      TITLE "EX5: THE REGION BETWEEN LACI AND LACZ";
      ORGANISM ECOLI; CHROMOSOME ECOLI;
      PIECE LAC; (* NOW DELILA KNOWS THE PIECE *)
      GET FROM (GENE LACI) ENDING + 1 TO (GENE LACZ) BEGINNING - 1;

 This form is called a "respecification", to distinguish it from
 a specification.


      MULTIPLE REQUESTS
      After Delila has completed a GET, as in the last few examples, the
 specifications are still in effect and one can do more GETs,
 change the specification, do more GETs, and so on:

      TITLE "EX6: MULTIPLE SPECIFICATION AND REQUESTS";
      ORGANISM ECOLI;
         CHROMOSOME PBR322;
            GENE AMPR; GET ALL GENE; (* GET GENE OF BETA-LACTAMASE *)
         CHROMOSOME ECOLI; (* CHANGE SPECIFICATION *)
            TRANSCRIPT 16SRRNAB; GET ALL TRANSCRIPT; (* 16S RRNA *)
            TRANSCRIPT 23SRRNAB; GET ALL TRANSCRIPT; (* 23S RRNA *)
      ORGANISM PHIX174;
         CHROMOSOME PHIX174;
            (* GET TWO OVERLAPPING GENES *)
            GENE A; GET ALL GENE;
            GENE B; GET ALL GENE;


      WHEN DOES DELILA ACT?
      During Pass 2, Delila places the various items into the book.  Thus
 as ORGANISM, CHROMOSOME, GENE or TRANSCRIPT instructions are read,
 they are executed immediately.  This is not true for the PIECE in the
 example EX2 because at that point Delila does not know the endpoints
 of the sequence desired.  Delila "knows" which PIECE you are interested
 in, but not what particular bases.  When Delila reads the GET, the bases
 become apparent.  You can see this in the Pass 2 listing:  a PIECE
 is not given a number, rather the number is listed for the GET that
 generates the PIECE in the book.  The numbers are for objects in
 the book, not for those in the library.

(end of delman.use.language.6)

delman.use.auxiliary.programs


      AUXILIARY PROGRAMS: LISTER AND SEARCH

      In the section on language, we discussed how one can use Delila to
 generate books containing sequences one is interested in.  It is difficult
 to read the sequences in a book because they are in an awkward (from your
 viewpoint) compressed format.  In everyday use, we almost never look
 inside a book because there is a much easier way:  generate a fancy
 listing using the program LISTER.
      In the section on the Delila language you used LISTER to look
 at the books that you generated.  (If you have not done this, then
 you should do it now.)  Like other programs, LISTER will print
 sequence 5' to 3'.  If you want the complement, it is easy to use
 Delila to obtain it.
      LISTER is an example of an auxiliary program.  In contrast, Delila is
 the center of the Delila System.  The purpose of Delila is the
 manipulation of sequence information.  Other "auxiliary" programs
 perform tasks such as making listings or doing analyses.  These
 programs are explained in DELMAN.DESCRIBE.
      The only other auxiliary program that we will discuss here is the
 SEARCH program.  SEARCH will search a book for a simple pattern.  As
 you will recall, books have the same structure as libraries.  As
 SEARCH proceeds to look into an ORGANISM it will know the name of the
 ORGANISM:
      ORGANISM ECOLI;
 Then it will enter the CHROMOSOME:
      CHROMOSOME PBR322;
 Finally it begins to search a PIECE:
      PIECE PBR322;
 In other words, SEARCH can write Delila instructions that trace the
 search path.  Suppose that we had told SEARCH to search for the pattern
 5' AAGCTT 3' (HindIII).  We also tell it that the FROM should be -5 and
 the TO +10.  When SEARCH finds the site it can then write:
      GET FROM 29 -5 TO 29 +10 DIRECTION +;
 29 is the position of the first A of AAGCTT in PBR322.
 These Delila instructions are an answer to the search!
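
      The idea can be imitated with a short Python sketch.  It is not the
 SEARCH program: the function name, its parameters, and the toy sequence
 (built so that the site falls at base 29) are assumptions, and only the
 stored strand is searched.

```python
def search_instructions(sequence, pattern, first_number=1,
                        from_offset=-5, to_offset=10):
    """Yield one GET instruction per occurrence of pattern in sequence."""
    index = sequence.find(pattern)
    while index != -1:
        position = first_number + index   # coordinate of the first base
        yield "GET FROM {0} {1:+d} TO {0} {2:+d} DIRECTION +;".format(
            position, from_offset, to_offset)
        index = sequence.find(pattern, index + 1)


toy = "C" * 28 + "AAGCTT" + "G" * 20      # hypothetical; site at base 29
instructions = list(search_instructions(toy, "AAGCTT"))
```

 The single instruction produced is the GET shown above, ready to be
 placed after the ORGANISM, CHROMOSOME and PIECE specifications.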

      You should try this and the other Auxiliary programs.


(end of delman.use.auxiliary.programs)

delman.use.data.flow


      DATA FLOW AND DATA LOOPS

      In the section on Auxiliary programs we discussed the use of the
 SEARCH program to locate patterns in books.  The search results appear
 in three ways:  on the screen, in a file for printing, and as Delila
 instructions.  These instructions can be given to Delila to generate
 the sequences of found sites.  One can view this entire process as a
 flow of data between one program and the next.  Since this manual
 cannot have (nice) line figures, we strongly urge you to look at the flow
 figures in the published papers listed in DELMAN.INTRO.DESCRIPTION.
 Connecting parts of the Delila system together is much like playing
 with tinkertoys.
      Data flowing in the Delila system can pass through a program several
 times.  Our first example was the conversion of a book to a library and
 the subsequent extraction of book subsets.  The SEARCH program
 provides a more complex case where searching of a book generates
 Delila instructions that can be used to create a new book.  The new book
 is the set of located sequences.  This cyclic string of events is
 called a loop.

      Once you are acquainted with these data flow loops you can look at the
 SEPA program.  This program deals entirely with Delila instructions
 of the form:
      GET FROM 56 -40 TO 56 +60;
 along with ORGANISM, CHROMOSOME and PIECE specifications.  The
 SEARCH program produces instructions in this form.  SEPA is used to
 separate instruction sets.
      For example, suppose you are interested in all the AluI (5' AGCT 3')
 sites that are not part of PvuII (5' CAGCTG 3') sites.  You have used
 DELILA and SEARCH to generate two sets of instructions, ALUIMIX and
 PVUII.  You then can use SEPA to get the set that you want:
      SEPA(PVUII,ALUIMIX,PVUIIO,ALUI)
 PVUIIO would be a reorganized non-redundant list of the PvuII
 instructions, and ALUI would list all AluI sites that are not
 PvuII sites.  Both our second and third papers describe the way that
 we use SEPA.  (Note: to do a search like this one must be sure that the sites
 are numbered the same way.  The search rule for AluI would be #AGCT,
 while the search for PvuII would be C#AGCTG.  The # symbol tells SEARCH
 to write the number of the following base in the instructions.  This forces
 the SEARCH program to number the same A in the two cases.)
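
      The effect of the # symbol and of the separation step can be
 sketched in Python.  This is an illustration, not the SEPA program:
 SEPA works on instruction files, while this sketch works directly on
 sets of site numbers.  The function name and the toy sequence are
 hypothetical.

```python
def numbered_sites(sequence, rule, first_number=1):
    """Return the coordinates of the base that follows # in each match."""
    mark = rule.index("#")            # offset of the numbered base
    pattern = rule.replace("#", "")
    sites = set()
    index = sequence.find(pattern)
    while index != -1:
        sites.add(first_number + index + mark)
        index = sequence.find(pattern, index + 1)
    return sites


toy = "TTAGCTGGCAGCTGAAAGCTCC"        # hypothetical sequence
alui_mix = numbered_sites(toy, "#AGCT")     # every AGCT, numbered at the A
pvuii = numbered_sites(toy, "C#AGCTG")      # PvuII, numbered at the same A
alui_only = sorted(alui_mix - pvuii)        # AluI sites that are not PvuII
```

 Because both rules number the same A, the PvuII site at base 10 drops
 out of the AluI list, leaving the sites at bases 3 and 17.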


(end of delman.use.data.flow)

delman.use.coordinates.1


      THE COORDINATE SYSTEM OF A PIECE

      In the sections on library structure and the Delila language, we kept
 touching on the topic of coordinate systems for PIECEs.  Delila is
 required to maintain the numbering of sequence fragments, and a
 coordinate system is the means to do so.  This is not a simple problem,
 for one must handle both linear and circular genomes.  For the new
 user, it suffices to know that Delila can do that, and you could
 skip this section.


      Let us start with the simpler case, a linear PIECE.  The SEQUENCE
 in the library is numbered consecutively from 1 to 100.  So far so
 good: we need to record three pieces of information:
      CONFIGURATION: LINEAR
      BEGINNING:     1
      ENDING:        100
 Any subset of the PIECE such as:
      GET FROM 40 TO 50;
 will also be linear and can be handled by these three variables.
 Notice that one could:
      GET FROM 50 TO 40;
 to obtain a complement.  In that case the BEGINNING is greater than
 the ENDING and the numbering decreases.

      What if the CONFIGURATION is CIRCULAR?  Then based on our discussion
 about ambiguous directions, we should at least add a
      DIRECTION:    +
 for linear sub-fragments.  However the situation can be worse than that!
      Let us imagine a circular PIECE in the library.  It is numbered 1 to
 100 in the direction 5' to 3' of one DNA strand.  We then make a
 request:
      GET FROM 10 TO 90 DIRECTION -;
 The PIECE to be placed in the book is 21 bases long, with descending
 numbers, EXCEPT for a COMPLETELY UNPREDICTABLE DISCONTINUITY where
 the numbering jumps from 1 to 100.  Some more information about the
 "parent" coordinates must be stored.

(end of delman.use.coordinates.1)

delman.use.coordinates.2


      The problem is to record the necessary coordinate information and to
 avoid becoming confused.  In the Delila System, the numbering of
 each PIECE has two parts: a COORDINATE part and a PIECE part.
      The COORDINATE part defines the location of a sequenced region on
 the genetic map.  Once that is established, the PIECE part tells what
 fragment is stored in the PIECE.  Both parts are transmitted to the
 book by Delila, but the coordinate part is fixed and unchanging while the
 PIECE part will vary depending on the fragment.  In summary so far:
      COORDINATE part = defines the relation of coordinates to the genetic map
      PIECE part = defines the relation of SEQUENCE to the COORDINATE part


 For the coordinate part:
 GENETIC MAP BEGINNING  This number locates the beginning nucleotide of the
 coordinate system on the genetic map.  We use these numbers to
 order the PIECEs in our Master library.

 The COORDINATE CONFIGURATION refers to the topological shape of the
 coordinates.  A linear genetic map could only have PIECEs with linear
 coordinates.  For a circular genetic map, circular coordinates may be
 chosen, but when only a portion of the sequence is known, each PIECE may be
 more conveniently handled as a linear coordinate system.

 A COORDINATE DIRECTION defines the orientation of the numbering system with
 respect to the genetic map.  + means "in the same direction as", - means
 "in the opposite direction as".

 The COORDINATE BEGINNING and COORDINATE ENDING nucleotides are integers
 that specify the limits of the coordinate system.  They are usually
 the ends of the largest known contiguous sequence.  The BEGINNING base
 corresponds to the genetic map beginning, the bases are consecutively
 numbered, and the ENDING is always greater than the BEGINNING number.

       The coordinate system described above provides a framework for stating
 the exact numbering of the SEQUENCE in a PIECE.  This also requires
 four items of information: configuration, direction, beginning and
 ending, all relative to the coordinate system.

 The PIECE CONFIGURATION may be circular only if the coordinate
 configuration is also circular.  When the coordinates are linear, the
 PIECE must also be linear.

 The PIECE DIRECTION may be + or - with respect to the coordinates,
 representing homology or complementarity to the coordinate system.

 The PIECE BEGINNING and ENDING are the numbers of the endpoints of the
 SEQUENCE.  Both must lie within the bounds set by the COORDINATE BEGINNING and
 ENDING.  The BEGINNING is always the 5' end of the molecule.

(end of delman.use.coordinates.2)

delman.use.coordinates.3


      It turns out that this system handles all the confusing cases noted
 earlier.  To write out the nine values of coordinates we will keep
 this order:
      (GENETIC MAP BEGINNING,
       COORDINATE CONFIGURATION,
       COORDINATE DIRECTION,
       COORDINATE BEGINNING,
       COORDINATE ENDING,
       PIECE CONFIGURATION,
       PIECE DIRECTION,
       PIECE BEGINNING,
       PIECE ENDING)
 The linear piece that we began this section with would be:
      (1,LINEAR,+,1,100,LINEAR,+,1,100)
 (The GENETIC MAP BEGINNING and COORDINATE DIRECTION are arbitrary.)

 The first subset was "GET FROM 40 TO 50;":
      (1,LINEAR,+,1,100,LINEAR,+,40,50)

 The complement: "GET FROM 50 TO 40;" is:
      (1,LINEAR,+,1,100,LINEAR,-,50,40)
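
      The two linear cases can be reproduced by a small Python sketch.
 The field names and the function linear_get are hypothetical (the real
 definitions are in LIBDEF), and the sketch assumes that the parent PIECE
 is linear with PIECE DIRECTION +.

```python
from collections import namedtuple

Coordinates = namedtuple("Coordinates", [
    "map_beginning", "coo_configuration", "coo_direction",
    "coo_beginning", "coo_ending",
    "piece_configuration", "piece_direction",
    "piece_beginning", "piece_ending"])


def linear_get(parent, start, stop):
    """Nine values for GET FROM start TO stop on a linear parent PIECE."""
    low, high = sorted((start, stop))
    if not (parent.coo_beginning <= low and high <= parent.coo_ending):
        raise ValueError("request lies outside the coordinate system")
    direction = "+" if start <= stop else "-"
    # The coordinate part is copied unchanged; only the PIECE part varies.
    return parent._replace(piece_configuration="LINEAR",
                           piece_direction=direction,
                           piece_beginning=start, piece_ending=stop)


whole = Coordinates(1, "LINEAR", "+", 1, 100, "LINEAR", "+", 1, 100)
subset = linear_get(whole, 40, 50)
complement = linear_get(whole, 50, 40)
```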


 The circular PIECE is:
      (1,CIRCULAR,+,1,100,CIRCULAR,+,1,100)
 The request
      GET FROM 10 TO 90 DIRECTION -;
 would make:
      (1,CIRCULAR,+,1,100,LINEAR,-,10,90)

      You should work out the results for the other three possible requests on
 this circular PIECE:
      GET FROM 10 TO 90 DIRECTION +;
      GET FROM 90 TO 10 DIRECTION +;
      GET FROM 90 TO 10 DIRECTION -;

 HINT: It helps to make diagrams.


      The catalogue program, described in DESCRIBE.CATAL, will list
 the coordinate systems for pieces of a book or library in tabular format.

(end of delman.use.coordinates.3)

delman.use.control.1


      HOW TO CONTROL THE RESPONSES OF DELILA

      There are several situations in which Delila manipulates the information
 in a library in a way that may not always be what one wants.  That is,
 there are certain things that Delila does in the absence of any instructions.
 These default actions can be changed by using a special class of
 instructions - they are called default resets.  There are four basic
 kinds of default (as defined in LIBDEF) but we will discuss only
 three of them here.

 OUT-OF-RANGE DEFAULT
      We discussed this default in the section on the Delila language
 (DELMAN.USE.LANGUAGE).  A request may be outside the limits of a PIECE
 in a library for two reasons:
 1) The place is outside the coordinate system and is therefore
 unsequenced (Delila calls it "unknown").
 2) The place is within the coordinates, but the PIECE does not
 extend that far in the particular library being used.
      In either case, Delila's actions depend on the OUT-OF-RANGE default:
      DEFAULT OUT-OF-RANGE REDUCE-RANGE;
 Delila will attempt to find the nearest edges of the PIECE and use
 these.  (NOTE: there are known bugs associated with this process,
 although it works in almost all cases.)
      DEFAULT OUT-OF-RANGE CONTINUE;
 Delila will not place the requested PIECE in the book, and will
 continue to process any further instructions.
      DEFAULT OUT-OF-RANGE HALT;
 Delila will stop processing instructions.  The book will not be useable
 by auxiliary programs.
      In all cases, a warning message is put into the listing.

 KEY DEFAULT
      One can use this default to prevent the information about MARKERs,
 TRANSCRIPTs and GENEs from going into the book.  For example:
      DEFAULT KEY GENE OFF;
 will turn off printing of the GENE information.  The various data
 items in a library will contain free form notes about the object.
 (You can use the REFER program to look at these.)  This command can
 also be used to turn off the NOTEs when one wants to reduce the size
 of the resulting book.

(end of delman.use.control.1)

delman.use.control.2


 NUMBERING DEFAULT
      In the section on language we discussed the numbering of the items going
 into a book.  This command is used to control the numbering.  One can
 turn it on or off:
      DEFAULT NUMBERING OFF; (* NOTHING FROM HERE ON WILL BE NUMBERED *)
 One can set numbering for particular items:
      DEFAULT NUMBERING PIECE; (* ONLY PIECES WILL BE NUMBERED *)
      DEFAULT NUMBERING TRANSCRIPT GENE; (* BOTH TRANSCRIPTS AND GENES
                                            WILL BE NUMBERED *)
 To make numbering more flexible, one can reset the number that the
 next item will get:
      DEFAULT NUMBERING 27; (* THE NEXT ITEM WILL BE NUMBERED 27 *)
 This default can be used to make sure that particular items will
 have the same numbers in different books.
      The number will be put into the notes of the item as the first line
 in the notes.  This allows them to be easily found by auxiliary
 programs.

 NOTE INSERTION
      One can put one's own notes into the next object placed in the book
 by using:
      NOTE "THIS IS THE REPLICATION ORIGIN FROM PHIX174";
      GET FROM ...
 Since this is not a default reset, it does not use the word "default".
 The new notes will follow the notes that were in the library.  By
 turning off notes from the library, and using note insertion, one can replace
 notes in a library.  Notes in PIECEs can be seen with program REFER.


      One can put these default or note insertion statements anywhere
 in a set of Delila instructions.  More details on these and other
 commands can be found in LIBDEF.


    All the defaults have initial values:

    default type       initial value
    ============       ==============
    KEY
         NOTE           ON
         MARKER         ON
         TRANSCRIPT     ON
         GENE           ON

    OUT-OF-RANGE        HALT

    NUMBERING           ON, 1, ALL


(end of delman.use.control.2)

delman.use.comparison


         SEQUENCE COMPARISONS AND STRUCTURE ANALYSIS

      The purpose of this section is to point out auxiliary programs that can
 be used to compare two sequences or find structures in a sequence.

      Sequence comparisons can be done with DOTMAT, which forms all possible
 pairs between sequences in two books.  For each pair, one sequence
 is put on the X axis of a coordinate system and the other is on the Y
 axis.  Both 5' ends are at the origin and X runs down the printout
 page while Y runs across the page.  (Simply rotate the page 90 degrees
 counter-clockwise to get standard Cartesian coordinates.)  The
 sequences are compared for complementarity at each possible (X,Y)
 pair formed between the two sequences.  A "dot" is placed at a coordinate
 if pairing can occur.  Notice that when a sequence is compared with
 itself the display will be symmetrical around the line Y = X.  Long
 stretches of pairing will run on diagonals
 (along segments of lines Y = -X + C).  To look for homology using
 DOTMAT, use DELILA to obtain the complement of one of the pieces.
      DOTMAT produces all possible pairings.  Sometimes one wants to
 eliminate the short helixes, to make finding the longer ones easier.
 The pair of programs HELIX and MATRIX will do this.
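
      A dot matrix of this kind is easy to sketch in Python.  The code
 below is an illustration and not the DOTMAT, HELIX or MATRIX programs.
 It assumes that only A-T and G-C pairs count, and the function names are
 hypothetical.  X runs down the rows and Y runs across the columns.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def dot_matrix(x_seq, y_seq):
    """Return one row per X base; '.' marks a possible pair with Y."""
    return ["".join("." if (x, y) in PAIRS else " " for y in y_seq)
            for x in x_seq]


def longest_helix(x_seq, y_seq):
    """Length of the longest run of dots along a line Y = -X + C."""
    best = 0
    run = {}                          # run length ending at cell (i, j)
    for i, x in enumerate(x_seq):
        for j, y in enumerate(y_seq):
            if (x, y) in PAIRS:
                # Extend the run from the cell one row up, one column right.
                run[i, j] = run.get((i - 1, j + 1), 0) + 1
                best = max(best, run[i, j])
    return best


rows = dot_matrix("GGAC", "GTCC")
helix = longest_helix("GGAC", "GTCC")   # GTCC is the complement of GGAC
```

 Keeping only runs above a chosen length is one simple way to drop the
 short helixes; each comparison visits every (X,Y) pair once.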

      One can use these two programs to find overlaps between sequences
 obtained by shotgun cloning.  Put the complete sequence on the X axis book
 and 20 bases from each end of the other sequence in the Y axis book.
 Search for long oligos, say 15 or longer.  If there is a significant
 overlap, you will get a response from HELIX.

      Another program that can be used for comparisons is the INDEX program.
 With this tool you can make an index of the locations of the oligo-
 nucleotides in a book.  The measure of the similarity between
 oligonucleotides in the final alphabetized list of oligos is related
 to sequence homologies.  This method is extremely powerful.
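
      The principle of an alphabetized index can be sketched in Python.
 This is not the INDEX program; it assumes oligonucleotides of one fixed
 length, and the function name oligo_index is hypothetical.

```python
def oligo_index(sequence, length, first_number=1):
    """Return (oligo, coordinate) pairs sorted alphabetically by oligo."""
    entries = [(sequence[i:i + length], first_number + i)
               for i in range(len(sequence) - length + 1)]
    return sorted(entries)


index = oligo_index("ACGTACGA", 3)
# Identical oligos land next to each other in the sorted list.
repeats = [(a, b) for a, b in zip(index, index[1:]) if a[0] == b[0]]
```

 In this toy case the oligo ACG appears at bases 1 and 5, and the two
 entries are adjacent in the index.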

      MATRIX/HELIX vs INDEX
 MATRIX/HELIX
      advantage:  The 2 dimensional plot is easy to look at.
      disadvantage:  It is slow.  For two sequences M and N bases long, a
         dot matrix operation takes MxN operations.  It is so-called Order
         N Squared in computation time since the time to compare a sequence
         with itself is a function of the square of the sequence length.

 INDEX
      advantage:  It is fast, since the sorting algorithm is of order N log N.
      disadvantage:  One can't get a feeling for the results easily.  One
         method is to mark listings made with LISTER.

(end of delman.use.comparison)

delman.use.aligned.books


      HOW TO MAKE AND USE ALIGNED BOOKS

      WHAT IS AN ALIGNED BOOK?
      To perform statistical analysis on sequence sites (e.g., ribosome binding
 sites, promoters, splice junctions, etc.) one needs a way to align a set
 of PIECEs in a book.  For ribosome binding sites, we have used the A of
 the AUG or various points in the Shine/Dalgarno.  A book is aligned by
 choosing one base from each PIECE to be the alignment point.  The alignment
 bases could be chosen by a list of coordinates, but we have found that there
 are advantages to using Delila instructions to specify the base:

      TITLE "EX7: ALIGNED BOOK";
      ORGANISM ECOLI; CHROMOSOME ECOLI;
      PIECE LAC;
      GET FROM 29 -5 TO 29 +10; (* LACI RBS *)
      GET FROM 1234 -5 TO 1234 +10; (* LACZ RBS *)

 Here, the zero point for LACI alignment is base 29 and for LACZ it is base
 1234.  The "from parameter" is -5 and the "to parameter" is +10.
 The instructions allow one to align the book that is created from the
 instructions.  WARNING: the instructions must follow a rigid format; this
 is described in DELMODS in module info.align, along with details on
 how to write programs using aligned books.
      (See also DELMAN.USE.DATA.FLOW and DESCRIBE.ALIST)

      AUXILIARY PROGRAMS FOR ALIGNED BOOKS
      After generating an aligned book (a book and an aligning instruction set)
 one can list it using program ALIST or obtain a histogram that tells the
 composition of the book at each point relative to the aligned base
 with HIST.  A chi-squared analysis of an aligned book is done using HISTAN.
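
      The kind of table such a histogram contains can be sketched as follows.
 This is not the HIST program; the function name, the input form (equal
 length strings that all start at the same "from" position) and the
 example sequences are assumptions made for illustration.

```python
from collections import Counter

def position_composition(sequences, from_pos):
    """Count A, C, G and T at each position relative to the aligned base.

    sequences: equal-length strings, each starting at from_pos relative
    to its aligned base (position 0).
    """
    table = {}
    for offset in range(len(sequences[0])):
        column = Counter(s[offset] for s in sequences)
        table[from_pos + offset] = {base: column.get(base, 0) for base in "ACGT"}
    return table

# Two made-up sites running from -2 to +2 around the aligned base.
table = position_composition(["AATGC", "GATGA"], -2)
for position in sorted(table):
    print(position, table[position])
```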

      GENERATING A SET OF ALIGNED RIBOSOME BINDING SITES
      We have provided the instructions for creating a set of aligned gene
 starts, in file GAIN.  GAIN was originally created from instructions
 of the form:
      ORGANISM ...; CHROMOSOME ...;
      GENE ...;
      GET FROM GENE BEGIN TO GENE BEGIN +2;
      ...
 This is file GRIN (genes relative to begin instructions).
 The resulting book was searched (one would use SEARCH with a rule of
 (A/G/T)TG ) to generate the instructions in aligned form.  GAIN was
 then made by replacing the from-position with the word FIRST and the
 to-position with LAST.  To use GAIN you must first create the
 transcript library from file TRAIN (TRAnscript library Instructions,
 use DELILA with LIB1 and LIB2).  Then replace FIRST and LAST with
 the desired range.  Notice that there are a few cases, marked
 "SPECIAL", that you must deal with individually.  Notice also that genes
 that are oriented in the direction opposite the PIECE had to be set up
 by hand (this may be automated someday).  The instructions could now
 be named GAIN1, and DELILA can be used to generate the aligned book.

      A detailed example of these operations is given in
 DELMAN.CONSTRUCTION.EXAMPLE.

(end of delman.use.aligned.books)

delman.use.perceptron.1


      USE OF THE PATTERN PROGRAMS

      "Perceptron" is the name given to a class of algorithms for pattern
 recognition with learning capabilities.  Minsky and Papert have written an
 excellent book on the topic ("Perceptrons", MIT Press, 1969) which explores
 both the limitations and potentials of the method.  They also prove the
 "Perceptron Convergence Theorem" which guarantees that a solution will be
 found if one exists.  We have written an article (Stormo et al., 1982,
 Nucleic Acids Research, 10: 2997-3011) which describes our use of the
 algorithm to investigate translational initiation sites.
      The algorithm takes as input patterns which can be divided into two
 classes, and finds a "Weighting Function" which serves to distinguish the
 patterns in the two classes.  More rigorously, if we encode a sequence into
 a string of bits, S, the algorithm attempts to find a W such that W*S >= T
 (some "threshold") if and only if S belongs to one class of the two classes of
 sequences.  We mean by "*" the dot, or inner product of S and W, which are
 vectors of the same dimensions.  If we start with two sets of sequences,
 S+ and S-, and an arbitrary W and T, the algorithm can be described by
 the following three step procedure:
       Test: choose a sequence S from S+ or S-,
             if S is in S+ and W*S >= T go to Test,
             if S is in S+ and W*S <  T go to Add,
             if S is in S- and W*S <  T go to Test,
             if S is in S- and W*S >= T go to Subtract;
       Add:  replace W by W + S,
             go to Test;
       Subtract: replace W by W - S,
             go to Test.
 An example of this process is shown in our NAR paper (reference given above).
 (Note: this process can be done without goto's...)
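
      The three step procedure can be written without goto's, for instance
 as in the Python sketch below.  It is not the PatLrn program.  The
 4-bits-per-base encoding, the fixed order in which sequences are tested,
 the max_passes limit and all names are assumptions made for illustration.

```python
CODE = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

def encode(seq):
    """Encode a sequence as a flat bit vector, four bits per base."""
    return [bit for base in seq for bit in CODE[base]]

def dot(w, s):
    """Inner product W*S of two vectors of the same dimension."""
    return sum(wi * si for wi, si in zip(w, s))

def learn(positives, negatives, w, t, max_passes=1000):
    """Return (W, number of changes) once a whole pass makes no change."""
    w = list(w)
    changes = 0
    for _ in range(max_passes):
        changed = False
        for s in positives:
            if dot(w, s) < t:                      # Add step
                w = [wi + si for wi, si in zip(w, s)]
                changes += 1
                changed = True
        for s in negatives:
            if dot(w, s) >= t:                     # Subtract step
                w = [wi - si for wi, si in zip(w, s)]
                changes += 1
                changed = True
        if not changed:
            return w, changes
    raise RuntimeError("no separating W found within max_passes")

# Made-up example sets, each sequence three bases long.
positives = [encode("AAG"), encode("AGG")]
negatives = [encode("TTC"), encode("CTT")]
w, changes = learn(positives, negatives, [0] * 12, 0)
print(w, changes)
```

 The pass limit is there only because the loop would never end for two
 sets that cannot be separated.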

      The program which implements the perceptron algorithm to work on
 sequences is called PatLrn.  Other programs which use the output of PatLrn
 are:
       PatLst - a lister program for the output of PatLrn;
       PatAna - does some simple analyses of the output of PatLrn;
       PatVal - evaluates the aligned sequences in a book by the PatLrn output;
       PatSer - searches a book for sites which are evaluated with a given
                PatLrn W output to be above some user specified value.

(end of delman.use.perceptron.1)

delman.use.perceptron.2


      EXAMPLES FOR THE PATTERN PROGRAMS

      The files "exspbk" and "exsnbk" are the sets of positive and negative
 sequences used in the example of Figure 1 of our "Perceptron" paper (NAR 10,
 2997-3011).  The file "expa1" contains the initial pattern from that same
 example.  Given these files and the program "PatLrn" you can recreate
 the example as follows:
      PatLrn(exspbk,a,exsnbk,b,pat,expa1).
 The file "pat" should be identical (except for the date/time) to the file
 "expa2" that we have provided.  You can check that with the "Merge" program
 if you want.  It is also identical to the solution pattern from the example
 and it keeps track of the number of changes needed to get to that solution.
 The files "a" and "b" are empty in this case, because we are aligning the
 sequences by their first bases.  If we wanted to align them by any other
 base those files would contain the instructions which generated the sequences
 (see DELMAN.USE.ALIGNED.BOOKS).

      Now use the program "PatAna" to do some simple analyses of the pattern.
      PatAna(pat,patan).
 The file "patan" is identical to the file expan2 that we provided.  It
 contains some useful information about the pattern, such as the minimum and
 maximum sequence values which could be obtained from this pattern, as well
 as the average value expected for random sequences and a feeling for the
 distribution of values.

      The program "PatVal" will use a pattern to evaluate a book of sites.
 Try:
      PatVal(exspbk,a,pat,valp).
         and
      PatVal(exsnbk,b,pat,valn).
 "valp" is the evaluation of each sequence of the positive class, and "valn"
 is the evaluation of each of the negative class sequences.  Check with the
 example in the paper to see that they are correct.  Again the "a" and "b"
 files are empty because we are aligning by the first base of the sequences.

      The program "PatSer" will use a pattern to search through a sequence,
 using each base in turn as the aligned base.  Those sites which are
 evaluated above some minimum, either set by the user or taken to be the
 minimum functional from the pattern itself, are identified.  Furthermore,
 instructions to get those sites so identified are written to the file "inst".
 Try this on an example file:
      PatSer(exsebk,pat,val,inst).
 Notice that when the pattern extends beyond the sequence the sites are still
 evaluated, but the user is notified of the over-extension.
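
      The evaluation and search steps can be pictured with the following
 Python sketch.  It is not PatVal or PatSer: the representation of the
 pattern as one weight table per position, the use of ">=" for the
 minimum, and the example weights are assumptions.  Unlike PatSer, the
 sketch simply skips sites where the pattern would extend beyond the
 sequence.

```python
def evaluate(site, w):
    """Score one aligned site; w[i][base] is the weight of base at position i."""
    return sum(w[i][base] for i, base in enumerate(site))

def search(sequence, w, minimum):
    """Return (position, value) for every site scoring at least minimum.

    Each base in turn is used as the first base of a site; positions are
    1-based.
    """
    width = len(w)
    hits = []
    for start in range(len(sequence) - width + 1):
        value = evaluate(sequence[start:start + width], w)
        if value >= minimum:
            hits.append((start + 1, value))
    return hits

# A made-up pattern two bases wide.
w = [{"A": 2, "C": 0, "G": 0, "T": -1},
     {"A": 0, "C": 0, "G": 0, "T": 3}]
print(evaluate("AT", w))
print(search("CATAT", w, 5))
```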

      The program "PatLst" is used to make nice horizontal printings of the
 patterns, such as for use as publishable figures.  Try this on the W51
 matrix which is from the paper and which we provide.  Read the page
 DESCRIBE.PATLST to see how to set the width of the pattern printed to
 a page to whatever you want.

(end of delman.use.perceptron.2)

delman.use.perceptron.3


      A NOTE ABOUT SIGNIFICANCE

      While the example we provide in the paper, and that you have just done,
 is convenient for demonstrating the method, separating two sets of two
 sequences, each five long, is in fact trivial.  Try:
      PatLrn(exspbk,a,exsnbk,b,newpat).
 "newpat" is identical to "expa0" that we provided, and as you can see is
 not interesting.  The mathematical problem of when it becomes
 significant that one can separate two sets of sequences is still an open
 problem, but we can say some things.  As the number of sequences in each
 class gets larger the probability of separation decreases, as it does
 when the number of nucleotides in each sequence diminishes.  As a good
 rule of thumb we like to have more sequences in the smaller class
 (usually the functional class) than there are nucleotides in any one
 of the sequences.  Under these conditions one can be reasonably confident
 that a solution pattern is likely to identify features of biological
 significance.

(end of delman.use.perceptron.3)

delman.use.encode.1


USE OF THE "ENCODE" PROGRAM

The program Encode was written to allow a user to encode sequences into
strings of integers in a flexible way.  For instance, one can encode
the sequences as mono-, di-, tri-, or higher oligonucleotides.  One can
assign specific oligos to certain positions or record only that they are
within some "window" of positions.  Within a window all the oligos may
be counted or only some, such as only those "in frame".
The program takes as input the book of sequences and the instruction set
which generated it and which specifies the alignment.  If the instruction
file is empty then all the sequences are aligned by their first bases.
The other input file, which must be non-empty, is the parameter file
"EncodeP" which specifies how the sequences are to be encoded.  It is
the options of the parameter file which give the program its flexibility
and power, and so they should be thoroughly understood.
The parameter file may contain any number of individual parameter records,
each of which will in turn be applied to each sequence in the book.  This
allows one to encode different regions of the sequences differently, or
to encode one region in more than one way.  Each parameter record has
five pieces of information, each written on a separate line:
      line 1 - the range over which this parameter record is to operate;  this
               line has two integers which are the bases, relative to the
               aligned base, for which to use this encoding;
      line 2 - the size of the window; the window begins at the start of the
               range and contains this many nucleotides in it;  the number
               of each base, or oligo, which occurs in this window is written
               to the output; note that positional information within the
               window is lost, so that if exact position is needed the window
               size should be 1;
      line 3 - the shift to the next window; this specifies how many bases
               to move the window over to its next position; this is repeated
               until the window begins beyond the end of the range;
      line 4 - this specifies the coding level, and the arrangement of the
               bases to be coded; the coding level is the number of bases in
               the oligos which are encoded, i.e., 1 means monos are encoded,
               2 means dis are encoded, ...; for coding levels greater than 1
               the user may allow for skips between the encoded bases;  for
               instance, one may want to encode as di-nucleotides bases which
               are separated by a nucleotide; this would be declared on this
               line by writing "2 : 1"; likewise, one could encode as a tri-
               nucleotide the first bases of three consecutive codons by the
               line "3 : 2 2", where the 3 indicates the coding level (tri-
               nucleotides) and the 2's represent the number of bases
               skipped between each encoded base; if there is no colon after
               the coding level declaration, all skips are assumed to be 0;
      line 5 - the shift to the next coding site; this allows the user to
               not count every occurrence of the oligos in the window, but
               rather to move some number of bases to the next encoded site;
               if all the oligos are wanted, this number should be 1.
The above line information constitutes a single parameter record.  The
parameter file may contain any number of these records concatenated
together.  Each sequence will be encoded by the entire list of parameter
records and the resulting string of integers will be written to the
"EncSeq" file.  The encoded string for each sequence ends with a special
"end of sequence" symbol, which is listed in the file header.
For examples of how this program works see "DELMAN.USE.ENCODE.2".

(end of delman.use.encode.1)

delman.use.encode.2


EXAMPLES OF USING THE "ENCODE" PROGRAM

The files "ExEncBk" and "ExEncIn" contain, respectively, the sequence around
the beginning of the rIIB gene of T4 and the instructions which align this
sequence by the ATG of the gene.  The aligned sequence looks like:

       ---                   ++
       111--------- +++++++++11
       210987654321012345678901
       ........................
       ATAAGGAAAATTATGTACAATATT

Notice that the 0 base is the A of the ATG (this is what we aligned by) and
that our sequence contains the 12 preceding bases and the 11 following.  This
is through the fourth amino acid of the protein.  If we wanted to encode only
the mono-nucleotides of the initiation codon we would make our parameter file:
   0 2
   1
   1
   1
   1

this would give the encoding:
 1 0 0 0 0 0 0 1 0 0 1 0 -1

Notice the -1 which specifies the end of the encoded sequence.  Each group of
4 integers before that specifies which base occurs at one of the three
encoded positions.
The A is encoded as 1 0 0 0, the T as 0 0 0 1, and the G as 0 0 1 0.

If we wanted to know the number of each mono-nucleotide in this whole region
and we didn't care about their positions, we would encode as:
   -12 11
   24
   24
   1
   1

This would give the encoding:
 12 1 3 8 -1

Notice that this is really just the composition of the sequence, since our
window covers the entire sequence.  We could get the di-nucleotide composition
with the parameters:
   -12 11
   24
   24
   2
   1

and get the encoding:
 5 1 1 5 1 0 0 0 1 0 1 1 4 0 1 2 -1

Notice that this encoded string is a vector of 16 integers (up to the end
of sequence mark, -1).  The number in each element of the vector is the number
of each di-nucleotide in the sequence, in the order AA,AC,AG...TC,TG,TT.
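
The three encodings above can be reproduced with a short Python sketch.  It
is not the Encode program.  The function names and the way a record is passed
(the five lines as arguments) are assumptions, and so is the rule for which
oligos count: an oligo is counted if it starts inside the window and does not
run past the end of the sequence.  That rule was chosen because it gives the
numbers shown in these examples.

```python
from itertools import product

BASES = "ACGT"
SEQ = "ATAAGGAAAATTATGTACAATATT"   # the rIIB example sequence shown above
ZERO = 12                           # 0-based index of the aligned base (A of ATG)

def parse_level(line):
    """Parse line 4 of a record, e.g. '2' or '3 : 2 2', into (level, skips)."""
    head, _, tail = line.partition(":")
    level = int(head)
    skips = [int(x) for x in tail.split()] or [0] * (level - 1)
    return level, skips

def encode_record(seq, zero, first, last, window, window_shift, level_line, site_shift):
    """Apply one parameter record and return its list of counts."""
    level, skips = parse_level(level_line)
    oligos = ["".join(p) for p in product(BASES, repeat=level)]  # AA, AC, ... TT
    offsets = [0]                      # offsets of the encoded bases from the site
    for skip in skips:
        offsets.append(offsets[-1] + 1 + skip)
    out = []
    start = first
    while start <= last:               # stop when the window begins beyond the range
        counts = dict.fromkeys(oligos, 0)
        for site in range(start, start + window, site_shift):
            idx = [zero + site + off for off in offsets]
            if idx[0] >= 0 and idx[-1] < len(seq):
                counts["".join(seq[i] for i in idx)] += 1
        out.extend(counts[o] for o in oligos)
        start += window_shift
    return out

def encode(seq, zero, records):
    """Apply every record in turn and append the end of sequence mark (-1)."""
    out = []
    for record in records:
        out.extend(encode_record(seq, zero, *record))
    return out + [-1]

print(encode(SEQ, ZERO, [(0, 2, 1, 1, "1", 1)]))       # initiation codon
print(encode(SEQ, ZERO, [(-12, 11, 24, 24, "1", 1)]))  # mono-nucleotide composition
print(encode(SEQ, ZERO, [(-12, 11, 24, 24, "2", 1)]))  # di-nucleotide composition
```

Several records can be given in one list; their outputs are concatenated
before the -1 is written.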

Examples continued in DELMAN.USE.ENCODE.3.

(end of delman.use.encode.2)

delman.use.encode.3


Examples of using the "encode" program, continued from
DELMAN.USE.ENCODE.2.

      To encode the di-nucleotide composition of the Shine and Dalgarno region
and also the mono-nucleotides of the coding sequence, each in its own position,
we would make this list of parameters:
   -10 -6
   5
   5
   2
   1
   0 11
   1
   1
   1
   1

This would give us the encoding:
 2 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1
 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1
 -1

Here the first 16 integers are the di-nucleotide composition of the Shine and
Dalgarno region, and appended to that are the mono-nucleotide encodings for
each position of the coding sequence.  We could get the di-nucleotides of
successive codon first positions by:
   0 11
   12
   12
   2 : 2
   3
or we could get the codon composition by:
   0 11
   12
   12
   3
   3
or we could get the di-nucleotide encoding of the first and last position of
each codon, including the position of the codon by:
   0 11
   3
   3
   2 : 1
   3
These are left as exercises for the user.  We encourage the user to make up
other tests and try them until this program is easy to use.

(end of delman.use.encode.3)

delman.use.dbpull.define


  In addition to Delila, there are at least two other generally available
large nucleic acid sequence data bases. The DB program system handles both the
European Molecular Biology Laboratory (EMBL) libraries and those of the
Genetic Sequence Databank (GenBank(TM)).
  If you want to contact someone who helps operate these data bases use the
following addresses:
                   GenBank
                   c/o Computer Systems Division
                   Bolt Beranek and Newman Inc.
                   10 Moulton St.
                   Cambridge, Ma. 02238
                   USA

                   Graham Cameron
                   European Molecular Biology Laboratory
                   Postfach 10.2209, D-6900 Heidelberg, West Germany

  The DB program system is a small set of programs. DBcat prepares
catalogs for DBpull. DBpull extracts part or all of an entry of either
EMBL or GenBank format. DBbk converts database entries into the Delila
book form that Delila programs use. All of these programs handle both data
base formats even when both occur together in the same library.
  At this point, please obtain some sample library entries from both data
bases and look them over.
  EMBL and GenBank libraries are arranged in series of entries, each entry
possessing a unique entry id, a nucleic acid sequence, and other miscellaneous
information. Most of the lines in the libraries start with a word or abbreviated
code that indicates what kind of information the line contains. The
following definitions will clarify these points.

Library definitions:

Entry: An entry starts with a line which begins with an "ID" (EMBL) or a
"LOCUS" (GenBank). All subsequent lines are part of the entry until the
line that contains simply "//". "//" is the entry terminus code for both
data bases.

Entry id: On the first line of each entry, after the "LOCUS" or the "ID",
come a few spaces and then a weird-looking word or code that may or may not
resemble a familiar biological name. This is the entry id; it is the name the
entry is known by and it is what DBpull uses to identify which entries it
will extract.

Line codes: The phrases "ID" and "LOCUS" are line codes. There are other line
codes in each entry such as "REFERENCE" and "ORIGIN" in GenBank and "DE" and
"SQ" in EMBL. Some lines do not have a code and some have one, but it is
indented. Other lines have codes, but there is no other information on the
line. These special cases will be discussed below in the definition of line
code request instructions.
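
The entry definitions above can be turned into a small Python sketch that
cuts a library into entries.  It is not part of the DB program system.  The
function name is made up, and the sample lines are invented stand-ins for real
library entries.

```python
def split_entries(lines):
    """Group library lines into (kind, entry id, lines) tuples.

    An entry starts at a line beginning with "ID" (EMBL) or "LOCUS"
    (GenBank) and runs through the line that contains simply "//".
    """
    entries = []
    current = None
    for line in lines:
        if current is None:
            if line.startswith("ID ") or line.startswith("LOCUS "):
                words = line.split()
                kind = "EMBL" if words[0] == "ID" else "GENBANK"
                current = (kind, words[1], [line])
        else:
            current[2].append(line)
            if line.strip() == "//":
                entries.append(current)
                current = None
    return entries

# Invented sample lines, for illustration only.
library = [
    "LOCUS       M13   sample",
    "ORIGIN",
    "        1 aacgctacta",
    "//",
    "ID   ADCXXX  sample",
    "SQ   sample",
    "     aacgctacta",
    "//",
]
for kind, entry_id, entry_lines in split_entries(library):
    print(kind, entry_id, len(entry_lines))
```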

  Now that you are familiar with the data bases you can understand the DBpull
instruction set. Each instruction takes up only one line. Each line does one
of two things; either it indicates what entry type (GenBank or EMBL) is
requested on the following lines or it makes an actual request for part or
all of an entry identified by its entry id. Please note that the following
definitions will be made clearer by referring to the examples that follow.

(end of delman.use.dbpull.define)

delman.use.dbpull.instructions


Note: Instructions are entirely upper case because the computer system on
      which DBpull was designed required it.

Instructions that determine entry request type of succeeding lines:

  EMBL: This indicates that requests for entries somewhere in the EMBL
  libraries will be on the following lines.
  GENBANK: Same for requests found in the GenBank libraries.
  GENB: Same as "GENBANK".

Instructions that tell which entries are to be pulled:

  Entry id: An instruction line beginning with an entry id will pull part
  or all of that entry. The parts extracted will depend on which of the
  "instructions that define extraction" (defined below) follows the id on
  the same line.

  Wildcard id: This request looks like an entry id request but somewhere
  in the entry name are one or two "*" symbols. The "*" represents any
  number of unspecified characters. It may be inserted at the beginning of
  the id, at the end, or at both the beginning and the end but not the
  middle. (Confused? See instruction example 3 below.)

  EVERY: The word "EVERY" at the start of a request line calls for every
  entry of a particular entry type. (See instruction example 4)
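
The wildcard rule can be sketched in Python as follows.  This is an
illustration of the rule as stated here, not DBpull's own code; the function
name and the entry ids used in the example are made up.

```python
def id_matches(request, entry_id):
    """True if entry_id satisfies an id request that may carry '*' wildcards.

    A '*' may stand at the beginning, at the end, or at both ends of the
    request, but not in the middle.
    """
    core = request.strip("*")
    if "*" in core:
        raise ValueError("'*' is not allowed in the middle of an id")
    leading = request.startswith("*")
    trailing = request.endswith("*")
    if leading and trailing:
        return core in entry_id
    if leading:
        return entry_id.endswith(core)
    if trailing:
        return entry_id.startswith(core)
    return entry_id == request

ids = ["ECOTRNA", "RNAXYZ", "M13", "PM2", "T7"]   # made-up entry ids
for request in ["*RNA", "*RNA*", "M*", "T7"]:
    print(request, [entry for entry in ids if id_matches(request, entry)])
```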

Instructions that define extraction:

  Line codes: Following the instruction that tells which entry or entries
  are to be pulled, on the same line, come instructions that structure
  the extraction. One or more line codes occurring in this space will result
  in the lines of the entry which have matching codes being pulled. GenBank
  line codes are actually words. The full word or an abbreviation will work,
  but the abbreviation cannot be shorter than 3 letters. "LOC", for instance,
  will pull the "LOCUS" line while "LO" would not. When there are one
  or more lines in the entry directly below a pulled line that either
  do not possess a line code, possess indented codes, or possess the code "xx",
  these additional lines will be extracted also.

  RAW: Instead of line codes one can simply insert the word "RAW". This will
  pull only the sequence of the entry without origin or coordinate labels.
  The sequence will end with a "." to separate it from other sequences and to
  make it suitable for input into Makebk (see delman.describe.makebk). Also,
  if the first request of fin is "RAW", fout will have no dateline and
  therefore it will not make a suitable secondary data base for DBpull.

  ALL: Instead of "RAW" or line codes the word "ALL" will result in an
  entire entry being extracted.
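
The line code rules can be sketched in Python as below.  This is a reading
of the rules in this section, not DBpull itself: the function names are made
up, the sample entry is invented, and treating every line that begins with a
blank as a continuation line is an assumption.

```python
def code_matches(request, line_code):
    """True if request is the full line code or an abbreviation of 3+ letters."""
    if request == line_code:
        return True
    return len(request) >= 3 and line_code.startswith(request)

def pull_lines(entry_lines, requests):
    """Return the lines whose codes were requested, in the order they occur.

    Lines directly below a pulled line that have no code, an indented
    code, or the code "xx" are pulled along with it.
    """
    pulled = []
    pulling = False
    for line in entry_lines:
        words = line.split()
        has_code = bool(words) and not line[0].isspace()
        if has_code and words[0].upper() != "XX":
            pulling = any(code_matches(r, words[0]) for r in requests)
        if pulling:
            pulled.append(line)
    return pulled

# Invented GenBank-like entry, for illustration only.
entry = [
    "LOCUS       M13   sample",
    "DEFINITION  sample",
    "REFERENCE   1",
    "  AUTHORS   sample",
    "ORIGIN",
    "        1 aacgctacta",
    "//",
]
for line in pull_lines(entry, ["LOC", "REFERENCE"]):
    print(line)
```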

(end of delman.use.dbpull.instructions)

delman.use.dbpull.examples


Instruction examples (DBpull input file Fin)

Example 1:

EMBL
ADCXXX ID DE SQ
GENBANK
M13 LOC REFERENCE
ANABANIFH LOCUS

Comments: The first and third lines indicate what types of entries are
requested on the following lines. If, for instance, M13 were an EMBL entry
this set of instructions would not find it.

Example 2:

GENB
T7 RAW
MS2 ALL

Comments: The two requested ids are not in alphabetical order and the
DBpull output file fout will have the same order as the requests.

Example 3:

EMBL
*RNA SQ ID
*RNA* ID SQ
GENB
M* ORI SITES
GOOGOOGAGA ALL
T7 RAW

Comments: The character "*" is a wildcard; it represents any number of
unspecified characters.
The first request will grab any entry whose id ends in "RNA", the
second any one that has "RNA" anywhere in it, and the third any id which
starts with "M". The fourth request is a joke and, like any other
non-existent id, will yield a "not found" message and then halt the program. If
there were no GenBank entry ids beginning with "M" a "not found" would appear
but DBpull would not halt because this id request is a wildcard. The logic
behind this distinction is that wildcards are used to search for the
possible existence of an entry, but regular ids are used only for entries
that are well known by the user. Note that "ORI" (origin) pulls sequence in
GenBank and "SITES" tells you where the genes and other features are. "SQ ID"
and "ID SQ" are equivalent; lines are pulled in the order that they occur.

Example 4:

EMBL
EVERY ID
GENB
EVERY LOC

Comments: This example would make a catalog for users of the entire EMBL
and GenBank data bases. The catalog would be alphabetical because the
catalog files used by DBpull (produced by DBcat) are presorted. If
"catalogs for humans" are provided with your libraries do not try this
example; it is very expensive. If you do try it, you might want to request
additional line codes to "LOC" and "ID" for a more informative catalog.

(end of delman.use.dbpull.examples)

delman.use.search.1


                  Use of the Search Program

i.  Searching DNA Sequences for Particular Strings
      The search program works on books of sequences.  Any search pattern
will be looked for in each sequence of the book.  Search patterns consist
of strings of nucleotides, such as 'aatggct'.  You may also specify
ambiguous patterns, such as 'a or g', in either of two ways: '(a/g)' or
'r'.  All possible ambiguities can be asked for in either way.  From
within the search program type 'l' to see the list of one-letter codes
for each ambiguous base combination.  One can also include in the search
positions for which you don't care what the base is, indicated by 'n'.
For instance, 'anc' would search for a and c separated by any base.  One
can also use 'e' (for extension) to vary the spacing between specified
regions.  The 'e' is treated either as an 'n' or as nothing.  For
example, 'aec' would search for both 'anc' and 'ac'.  We used this feature
to search for 'shine and dalgarno' sequences before 'atg's by specifying
'gga5n4eatg'.  This means 'gga followed by 5 to 9 unspecified bases followed
by atg'.
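
One way to picture these patterns is to translate them into regular
expressions, as in the Python sketch below.  This is not how the search
program is implemented.  The sketch handles only the bases, 'n', 'e', 'r',
the '(a/g)' form and a leading repeat count; the other one-letter codes and
the remaining pattern features are left out, and the function names are made
up.

```python
import re

# An optional repeat count followed by one pattern symbol.
TOKEN = re.compile(r"(\d*)(\([acgt/]+\)|[acgtnre])")

def to_regex(pattern):
    """Translate a simple search pattern into a regular expression string."""
    pattern = pattern.lower()
    parts = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError("cannot parse pattern at %r" % pattern[pos:])
        count = int(m.group(1) or 1)
        symbol = m.group(2)
        if symbol == "e":                       # an 'n' or nothing
            parts.append("[acgt]{0,%d}" % count)
        else:
            if symbol == "n":
                cls = "[acgt]"
            elif symbol == "r":
                cls = "[ag]"
            elif symbol.startswith("("):        # '(a/g)' becomes '[ag]'
                cls = "[" + symbol[1:-1].replace("/", "") + "]"
            else:
                cls = symbol
            parts.append(cls if count == 1 else "%s{%d}" % (cls, count))
        pos = m.end()
    return "".join(parts)

def find_all(pattern, sequence):
    """Return the 1-based positions of all matches, overlapping ones included."""
    rx = re.compile("(?=%s)" % to_regex(pattern))
    return [m.start() + 1 for m in rx.finditer(sequence.lower())]

print(to_regex("gga5n4eatg"))
print(find_all("aec", "acgatc"))
```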
      One can search for strings which are close to the one specified by
allowing mismatches to the specified sequence.  This is done by typing 'm'
as a search command, and then specifying how many mismatches are allowed.  If
there are regions within the specified sequence where you want no mismatches,
this is stated by enclosing that region between '<' and '>'.  For example,
if mismatches were set to 1 and the pattern searched were 'aa<ggc>tt', then
the 'ggc' must be found exactly, but the rest of the pattern need only be
within one of a perfect match.
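
The mismatch rule can be sketched in Python as follows.  The sketch is not
the search program's code.  It takes 'aa<ggc>tt' as an illustrative pattern
in which 'ggc' is the protected region, handles plain bases only, and uses a
made-up function name.

```python
def mismatch_hits(pattern, sequence, allowed):
    """Return 1-based positions where pattern matches with few enough mismatches.

    Bases enclosed between '<' and '>' must match exactly; elsewhere up
    to `allowed` mismatches are tolerated.
    """
    bases, exact = [], []
    protected = False
    for ch in pattern:
        if ch == "<":
            protected = True
        elif ch == ">":
            protected = False
        else:
            bases.append(ch)
            exact.append(protected)
    hits = []
    for start in range(len(sequence) - len(bases) + 1):
        mismatches = 0
        ok = True
        for offset, (base, must_match) in enumerate(zip(bases, exact)):
            if sequence[start + offset] != base:
                if must_match:
                    ok = False
                    break
                mismatches += 1
        if ok and mismatches <= allowed:
            hits.append(start + 1)
    return hits

print(mismatch_hits("aa<ggc>tt", "ccatggcttcc", 1))   # one mismatch outside 'ggc'
print(mismatch_hits("aa<ggc>tt", "ccaagccttcc", 1))   # mismatch inside 'ggc'
```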
      The search program returns to you the positions of the matches found in
the book.  Unless otherwise specified, the position corresponds to the first
base of the pattern.  However, one can ask for the position to be another
base by preceding that base by '#'.   For example, 'aa#atggct' would return
as the position of the match the 'a' of the 'atg'.
      It is also possible to make searches for relations between bases.  Six
relations are allowed: identity (i); non-identity (ni); complementarity (c);
non-complementarity (nc); complementarity including g-t pairs (w); and
non-complementarity including g-t pairs (nw).  Relational searches are
specified by first the symbol '^', followed by the pattern position this base
is to be related to, followed by the relation.  For example, 'n^1i' would find
all sites in which there is a repeated base (aa, cc, gg or tt).  Notice that
the base to which the relation refers must precede the point of the relation
in the pattern.  Searching for the pattern '5n^1c' would find sites of
complementary bases separated by 4 unspecified bases.
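
The six relations can be sketched in Python as below.  This is an
illustration rather than the search program's code.  The function names are
made up, and a relational pattern is reduced here to a distance between the
two related bases: 1 for 'n^1i' and 5 for '5n^1c'.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def related(first, second, relation):
    """Test one of the six relations (i, ni, c, nc, w, nw) between two bases."""
    identical = first == second
    complementary = COMPLEMENT[first] == second
    wobble = complementary or {first, second} == {"g", "t"}
    return {"i": identical, "ni": not identical,
            "c": complementary, "nc": not complementary,
            "w": wobble, "nw": not wobble}[relation]

def relation_hits(sequence, distance, relation):
    """Return 1-based positions p where base p and base p + distance are related."""
    return [p + 1 for p in range(len(sequence) - distance)
            if related(sequence[p], sequence[p + distance], relation)]

print(relation_hits("acggta", 1, "i"))   # like 'n^1i': a repeated base
print(relation_hits("acccct", 5, "c"))   # like '5n^1c': 4 bases in between
```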
      More information on search patterns and other commands in general
can be obtained by typing 'help' while in the program.

(end of delman.use.search.1)

delman.use.search.2


ii.  Creating Delila Instruction Files
      The search program also allows one to create instruction files so
that the located sites may be put into a book for further analysis.  This
is especially useful when you want to include in the analysis regions around
the sites.  For instance, you could set the 'from' distance to -60 and the
'to' distance to +40.  Then by searching for 'gga5n4e#atg' you would get
the instructions necessary to obtain the sequences from -60 to +40 around
the atg's which are preceded by Shine and Dalgarno sequences.  Help on
using this feature of the program can be obtained by typing 'd help' while in
the program.

(end of delman.use.search.2)

delman.construction










           cccccc    oooooo   n     nn
          cc    cc  oo    oo  nn    nn
          cc        oo    oo  nnn   nn
          cc        oo    oo  nnnn  nn
          cc        oo    oo  nn nn nn
          cc        oo    oo  nn  nnnn  --------
          cc        oo    oo  nn   nnn
          cc    cc  oo    oo  nn    nn
           cccccc    oooooo   nn    nn


 ssssss   tttttttt  rrrrrrr   uu    uu   cccccc   tttttttt
ss    ss     tt     rr    rr  uu    uu  cc    cc     tt
ss           tt     rr    rr  uu    uu  cc           tt
 ssssss      tt     rr    rr  uu    uu  cc           tt
      ss     tt     rr    rr  uu    uu  cc           tt
      ss     tt     rrrrrrr   uu    uu  cc           tt     --------
      ss     tt     rr  rr    uu    uu  cc           tt
ss    ss     tt     rr   rr   uu    uu  cc    cc     tt
 ssssss      tt     rr    rr   uuuuuu    cccccc      tt


          iiiiiiii   oooooo   n     nn
             ii     oo    oo  nn    nn
             ii     oo    oo  nnn   nn
             ii     oo    oo  nnnn  nn
             ii     oo    oo  nn nn nn
             ii     oo    oo  nn  nnnn
             ii     oo    oo  nn   nnn
             ii     oo    oo  nn    nn
          iiiiiiii   oooooo   nn    nn












(end of delman.construction)

delman.construction.intro


      CONSTRUCTION OF DELILA LIBRARIES

      Introduction
      This section assumes that you are familiar with DELMAN.USE.
 Construction of a Delila System Library involves several steps:
      - Entry of the raw sequence data (twice)
      - Correction of the sequences
      - Gathering of the information about the sequences
      - Creation of a "module" for insertion into the library
        (not the same module type as the ones used by program Module.)
      - Insertion of the module
      - Construction of a catalogue
      - Checking that the library is correct.

      When you are gathering the data to create part of a library
 (the library insertion module) you may find the forms in
 DELMAN.CONSTRUCTION.FORM useful.  Use the Module program to make
 as many copies as required.


      NOTES FOR TRANSPORTATION
      Since the libraries that we send you have already been checked, you
 need only run the CATAL program (as discussed below) to generate the
 catalogues for these libraries.  After that, Delila can be used.

(end of delman.construction.intro)

delman.construction.structure


      MORE ON LIBRARY STRUCTURE - LOGICAL VS PHYSICAL STRUCTURE

      In DELMAN.USE.STRUCTURE we discussed the structure of a Delila
 Library.  The descriptions were about how the parts are connected,
 and what is inside each part.  This is the logical structure of the
 data base.  We did not discuss the details of how a library is actually
 constructed, because it is not necessary to know these things when
 working with the Delila System.  The description of these details
 is the description of the physical structure of the data base.

      Since we do not yet have an extensive set of tools for constructing
 Delila Libraries, it is necessary to describe the physical structure
 enough so that you can build your own libraries.  Because these details
 are rigorously stated in LIBDEF, most things are automated by program
 Makebk, and Catal does lots of checking, we will only discuss the general
 concepts here.

      The logical structure of a library follows the schema shown in LIBDEF
 or DELMAN.USE.STRUCTURE.  This structure is a two dimensional net.
 Libraries are implemented physically in files, and so are linear
 structures.  If we exclude for the moment the references to a PIECE
 by MARKERs, TRANSCRIPTs and GENEs, then the library structure is a
 tree.  Any tree can be represented as a nested series of objects
 in linear order:
      ORGANISM   (open  parenthesis for an ORGANISM)
      CHROMOSOME (open  parenthesis for a  CHROMOSOME)
      GENE       (open  parenthesis for a  GENE)
      GENE       (close parenthesis for a  GENE)
      PIECE      (open  parenthesis for a  PIECE)
      PIECE      (close parenthesis for a  PIECE)
      CHROMOSOME (close parenthesis for a  CHROMOSOME)
      ORGANISM   (close parenthesis for an ORGANISM)

 If you look at any book (e.g., EX0BK) or library (e.g., LIB1) you will
 see this structure.  Lines in a library either define the structure
 or are chunks of data (attributes).  Attributes are signaled by an
 asterisk (*) as the first character on the line.
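
      The nesting can be rebuilt from the linear form with a stack.  The
 Python sketch below is only an illustration and is not part of the Delila
 System.  It assumes that a structure line holds a single keyword and that
 a keyword equal to the innermost open object closes it.  The names
 parse_library and outline are hypothetical.

```python
def parse_library(lines):
    """Rebuild the tree from the linear open/close form.

    Assumptions of this sketch (not taken from LIBDEF): a structure line
    holds a single keyword, and a keyword equal to the innermost open
    object closes that object.
    """
    root = {"kind": "ROOT", "attributes": [], "children": []}
    stack = [root]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Attribute line: data for the innermost open object.
            stack[-1]["attributes"].append(line[1:].strip())
        elif len(stack) > 1 and stack[-1]["kind"] == line:
            # Same keyword again: this is the close parenthesis.
            stack.pop()
        else:
            # New keyword: the open parenthesis of a nested object.
            node = {"kind": line, "attributes": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("object never closed: " + stack[-1]["kind"])
    return root


def outline(node, depth=0):
    """Return the tree as indented text, one object per line."""
    rows = []
    for child in node["children"]:
        label = child["kind"] + " " + " ".join(child["attributes"])
        rows.append("  " * depth + label.rstrip())
        rows.extend(outline(child, depth + 1))
    return rows
```

 Calling outline(parse_library(lines)) on the lines of a small book gives
 one indented row per object, which makes a missing close easy to spot.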


      We must now allow various objects to refer to PIECEs.  This is done
 by a reference to the name of the PIECE.  For example, one of the
 attributes in a GENE is the name of the PIECE that the GENE is on.
 (In cases where the GENE spans two PIECEs, we use two GENEs.)

      To simplify the operation of the CATAL program (to be described later)
 we have added one more rule.  All objects that refer to a particular
 PIECE are called the "FAMILY" of the PIECE.  The rule is that a
 FAMILY precedes its PIECE in the physical (file) implementation.
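
      The FAMILY rule is easy to check mechanically.  The Python sketch
 below is an illustration only; CATAL does its own checking and we do not
 reproduce its method here.  The tuple layout (kind, name, piece reference)
 and the name family_order_errors are assumptions made for the example.

```python
def family_order_errors(objects):
    """Report family members that do not precede their PIECE.

    objects is a list of (kind, name, piece_reference) tuples in file
    order, all from one CHROMOSOME.  piece_reference is None for a PIECE.
    This layout is hypothetical, chosen only for the sketch.
    """
    all_pieces = {name for kind, name, _ in objects if kind == "PIECE"}
    seen_pieces = set()
    errors = []
    for kind, name, reference in objects:
        if kind == "PIECE":
            seen_pieces.add(name)
        elif reference in seen_pieces:
            # The PIECE was already written, so the rule is broken.
            errors.append(f"{kind} {name} follows its PIECE {reference}")
        elif reference not in all_pieces:
            errors.append(f"{kind} {name} refers to unknown PIECE {reference}")
    return errors
```

 An empty list means every MARKER, TRANSCRIPT and GENE came before the
 PIECE it refers to.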

(end of delman.construction.structure)

delman.construction.catal


      MAKING NEW LIBRARIES - THE CATALOGUE PROGRAM

      The first technical difference between Libraries and Books in the Delila
 System is that Libraries have catalogues while Books do not.  Catalogues
 serve several purposes.  First, since they are a condensed list of
 the objects in a Library, they allow objects to be found quickly.
 There are catalogues for both Delila and for people (the latter is
 called a HUMCAT - HUMan's CATalogue).  These are constructed by the
 program CATAL.

      Since a library may be constructed by hand, it is also convenient to
 check the Library's physical structure at the time the catalogue is made.

      The Problem Of Duplicate Names
      Using Delila, a Book may easily be constructed that contains two objects
 with the same name within the same structure (if they are in different
 structures, it won't matter).  For example:
      ORGANISM ECOLI;
         CHROMOSOME ECOLI;
            GENE LACI; (* THIS IS ON PIECE LAC *)
            GET ALL GENE DIRECTION HOMOLOGOUS;
            GET ALL GENE DIRECTION COMPLEMENT;
      If this Book were to become a Library, then a reference to PIECE LAC
 would be ambiguous since there are two PIECEs with that name within the
 CHROMOSOME.  The CATAL program detects these cases and makes the names differ
 by adding symbols to the names of second and subsequent duplicately named
 objects.  The second technical difference between Books and Libraries is that
 Books may have duplicate names, while Libraries may not.
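
      The idea of resolving duplicate names can be sketched in a few lines
 of Python.  This is an illustration only: the symbol that CATAL actually
 adds is not stated here, so the apostrophe used below is an assumption,
 as is the function name make_names_unique.

```python
def make_names_unique(names, symbol="'"):
    """Make later duplicates differ by appending a symbol.

    names holds the names of one kind of object inside one structure,
    in file order.  The first use of a name is kept unchanged.  The
    symbol is a placeholder, not the one CATAL uses.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name
        while candidate in seen:
            # Keep adding the symbol until the name is unused.
            candidate += symbol
        seen.add(candidate)
        result.append(candidate)
    return result
```

 For the example above, the two PIECEs named LAC would come out as LAC and
 LAC' under this assumed scheme.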



      Notes For Transportation
      Unknown ends of objects (such as a GENE) are represented in this
 version by a number that is off the end of the coordinates of
 the PIECE.  For consistency, we have used +100000 or -100000 so
 that these can be more easily recognized (to our knowledge no
 continuous sequences are this long ... yet!).  If your computer
 cannot handle integers this large, then you can reduce these
 numbers, as long as they are outside of the individual coordinates.
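
      If you do reduce these numbers, it is worth checking the new value
 against every PIECE.  The Python sketch below is an illustration only.
 It treats each PIECE as a linear (beginning, ending) pair of coordinates;
 the function names are hypothetical.

```python
UNKNOWN_END = 100000  # magnitude used in this version of the libraries


def is_unknown_end(position, beginning, ending):
    """True if position lies off either end of the PIECE coordinates."""
    low, high = min(beginning, ending), max(beginning, ending)
    return position < low or position > high


def magnitude_is_safe(magnitude, pieces):
    """True if +magnitude and -magnitude fall outside every PIECE.

    pieces is a list of (beginning, ending) coordinate pairs.
    """
    return all(
        -magnitude < min(b, e) and magnitude > max(b, e) for b, e in pieces
    )
```

 A smaller value such as 10000 is acceptable only if magnitude_is_safe
 returns True for all of the PIECEs in your libraries.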

(end of delman.construction.catal)

delman.construction.example


      AN EXAMPLE OF CONSTRUCTING DELILA LIBRARIES

      In this example we show the series of steps used to set up the Delila
 libraries provided on the tape.  The special bracket notation ([...])
 is used here to indicate the contents of a file.  A slash (/) inside
 the brackets indicates the beginning of a new line in the file.
 Other notation is described in DELMAN.DESCRIBE.CONVENTIONS.
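
      The bracket notation can be expanded into file lines mechanically.
 The Python sketch below is an illustration only and is not a Delila
 System program; the name bracket_to_lines is hypothetical.

```python
def bracket_to_lines(notation):
    """Expand the manual's [a/b/c] notation into the lines of a file.

    Each slash inside the brackets starts a new line.
    """
    text = notation.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected text of the form [...]")
    return text[1:-1].split("/")
```

 For example, bracket_to_lines("[FIRST/0/LAST/2/SPECIAL/0]") gives the six
 lines FIRST, 0, LAST, 2, SPECIAL and 0.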

 1. Generate Library Catalogues
      catal(humcat,[ADVANCE DATES],lib1,cat1,newlib1,lib2,cat2,newlib2)
      copy(newlib1,lib1)
      copy(newlib2,lib2)
 The humcat should be identical to or similar to the one we send.
 (Note:  l3 is empty, and c3 and newlib3 will not be written, but your
 computer may require that these files exist as empty files in order to
 run Catal.  A similar situation holds for Delila and many other programs.)

 2. Build Transcript Book
      delila(train,trabk,tradl,lib1,cat1,lib2,cat2)
 There will be warnings that can be ignored at this point.

 3. Build Transcript Library
      catal(trahu,[ADVANCE DATES],trabk,tract,trali)
 You will see a number of cases where duplicate names are resolved.

 4. Test Grin File
      delila(grin,grbk,grdl,trali,tract)
      comp(grbk,cmp,[3])
 cmp should show 140 ATG, 7 GTG, 2 TTG.

 5. Test Gain File
      Within the Gain file, the "FIRST", "LAST" and "SPECIAL" cases must be
 replaced by numbers.  The WORCHA program comes in handy here, because it will
 do this easily:
      worcha(gain,ga3in,[FIRST/0/LAST/2/SPECIAL/0])
      delila(ga3in,ga3bk,ga3dl,trali,tract)
      comp(ga3bk,cmp,[3])
 cmp should be the same as for Grin.

 6. Expanding Gain
      You can now expand the "FIRST" to "LAST" region of Gain, taking care not
 to violate the "SPECIAL" cases.

(end of delman.construction.example)

delman.construction.data.entry


         RULES OF RAW SEQUENCE INSERTION

 (1) A raw sequence is a file containing only the letters A, C, G or T
 (no U is allowed, use T).  You may type these letters or a set of
 letters on the keyboard that is convenient (eg. 1234); then convert
 the letters to ACGT using the program CHACHA.

 (2) For reasons of transportability and readability, each sequence
 line should fit within the width of a typical terminal:  Do not type
 more than 60 bases per line.  You can reformat the data with REFORM or MAKEBK.

 (3) Sequences can and should be entered in free format with spaces
 to improve the readability of the sequence during entry.  This
 also helps in the corrections described below.  Much later it helps one to
 find parts of the sequence during fusion of PIECEs.

 (4) Before entry, use a pencil to mark off intervals of sequence to
 type.  This makes entry easier since there are rest points.  I often
 check off each (or every other) interval as I go, so I rarely get
 lost and duplicate or delete intervals.  If you can keep the lines like those
 in the paper, the sequence will be easier to check and correct later
 (but remember rule 2).

 (5) Two people should INDEPENDENTLY enter the sequence.
 Independence is important: one person will FREQUENTLY make the
 same mistake twice.  Do not be fooled into letting one person enter both a
 sequence and its complement.  We have had two cases where the same deletion
 was entered in the same place by one person, even though he was typing
 the sequence and its complement.  Have two people independently
 type the sequence and the complement.  By doing it this way, you
 will also catch some typographical errors if you are using a published
 source.  (Another method:  if one person is to enter both strands, be
 sure that they are typed from two copies on which different intervals
 are used.)

 The method of independent entry allows automatic correction.  It seems
 to be faster and more reliable than other methods.

 (6) I caught the deletions mentioned above by knowing how long the
 sequence should be.  You should not rely on the computer for the
 length.  Predict it and then check it.

 (7) The file names of the two copies should include the
 initials of the person who typed the file.  See the example below.

 (8) A complemented or inverted strand may be re-complemented or
 re-inverted using the program REFORM.  Note that the free format
 of (3) will be lost.  You should use the reformatted sequence only
 for checking, and not for the final Library insertion, since you
 would lose the formatting if you did.

 (9) At this point you have two files of "raw" sequence.  The sequences
 may be merged together and corrected using MERGE.

 FOR EXAMPLE:  If the sequence was OMPA, TS and MA typed the raw
 copies, and the copy of MA contains the format desired for the
 Library, you could use MERGE like this:
      MERGE(OMPAMA,OMPATS,OMPA,GARBAGE)

 (10) Be sure to save all raw files (eg. OMPAMA, OMPATS, OMPA) until
 the library insertion is completed and taped or backed-up.
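
      The reasoning behind rules (1), (3), (5) and (6) can be sketched in
 Python.  This is an illustration only and does not describe how CHACHA
 or MERGE work internally.  The key mapping 1234 to ACGT is a hypothetical
 choice, and the standard difflib module stands in for the comparison.

```python
import difflib

# Hypothetical keyboard convention: digits typed in place of bases.
KEYS = {"1": "A", "2": "C", "3": "G", "4": "T"}


def clean(raw, key_map=None):
    """Remove free-format spacing and convert convenience keys to ACGT."""
    text = "".join(raw.split()).upper()
    if key_map:
        text = "".join(key_map.get(ch, ch) for ch in text)
    bad = sorted(set(text) - set("ACGT"))
    if bad:
        raise ValueError("characters not allowed in raw sequence: "
                         + "".join(bad))
    return text


def disagreements(copy1, copy2):
    """List the places where two independently typed copies differ.

    Each entry is (kind, position in copy1 counted from 1, text in
    copy1, text in copy2).
    """
    matcher = difflib.SequenceMatcher(None, copy1, copy2, autojunk=False)
    return [
        (tag, i1 + 1, copy1[i1:i2], copy2[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

 An empty list from disagreements means the two copies agree.  It does not
 replace rule (6): compare len(clean(raw)) with the length you predicted.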


(end of delman.construction.data.entry)

delman.construction.library.design


      SEQUENCE INSERTION PROCEDURE

      The following procedure assures the accurate and complete insertion
 of sequences into a Delila Library.  Overview of the method:

                    REFERENCE OBTAINED
                           :
      .....................*....................
      :                    :                   :
      V                    V                   V
      :                    :                   :
 RAW SEQUENCE         RAW SEQUENCE       DESIGN BOOK
    COPY 1               COPY 2                :
      :                    :                   :
      V                    V                   :
      :                    :                   :
   CHACHA               CHACHA                 :
      :                    :                   :
      V                    V                   :
      :                    :                   :
      :.......MERGE........:                   :
                :                              :
                V                              :
                :                              :
           RAW SEQUENCE                        :
           CORRECTED COPY                      :
                :                              :
                V                              V
                :............MAKEBK............:
                               :
                               V
                               :
                    LIBRARY INSERTION MODULE
                               :
                               V
                               :
                        LIBRARY INSERTION

 I. Obtaining Sequences
      A. Sequences may be obtained from
         1) Publications and preprints
         2) Computer transfer
         3) Your lab

      B. One copy of the source article and the sequence (or two copies of
         the sequence when no paper is available) are to be made for entry to
         our reference shelf.  The photocopies must be of GOOD quality, with
         NO loss of information.

 II. Raw Sequence Insertion (See DELMAN.CONSTRUCTION.DATA.ENTRY for details)
      A. Double entry is preferred over other methods.
      B. Programs are available to make this easy: REFORM and MERGE.
         RAWBK may be used on the checked raw sequence to get results quickly.
      C. THE NAME OF THE GAME IS ACCURACY.

 III. Book Design
      A. First be sure that you understand library structure and coordinate
         systems.  See LIBDEF and DELMAN.USE.
      B. Use forms to write out inserted sections.  These can be found in the
         sections that begin with "DELMAN.CONSTRUCTION.FORM".
      C. Check the library to see if you can fuse the new sequence to
         previous sequence.
      D. Decide on a coordinate system or fuse to previously defined
         coordinates.  (NOTE: when there is no zero, add 1 to the negative
         numbers.)
         Write this information on the source copy for our reference shelf.
      E. Record the source of all fragments and special information (eg:
         no zero, negative numbers incremented) in the PIECE notes.
         Put a complete reference into the PIECE notes.  Include
         the positions on the coordinate system, such as: (-1288 to -208)
      F. Record all MARKERs, TRANSCRIPTs and GENEs in your coordinates.
         Unknown values are either +100000 or -100000, depending on which
         end of the coordinates the value is beyond.
      G. Create the Library insertion module using MAKEBK.  All MARKERs,
         TRANSCRIPTs and GENEs pointing to a PIECE must be placed immediately
         prior to the PIECE that they refer to.  They are called the "family"
         of the PIECE.  (Note: we call this piece of a Delila library a
         module, but this is not the same as the ones the Module program works
         with.  The meaning should be clear from the context.)
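
      The note in III.D about numbering without a zero can be written out
 as a small Python function.  This is an illustration only, and it reads
 the note literally: published positions ... -2, -1, +1, +2 ... become
 ... -1, 0, +1, +2 ...  The function name with_zero is hypothetical.

```python
def with_zero(position):
    """Convert a position from a numbering that has no zero.

    Negative positions are moved up by one so that the numbering is
    continuous through zero.  Positive positions are unchanged.
    """
    if position == 0:
        raise ValueError("a numbering without zero cannot contain 0")
    return position + 1 if position < 0 else position
```

 Whenever this conversion is applied, say so in the PIECE notes as
 described in III.E.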

 IV. Insertion - With The Utmost Of Care
      A. Always insert whole Library insertion modules.  Replace old parts of
         the library by modifying a module and reinserting it (with an editor).
      B. Quickly check the book structure for blatant errors.

 V. Checking the new Library
      A. The catalogue program (CATAL) is used to check library structure
         and to generate human and librarian catalogues.
      B. Modules that contain only parts of books can be made into whole
         books by placing a shell around the module.  Example:  a PIECE and its
         family can be inserted into a shell of a fake ORGANISM and CHROMOSOME
         to check the PIECE structure.
      C. Correct modules are inserted into the library and CATAL is run on
         the entire library.  Be sure that file CATALP is empty, to ensure that
         the dates are advanced.
      D. End point checking: all coordinate numbers should be checked.
         To do this, use DELILA to pull out: COORDINATE, PIECE, GENE,
         TRANSCRIPT and MARKER endpoints.  This is painful, but it has caught
         many errors.  Example:
               GET FROM GENE BEGINNING TO GENE BEGINNING +2;
         should give mostly ATG, and a few XTG. (SOMEDAY THIS MAY BE AUTOMATED)
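
      Until the end point check is automated, a short tally helps.  The
 Python sketch below is an illustration only.  It assumes you have already
 pulled out the three bases at each GENE beginning with DELILA and have
 them as a list of strings; the function names are hypothetical.

```python
from collections import Counter


def tally_start_codons(fragments):
    """Count the three-base fragments found at the GENE beginnings."""
    return Counter("".join(f.split()).upper() for f in fragments)


def suspicious_starts(counts):
    """Return fragments that are not three bases ending in TG."""
    return sorted(c for c in counts if len(c) != 3 or not c.endswith("TG"))
```

 Anything returned by suspicious_starts deserves a second look at the GENE
 coordinates and direction.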

 VI. Listings Of The New Library
      These are often useful (program to use in parentheses):
      A. LIB (SHIFT)
      B. HUMCAT (CATAL)
      C. REF (REFER)
      D. LIS (LISTER)  may be large.

(end of delman.construction.library.design)

delman.construction.form.organism


                   NAME:                     LIBDEF, 1980 JUNE 9



ORGANISM

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            GENETIC MAP UNITS (REAL)













(INSERT A SERIES OF
ORGANISMS AT THIS
POINT)








ORGANISM



(end of delman.construction.form.organism)

delman.construction.form.chromosome


                   NAME:                     LIBDEF, 1980 JUNE 9



CHROMOSOME

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            GENETIC MAP ENDING (REAL)











(INSERT A SERIES OF
MARKERS, GENES, TRANSCRIPTS,
AND PIECES AT THIS POINT)








CHROMOSOME



(end of delman.construction.form.chromosome)

delman.construction.form.marker


                   NAME:                      LIBDEF, 1980 JUNE 9



MARKER

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            PIECE REFERENCE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            DIRECTION (+/-)

*                                            BEGINNING NUCLEOTIDE (INTEGER)

*                                            ENDING NUCLEOTIDE (INTEGER)



*                                            STATE (ON/OFF)

*                                            PHENOTYPE

DNA

*

*

DNA


MARKER



(end of delman.construction.form.marker)

delman.construction.form.transcript


                   NAME:                     LIBDEF, 1980 JUNE 9



TRANSCRIPT

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            PIECE REFERENCE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            DIRECTION (+/-)

*                                            BEGINNING NUCLEOTIDE (INTEGER)

*                                            ENDING NUCLEOTIDE (INTEGER)

TRANSCRIPT



(end of delman.construction.form.transcript)

delman.construction.form.gene


                   NAME:                     LIBDEF, 1980 JUNE 9



GENE

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            PIECE REFERENCE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            DIRECTION (+/-)

*                                            BEGINNING NUCLEOTIDE (INTEGER)

*                                            ENDING NUCLEOTIDE (INTEGER)

GENE



(end of delman.construction.form.gene)

delman.construction.form.piece


                   NAME:                     LIBDEF, 1980 JUNE 9



PIECE

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*                                            (NOTES INCLUDE PRECISE REFERENCE

*                                            FOR EVERY BASE IN THE PIECE)

*

*

NOTE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            COORDINATE CONFIGURATION
                                                (CIRCULAR/LINEAR)

*                                            COORDINATE DIRECTION (+/-)

*                                            COORDINATE BEGINNING (INTEGER)

*                                            COORDINATE ENDING (INTEGER)


*                                            PIECE CONFIGURATION
                                                (CIRCULAR/LINEAR)

*                                            PIECE DIRECTION (+/-)

*                                            PIECE BEGINNING (INTEGER)

*                                            PIECE ENDING (INTEGER)

DNA

* (INSERT SEQUENCE HERE)

DNA

PIECE

(end of delman.construction.form.piece)

delman.describe




















 DDDDDDD   EEEEEEEE   SSSSSS    CCCCCC   RRRRRRR   IIIIIIII  BBBBBBB   EEEEEEEE
 DD    DD  EE        SS    SS  CC    CC  RR    RR     II     BB    BB  EE
 DD    DD  EE        SS        CC        RR    RR     II     BB    BB  EE
 DD    DD  EEEE       SSSSSS   CC        RR    RR     II     BBBBBBB   EEEE
 DD    DD  EE              SS  CC        RR    RR     II     BB    BB  EE
 DD    DD  EE              SS  CC        RRRRRRR      II     BB    BB  EE
 DD    DD  EE              SS  CC        RR  RR       II     BB    BB  EE
 DD    DD  EE        SS    SS  CC    CC  RR   RR      II     BB    BB  EE
 DDDDDDD   EEEEEEEE   SSSSSS    CCCCCC   RR    RR  IIIIIIII  BBBBBBB   EEEEEEEE
























(end of delman.describe)

delman.describe.conventions.naming-parameters


      PROGRAM NAMING CONVENTIONS

      Every Delila System program exists in several forms:

 1) Raw source code - without modules inserted.  Example: "lister.r"
 would be the raw code for the LISTER program.  We are not sending code
 this way.

 2) Pascal source code - with all modules inserted.  This code is ready
 to compile.  Example: "lister.p".  (Our previous convention was to add
 an s to the end of the file name to indicate this.)

 3) Compiled code.  Our convention is to remove the suffix: "lister".
 To simplify the manual, programs are listed under the compiled code
 name (lister).


      PARAMETER FILE NAMES
      A file that controls the operation of a program is called a parameter
 file.  For LISTER this file is LISTERP.  For SPLIT it is ...
 SPLITP (get it? HA! HA! sorry.)

      RULES FOR PARAMETER FILES
 1) If the file is not empty then the file must contain values for all
 parameters.  With few exceptions, this should reduce the number of complex
 rules that one must deal with.

 2) Each parameter is on its own line.

 3) Parameters are left justified on the line.

 4) A parameter may be followed by one or more spaces and then any
 comment.  This lets the user write reminders of what the allowed
 values are.
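
      The four rules can be read as a small parser.  The Python sketch
 below is an illustration only and is not how any Delila System program
 reads its parameters.  It assumes that a value contains no spaces and
 that the caller knows how many parameters to expect; the name
 read_parameters is hypothetical.

```python
def read_parameters(text, expected_count):
    """Read a parameter file according to the four rules above.

    Returns None for an empty file (the program would use its own
    defaults), otherwise a list with one value per parameter.
    """
    lines = text.splitlines()
    if not any(line.strip() for line in lines):
        return None
    if len(lines) < expected_count:
        # Rule 1: a non-empty file must give every parameter.
        raise ValueError("file is not empty but parameters are missing")
    values = []
    for number, line in enumerate(lines[:expected_count], start=1):
        # Rules 2 and 3: one parameter per line, starting in column 1.
        if not line or line[0].isspace():
            raise ValueError(f"line {number}: parameter must be left justified")
        # Rule 4: anything after the first spaces is a comment.
        values.append(line.split()[0])
    return values
```

 The comment after each value is ignored, so a line may carry a reminder
 of the allowed values.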

      WHY CAN'T DEFAULT PARAMETER VALUES BE STATED IN THIS MANUAL?

 1) If default values are changed, then the manual must also be changed.
 Since there is no automatic mechanism to assure that these remain
 the same, it is likely that it will be forgotten.  The manual would
 then be out of date.

 2) The manual entry defines the program but does not enforce details
 of operation.  It is somewhat like the LIBDEF specification.

 3) It is easy to find out what the defaults are since almost every
 program states the values used in its listing.  Running a small test
 takes only two minutes.

(end of delman.describe.conventions.naming-parameters)

delman.describe.conventions.writing


      PROGRAM WRITING CONVENTIONS

      Program source code will always follow certain rules:

 1) The first line(s) will be the Pascal PROGRAM statement.

 2) The module libraries that are sources of the modules will be stated.

 3) One of the global constants will be called VERSION.  This number
 or string identifies the particular version of the source code.  We
 change VERSION every time that we modify the source file.  The program
 name and VERSION are written to the OUTPUT file when the program runs.

 4) There will be a document module that describes the program.
 The module is identical to the one in this manual such as
      DESCRIBE.LISTER
 It follows the format defined in
      DELMAN.DESCRIBE.DOCUMENTATION.PROGRAMS

 5) All constants, types, variables, procedures, functions
 and sections of code will have comments that describe their function.

 6) Interactive programs always have a HELP command.


      FOR TRANSPORTATION:
 1) Put non-standard features inside modules.

 2) Program lines longer than 80 characters are avoided.  (NB: This is ALWAYS
 possible in PASCAL).  The FLAG program will detect any lines that are too long.

 3) Reading into packed arrays is forbidden.  Read into unpacked arrays
 and pack or transfer values.

 4) The Pascal Users Manual suggests that PASCAL identifiers "must
 differ over their first 8 characters."  There are two problems related
 to this.  Assume that the transport is from a computer that requires
 N characters to differ, where N > 8 (eg. 10).
   a) Transport to a computer that requires M < N may cause names like A23456789
   and A2345678X to be considered identical, and compilation will be prevented.
   b) Transport to a computer that recognizes M > N will detect cases
   where one name was written two ways, with the difference in the last
   characters (between N and M).  The "most famous" such case was
   in CATAL: HUMCATLINE and HUMCATLINES were used on a computer where
   N = 10 and failed on computers where M > 10.
 The solution in both cases is to avoid names that differ beyond
 8 characters.  Is somebody willing to write a program to detect this?
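
      A detector for such names is short.  The Python sketch below is an
 illustration only and is not a Delila System program.  It takes a list of
 identifiers that you supply, folds case because Pascal does not
 distinguish it, and groups names by their first 8 characters.  The name
 prefix_collisions is hypothetical.

```python
from collections import defaultdict


def prefix_collisions(identifiers, significant=8):
    """Find different identifiers that agree in their leading characters.

    Returns a dictionary from the shared prefix to the sorted list of
    distinct names (in lower case) that begin with it.
    """
    groups = defaultdict(set)
    for name in identifiers:
        lowered = name.lower()
        groups[lowered[:significant]].add(lowered)
    return {
        prefix: sorted(names)
        for prefix, names in groups.items()
        if len(names) > 1
    }
```

 Given HUMCATLINE and HUMCATLINES, the sketch reports both names under the
 prefix humcatli.  Extracting the identifiers from a source file, while
 skipping comments and quoted strings, is left out of the sketch.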

(end of delman.describe.conventions.writing)

delman.describe.conventions.running


      PROGRAM RUNNING CONVENTIONS

 In this manual we will use a single notation to mean running a program:
      lister(book,list)
 means to run the program LISTER using a file named BOOK.  The program
 will produce output to file LIST.

 The names BOOK and LIST are not necessarily the same as the file names
 declared in the source of LISTER (LISTERS); we assume that the names
 are mapped one to one.  Also, file names to the right may not always
 be mentioned, to simplify the notation.  For example:
      edit(inst1)
         :
         :   (create Delila instructions in file INST1)
         :
      delila(inst1,book1,delist1)
             (run DELILA to create a book named BOOK1 and
              a Delila listing DELIST1 that shows where the errors are.
              The library and catalogue are not mentioned.)
      lister(book1,list1)
             (Run the auxiliary program LISTER.
              OUTPUT and LISTERP are not mentioned.)

 The file OUTPUT will always contain messages and diagnostics intended
 for the CRT screen or teletype.
 The file INPUT is always used for interactive input by the programs.


 To fully define the files that a program uses we will write:
      LISTER(BOOK: IN; LIST: OUT; LISTERP: IN; OUTPUT: OUT)
 IN and OUT define the direction of information flow into or out of
 the program.  INOUT would mean that the source file may be modified
 (such as by an editor).  This is a symbolic way to represent the data
 flow diagrammed in our papers (see DELMAN.INTRO.DESCRIPTION).
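
      The full notation is regular enough to be read by a program.  The
 Python sketch below is an illustration only; the name
 parse_file_declaration is hypothetical.

```python
import re

DIRECTIONS = ("IN", "OUT", "INOUT")


def parse_file_declaration(text):
    """Split PROGRAM(FILE: DIRECTION; ...) into its parts.

    Returns the program name and a list of (file name, direction)
    pairs in the order written.
    """
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", text)
    if not match:
        raise ValueError("expected PROGRAM(FILE: DIRECTION; ...)")
    program, body = match.groups()
    files = []
    for part in body.split(";"):
        pieces = [p.strip() for p in part.split(":")]
        if len(pieces) != 2 or pieces[1].upper() not in DIRECTIONS:
            raise ValueError("bad file entry: " + part.strip())
        files.append((pieces[0], pieces[1].upper()))
    return program, files
```

 From the result one can list, for example, which files a program only
 reads and which it writes.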


 NOTE: The mapping of logical file name (the one the program knows) to
 physical file name (the actual one the computer system uses) is
 frequently done with an ASSIGN or LINK command in the job control language of
 the computer.

(end of delman.describe.conventions.running)

delman.describe.short.cluster.files


      Short clustered descriptions of some Delila System files

 DOCUMENTS
      AAA     Names Of Delila System Files
      chars   Character List
      delman1 Delila System Manual
      delman2 Delila System Manual, for program descriptions
      libdef  Delila Library System Definition
      moddef  Module Transfer System Definition

 LIBRARIES
      humcat  Human's Catalogue For The Library
      lib1    Library 1: Bacteriophage
      lib2    Library 2: E. Coli And S. Typhimurium

 DELILA INSTRUCTIONS
      train   Transcript Library Instructions
      grin    Gene Starts In Relative Form (Use Transcript Library)
      gain    Gene Starts In Absolute Form (Use Transcript Library)

 SEARCH PROGRAM RULES
      genrule Finds Genes And Non-Genes
      enzrule Finds Restriction Enzyme Sites In Books

 WEIGHT MATRICES FOR THE PERCEPTRON
      w101    101 Wide, Finds All Genes In Transcript Library
      w71     71 Wide, Finds All Genes In Transcript Library
      w51     51 Wide, Finds All Genes And Some Nongenes

 EXAMPLES
      ex0bk   Example Book
      ex0hu   Example Catalogue For Humans
      ex0dl   Example Delila Listing
      ex0in   Example Instructions - To Create EX0BK
      ex0li   Example Listing From LISTER
      ex0lo   Example Loocat On Catalogue from EX0BK

 EXAMPLE DELILA INSTRUCTIONS FOR DELMAN
      ex0in   "ex0: example"
      ex1in   "ex1: the laci gene"
      ex2in   "ex2: an absolute get"
      ex3in   "ex3: a relative get"
      ex4in   "ex4: non-coding lac leader"
      ex5in   "ex5: the region between laci and lacz"
      ex6in   "ex6: multiple specification and requests"
      ex7in   "ex7: aligned book"
      ex8in   "ex8: non-coding lac leader- via respecification"

 EXAMPLES FOR TESTING THE MODULE PROGRAM
      exsin   example source in
      exmodli example module library

 EXAMPLES FOR TESTING AUXILIARY PROGRAMS
      expepin Delila Instructions For Testing Pemowe

 EXAMPLES FOR TESTING THE PERCEPTRON
      exspbk  Example Sequences Positive Book
      exsnbk  Example Sequences Negative Book
      expa0   Example Pattern 0, Learn EXSPBK Vs EXSNBK With Zero Start
      expa1   Example Pattern 1, An Initial Matrix For Learning
      expa2   Example Pattern 2, Learn EXSPBK Vs EXSNBK Using EXPA1 As Start
      expan2  Result Of Patana On EXPA2
      exsebk  A Book For Searching With EXPA2

 EXAMPLES FOR TESTING ENCODE PROGRAMS
      exencin Example Encode Instructions
      exencbk The Book For EXENCIN
      exencen Example Encoding Of EXENCBK

 FONTS FOR BIGLET
      font    font for the biglet program
      phont   demonstration font for the biglet program

 EXAMPLE PARAMETER FILES
      Often a program will have a file associated with it
 that controls it and is called a parameter file.  For example, the
 pbreak program uses a parameter file called pbreakp.  Many programs
 have example files.  They are not listed here, but you may want
 to look for them before you run the program.  An example is the xyplo
 program, for which there are the files xyplop.demo, xyin.demo,
 xyplop.test and xyin.test.
      As programs are modified, this section will not always be up to date.

(end of delman.describe.short.cluster.files)

delman.describe.short.cluster.programs


      Short clustered descriptions of Delila System programs
      Documentation exists as describe.[name]

 MODULE LIBRARIES
   auxmod: modules for auxiliary programs
   delmod: delila module library
   doodle: pascal graphics library and preprocessor for pic under unix
   cybmod: specific module library for the cyber computer
   genmod: genbank access modules
   matmod: mathematics modules
   prgmod: programming modules for the delila system
   unixmod: specific module library for the unix operating system
   vaxmod: specific module library for the vax computer

 MODULE MANIPULATION
   module: module replacement program
   makemod: create a set of empty modules from a list of names
   makman: make manual entries from a source code
   maknam: make manual entry names
   modin: generate modularized delila instructions for absolute sites
   modlen: determine module lengths
   nulldate:  modules to neutralize the date-time functions
   pbreak: breaks a file into pages at a certain trigger phrase
   show: show modules in a module library
   undel: remove references to delman in modules

 TOOLS
   biglet: text enlargement program
   calc: a calculator that propagates errors
   calico: character and line counts of a file
   cap: put capital letters inside quotes of a program
   censor: removes code from a program
   chacha: changes characters in a file
   code: find the comment density of a pascal program
   column: pull defined column from input
   concat: concatenate files together
   copy: copy one file to another file
   decat: break a file into 10 files
   decom: remove comment starts from within a comment
   difint: differences between integers
   flag: points out excessively long lines
   ll: line lengths
   lig: ligation theory
   lochas: look at characters in a file
   merge: compare two files and merge them
   nocom: remove comments
   number: add line numbers to a file
   rembla: remove blanks from ends of lines in a file
   repro:  make multiple copies of a file
   same: counts the number of lines that are identical in two files
   shell: basic outline for a program
   shift: copy one file to another file, with a blank in front of each line
   short: find locations of short lines in a file
   shortline: make short lines out of long lines
   split: split a wide file into printable pages
   sqz: squeeze the input file to fit into fewer characters per line
   sumfile: sum of file sizes
   test: a simple test program for Pascal
   unshi: remove first column of characters from a file
   ver: look at the version of a program
   verbop: increment the version number of a program
   vernum: print the version number of a program
   versave: save the file under the version number
   unsqz: unsqueeze the input file
   whatch: what characters are in a file?
   worcha: word changing program
   wl: wrap lines in a file
   woco: word counting program
   wordlist: lists words in a file
   ww: word wrap

 TOOLS FOR TEX
   notex: remove tex and latex constructs
   ref2bib: refer to bibtex converter
   sortbibtex: sort a bibtex database
   untex: remove tex and latex constructs
   untitle: remove titles from bbl file
   unverb: remove verbatim sections from a latex file

 GRAPHICS
   doodle: pascal graphics library and preprocessor for pic under unix
   domod: doodle modules
   dops: pascal graphics library and preprocessor for postscript
   dosun: pascal graphics library and preprocessor for Sun graphics
   shrink: reduce size of postscript graphics

   genhis: general histogram plotter
   genpic: convert genhis output to pic input

   xyplo: plot x, y data
   log: convert columns of data to log

   dnag: graphics of dna

 LIBRARIAN
   delila: the librarian for sequence manipulation
   catal: cataloguer of delila libraries, the catalogue program
   loocat: look at a catalogue

 GENBANK
   dbbk: database to delila book conversion program
   dbcat: database catalog production and sorting program.
   dbfilter: filter GenBank databases to remove unwanted entries
   dbinst: extract Delila instructions from a GenBank database
   dblo: look at the catalogue of a genbank/embl database
   dbpull: database extraction program.

 AUXILIARY PROGRAMS FOR DATA BASE CONSTRUCTION
   makebk: make a book from a file of sequences.
   rawbk: make a raw sequence into a book
   reform: raw sequences reformatted

 AUXILIARY PROGRAMS FOR SEQUENCE LISTING
   lister: list the sequences of pieces in a book with translation
   parse: breaks a book into its components

 AUXILIARY PROGRAMS FOR ALIGNED SEQUENCES
   alist: aligned listing of a book
   gap: gaps in aligned listing of a book
   hist: make a histogram of aligned sequences.
   histan: histogram analysis.
   malign: optimal alignment of a book, based on minimum uncertainty

 AUXILIARY PROGRAMS FOR ANALYSIS
   cluster: cluster indana subindexes into groups of duplicate entries
   coda: composition file to data for genhis
   comp: determine the composition of a book.
   compan: composition analysis.
   count: counts the amount of sequence in a book
   frame: evaluator of potential reading frames
   indana: analysis of an index
   index: make an alphabetic list of oligonucleotides in a book
   pemowe: peptide molecular weights
   search: search a book for strings

 AUXILIARY PROGRAMS FOR HELIXES
   dotmat: dot matrices of two books
   helix: find helices between sequences in two books
   keymat: keyed-matrices for helices between two books
   matrix: dot matrices for helices between two books
   rep: records repeats between sequences in two books

   sorth: sort helix list
   instal: delila instruction alignment

 AUXILIARY PROGRAMS FOR PATTERN LEARNING
   patana: pattern analysis
   patlrn: pattern learning
   patlst: lister of patlrn output.
   patser: pattern searcher
   patval: pattern evaluations of aligned sequences

 AUXILIARY PROGRAMS FOR ENCODED SEQUENCES
   encfrq: encoded sequence frequency analysis
   encode: encodes a book of sequences into strings of integers
   encsum: sum of the vectors of encoded sequences

 AUXILIARY PROGRAMS FOR INFORMATION ANALYSIS
   calhnb: calculate e(hnb), var(hnb), ae(hnb), avar(hnb), e(n)
   frese: frequency table to sequ
   palinf: find palindromes, based on information theory
   rf: calculate Rfrequency
   rseq: rsequence calculated from encoded sequences
   rsim: Rsequence simulation
   rsgra: rsequence graph
   dalvec: converts Rseq rsdata file to symvec format
   makelogo: make a graphical `sequence logo' for aligned sequences
   ckhelix: check that the helix location is where one wants
   alpro: frequency and information of aligned protein sequences
   alword: frequency and information of aligned words
   dirty: calculate probabilities for dirty DNA synthesis
   sites: analyse sites from randomized sequence data base
   bkdb: convert a book to database format for the sites program
   siva: site information variance
   diana: dinucleotide analysis of an aligned book
   tri: test environment for triangle array
   digrab: diagonal grabs of diana data
   da3d: diana da file to 3d graphics
   dotsba: dots to database
   Ri: Rindividual is calculated for every site in the aligned book
   scan: scan a book with a wmatrix and generate a vector
   vfilt: vector filter
   tod: to database format for sites program
   winfo: window information curve

 AUXILIARY PROGRAMS FOR OTHER USES
   refer: print the references in the pieces of a book
   sepa: separates delila instruction sets
   lenin: convert a list of lengths into Delila instructions

 RANDOM NUMBERS AND SEQUENCES
   markov: markov chain generation of a dna sequence from composition.
   tstrnd: test random generator
   gentst: test random generator
   normal: generate normally distributed random numbers
   rndseq: generate random dna sequences
   aran: aligned random sequences

 MATHEMATICS
   av: average integers
   binomial: produce the binomial probabilities for a found black to white ratio
   binplo: produce the binomial probabilities for a found black to white ratio
   cerf: complement of the error function
   cisq: circle to square
   chi: estimates chi squared from degrees of freedom
   linreg: linear regression
   mnomial: produce the multinomial distribution for base probabilities
   pcs: partial chi squared
   riden: ring density graph
   ring: z space ring
   sphere: plot density of shannon spheres
   stirling: test of stirling's formula
   zipf: Monte Carlo simulation for Peter Shenkin's problem

 MISCELLANEOUS
   aa: not actually a program, this is the header page for Delila manual
   asciicode: converts ascii table to Pascal code
   binhex: convert binary to hex
   hexbin: convert hex to binary
   mstrip: remove control m's from a file
   epsclean: clean an eps file

   kenin: create Delila instructions from Kenn's all.gen instructions
   kenbk: book from a file of sequences provided by Kenn Rudd

   tipper: copy a file to the output file with special symbols at end
   todawg: change a book into dawg format

   ev: evolution of binding sites
   evd: evolution display

   makedate: make a date file
   makessbdate: make a date file from a Sample_Sheet.bin file

 PROGRAMS TO CONTROL MACHINERY
   odti: munch od and time plates together for xyplo
   titer: analyse titertek optical density data
   spec: analyse two spectra from the camspec
   ssbread: read a sample sheet from the ABI sequencer
   tkod: read od values from tk data

(end of delman.describe.short.cluster.programs)
