Delila Program: embed

Documentation for the embed program is below, with links to related programs in the "see also" section.

{   version = 1.33; (* of embed.p 2012 Jun 20}

(* begin module describe.embed *)
(*
name
   embed: embed an aligned set of DNA sequences into random sequences

synopsis
   embed(inst: in, book: in, mkvseqs: in, ranbook: in, embedp: in,
         embedbk: out, output: out)

files
   inst:  delila instructions of the form 'get from 56 -5 to 56 +10;'

   book:  the book generated by delila using inst

   mkvseqs: random sequence output from the markov program

   ranbook: book made from random sequences using the makebk program;
            either mkvseqs or ranbook must contain sequence.  If both
            contain sequence, then mkvseqs will be used as the source for
            random sequences.

   embedp:  parameters to control the program.  The file must contain the
            following parameters, one per line:

      parameterversion:  The version number of the program.  This allows the
         user to be warned if an old parameter file is used.

      alignmenttype:  The type of alignment to use. f: first base, i: inst,
         b: book alignment

         'b' is to be used when 'default coordinate zero;' is used in the
         inst file, resulting in a book whose coordinates do not match the
         inst coordinates. 'i' is to be used when the book contains a normal
         coordinate system corresponding to the inst file. 'f' simply aligns
         by the first base in the book.  See alist.p for more details on
         alignmenttype.

      InFrom, InTo: the from-to range of the input sequences to be used.

      OutFrom, OutTo: the from-to range of the sequences to output.
         This includes the InFrom to InTo range AND the random sequences.

   embedbk: book created by the program. Contains the sequences embedded
            within random sequences to the specified range.

   output: messages to the user
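
   The embedp layout described above can be read with a few lines of
   code.  The following Python sketch is not part of embed.  It assumes
   that each line begins with its value(s) and that the rest of the line
   is a free-text comment, as in the range lines of the examples
   section.  The names EmbedParams and parse_embedp, and the sample
   version value, are hypothetical.

```python
from typing import NamedTuple


class EmbedParams(NamedTuple):
    parameterversion: float
    alignmenttype: str
    in_from: int
    in_to: int
    out_from: int
    out_to: int


def parse_embedp(text):
    """Parse the four embedp lines; trailing words on a line are ignored."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 4:
        raise ValueError("embedp needs four parameter lines")
    version = float(lines[0].split()[0])
    alignment = lines[1].split()[0]
    if alignment not in ("f", "i", "b"):
        raise ValueError("alignmenttype must be f, i or b")
    in_from, in_to = (int(tok) for tok in lines[2].split()[:2])
    out_from, out_to = (int(tok) for tok in lines[3].split()[:2])
    return EmbedParams(version, alignment, in_from, in_to, out_from, out_to)


# Hypothetical file contents; only the two range lines follow the
# wording used in the examples section.
sample = """1.33  parameterversion (assumed value)
f     alignmenttype: first base
-10 10  InFrom, InTo: range of input sequences to be used
-30 30  OutFrom, OutTo: range of the sequences to output
"""
params = parse_embedp(sample)
print(params.alignmenttype, params.in_from, params.out_to)
```

   Returning a named tuple keeps the four range values together, so
   they can be checked and passed on as one unit.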

description

   Embed embeds a given set of aligned sequences into random sequences
   having a specified range.  If there is an incomplete sequence in
   the region to be embedded, it is filled in with random sequences as
   well.

   This allows one to destroy a pattern in the aligned sequences, so
   that the sequences can be realigned to find other patterns nearby.

   The parameters OutFrom, InFrom, InTo, OutTo in embedp set the range
   over which to do the embedding.  For the program to function
   correctly, the following must be true: OutFrom <= InFrom <= InTo <=
   OutTo.  The sequence from InFrom to InTo is not changed, and
   random sequence is filled in around it from OutFrom on the left to
   OutTo on the right.  See example below.
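
   The ordering requirement can be checked before any embedding is
   attempted.  A minimal Python sketch, assuming the four values have
   already been read from embedp; check_ranges is a hypothetical helper,
   not a routine in embed.p:

```python
def check_ranges(out_from, in_from, in_to, out_to):
    """Return True when OutFrom <= InFrom <= InTo <= OutTo."""
    return out_from <= in_from <= in_to <= out_to


ok = check_ranges(-30, -10, 10, 30)    # a valid nesting of the two ranges
bad = check_ranges(-30, -10, 40, 30)   # InTo lies beyond OutTo
print(ok, bad)
```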

   If the original sequence is longer than the range OutFrom to OutTo
   then the book will contain the embedded sequence with original
   sequence on either side of the random sequence.

   The program stores the random sequence as a string and then uses it
   base by base until there is no more in the string. Then it reads
   another string of random sequence.  In this way, none of the random
   sequence is "thrown away".

   If the program finds the end of mkvseqs or ranbook before it has
   embedded all the sequences, it gives a message that it is out of
   random sequence and halts.  Why doesn't the program reuse the
   random sequence?  This is not a good idea because the embedded
   sequences are designed to be fed into malign, and malign would pick
   up on this reused sequence and find unnatural sequence
   conservation.
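
   The buffering and halting behavior described in the two paragraphs
   above can be sketched as follows.  This is an illustration in Python,
   not the Pascal code of embed.p; RandomPool and OutOfRandomSequence
   are hypothetical names, and the random strings are given as a plain
   list instead of being read from mkvseqs or ranbook.

```python
class OutOfRandomSequence(Exception):
    """Raised when every random string has been used up."""


class RandomPool:
    """Hand out random bases one at a time, never reusing any."""

    def __init__(self, strings):
        self._strings = iter(strings)   # stand-in for records in a file
        self._current = ""
        self._pos = 0

    def next_base(self):
        # Refill only when the current string is fully consumed, so
        # leftover bases carry over to the next embedding.
        while self._pos >= len(self._current):
            try:
                self._current = next(self._strings)
            except StopIteration:
                raise OutOfRandomSequence("out of random sequence") from None
            self._pos = 0
        base = self._current[self._pos]
        self._pos += 1
        return base

    def take(self, count):
        return "".join(self.next_base() for _ in range(count))


pool = RandomPool(["acgt", "ttga"])
first = pool.take(3)    # consumes a, c, g from the first string
second = pool.take(3)   # consumes the leftover t, then t, t from the next
print(first, second)
```

   A further request for three bases would raise OutOfRandomSequence,
   since only two unused bases remain and nothing is recycled.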

   Aligned sequences can be viewed with the alist program.

   The random sequences are generated by the markov program.  They can
   be read from either mkvseqs or ranbook.  mkvseqs is directly
   generated from markov to a given composition and length.  Ranbook
   can be made using the makebk program.  If both files are present,
   mkvseqs is used.
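
   The choice between the two random sources amounts to a simple
   precedence rule.  A small Python sketch, assuming that "contains
   sequence" can be approximated by "is not empty"; the function name
   choose_random_source is hypothetical:

```python
def choose_random_source(mkvseqs_text, ranbook_text):
    """Return the name of the file to draw random sequence from."""
    if mkvseqs_text.strip():
        return "mkvseqs"          # preferred whenever it has content
    if ranbook_text.strip():
        return "ranbook"
    raise ValueError("either mkvseqs or ranbook must contain sequence")


both = choose_random_source("acgtacgt\n", "some book text\n")
only_book = choose_random_source("", "some book text\n")
print(both, only_book)
```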

   The output of this program is designed to be fed into the malign
   program for multiple alignment.

examples

   With the following parameters from embedp the sequence would be embedded
   as shown below.

   -10 10  InFrom, InTo: range of input sequences to be used
   -30 30  OutFrom, OutTo: range of the sequences to output

   original:
    -----|-------------------<---------0--------->-------------------|-----
        -30                 -10                 +10                 +30
       OutFrom              InFrom              InTo               OutTo

   embedded:
         ********************<---------0--------->********************
        -30     random      -10     original    +10     random      +30
                sequence            sequence            sequence

   Note that if there is any sequence in the original alignment outside
   the range OutFrom to OutTo, it will be copied to the embedbk.
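
   The picture above can be reproduced with a short Python sketch.  It
   is an illustration only, under several assumptions: a sequence is a
   string whose first base sits at a known coordinate, the sequence
   overlaps the OutFrom to OutTo range, and uniformly chosen bases stand
   in for the output of the markov program.  Original bases are written
   in upper case and random bases in lower case purely so that the two
   can be told apart; embed_sequence is a hypothetical name.

```python
import random


def embed_sequence(seq, first, in_from, in_to, out_from, out_to, rng):
    """Embed seq, whose first base sits at coordinate `first`."""
    last = first + len(seq) - 1
    result = []
    for pos in range(min(first, out_from), max(last, out_to) + 1):
        have_original = first <= pos <= last
        inside_in = in_from <= pos <= in_to
        inside_out = out_from <= pos <= out_to
        if have_original and (inside_in or not inside_out):
            result.append(seq[pos - first])      # keep the original base
        else:
            result.append(rng.choice("acgt"))    # fill with a random base
    return "".join(result)


# An original sequence running from -35 to +35 (71 bases).
original = ("ACGT" * 18)[:71]
embedded = embed_sequence(original, -35, -10, 10, -30, 30, random.Random(0))
print(original)
print(embedded)
```

   Positions -35 to -31 and +31 to +35 lie outside OutFrom to OutTo and
   are copied unchanged.  A sequence that does not reach all the way
   across InFrom to InTo would have its missing positions filled with
   random bases by the same else branch.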

   Randomizing a Single Patch

   Using embed it is possible to cover only one small area with random
   sequence instead of two areas.  To do this you will need to use the
   embed parameters in a certain way.

   For example if you wanted to cover only the zero coordinate with
   random sequence, three of the parameters will need to be the same:

   -1 -1  InFrom, InTo: range of input sequences to be used
   -1  0  OutFrom, OutTo: range of the sequences to output

   When parameters are the same, the InFrom and InTo override the
   OutFrom and OutTo.  The example parameters given above would keep
   the sequences at the -1 coordinate the same, but make the sequences
   at the 0 coordinate random.  In this case all positions other than
   coordinate 0 are kept the same.

   Another example would be to 'zap' or randomize from -3 to +4.  The
   parameters would be:

   -4 -4 InFrom, InTo: range of input sequences to be used
   -4  4 OutFrom, OutTo: range of the sequences to output

   These parameters would leave the sequences up to and including the
   -4 coordinate alone, but make the sequences from -3 to +4 random.
   The sequences from +5 and higher would be maintained as well.
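
   Which coordinates end up random for a given parameter set can be
   worked out as the positions inside OutFrom to OutTo that are not
   inside InFrom to InTo.  A minimal Python sketch of that arithmetic;
   randomized_positions is a hypothetical helper, not part of embed:

```python
def randomized_positions(in_from, in_to, out_from, out_to):
    """Coordinates inside OutFrom..OutTo but outside InFrom..InTo."""
    return [pos for pos in range(out_from, out_to + 1)
            if not in_from <= pos <= in_to]


single = randomized_positions(-1, -1, -1, 0)   # the single-patch example
zap = randomized_positions(-4, -4, -4, 4)      # the 'zap' example
print(single)
print(zap)
```

   For the two parameter sets above this gives coordinate 0 alone, and
   coordinates -3 through +4, matching the descriptions in the text.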

documentation

see also

   alist.p, markov.p, makebk.p, malign.p

author

   Elaine Bucheimer

bugs

   The program cannot handle sequences longer than dnamax.  This is a
   fixable bug.

   A possible future addition to the program would be to allow the
   user to specify if they want the old sequence hanging around or if
   the sequence should be chopped outside the OutFrom and OutTo
   coordinates.

   It appears that the 'i' option does not embed correctly.  The
   resulting book does not have the advertised coordinates.  A
   temporary solution is to use the 'f' option with appropriate ranges.

technical notes

*)
(* end module describe.embed *)
{This manual page was created by makman 1.44}
{created by htmlink 1.55}