Delila Program: mkdb

mkdb program

Documentation for the mkdb program is below, with links to related programs in the "see also" section.

{   version = 1.27; (* of mkdb.p 2011 Feb 04 *)}

(* begin module describe.mkdb *)
(*
name
   mkdb: read sequence; make GenBank entry with features for capitalized regions

synopsis
   mkdb(sequ: in, mkdbp: in, entries: out, output: out)

files
   sequ:  raw DNA sequences in lower case except for objects
      of interest marked in upper case.  The program also accepts 'n'.
      Sequences are separated by periods.
      Each sequence may be preceded by a name line and a species line.
      These lines can begin with '>' or '*', but this is not necessary.
      Spaces, blank lines and numbers are ignored.
      Other lines that begin with '>' or '*' are comments.
      If there is no species name, the single name will be used
      for the species also.  This kludge allows the program to read
      FASTA format.  If there is no name at all (just sequence)
      then a name will be assigned: nameste.

      name: a string of characters to name the sequence.
      organism: the species this sequence represents

   entries:  GenBank entries for the sequences, with features for
      the capitalized regions marked as exons or primers (see exoncutoff)
      and features for the lower case regions (not including primers)
      marked as introns.

   mkdbp:  parameters to control the program.  The file must contain the
      following parameters, one per line:

      parameterversion: The version number of the program.  This allows the
         user to be warned if an old parameter file is used.

      exoncutoff (integer): Capitalized regions longer than this
         number of bases will be called exons, the others will be
         called primers.

      multipart (character): What to do if a name has spaces in it.
         'i' ignore the rest of the name
         'u' replace spaces with underscores

   output: messages to the user
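   The sequ format rules above can be sketched as a small Python
   reader.  This is a hypothetical illustration, not mkdb's Pascal
   code.  The names parse_sequ and clean_name are invented here.  As a
   simplification, name and species lines are recognized only when they
   start with '>' or '*'.  The multipart value from mkdbp is passed in
   directly rather than read from the parameter file.

```python
def clean_name(name, multipart):
    """Apply the mkdbp 'multipart' rule to a name containing spaces."""
    if multipart == "i":
        return name.split(" ", 1)[0]  # ignore the rest of the name
    if multipart == "u":
        return name.replace(" ", "_")  # replace spaces with underscores
    raise ValueError("multipart must be 'i' or 'u'")


def parse_sequ(text, multipart="u"):
    """Split sequ-format text into (name, species, sequence) tuples.

    Simplification (assumption): only lines starting with '>' or '*' are
    taken as name and species lines; any further such lines are ignored
    as comments.  A last sequence without its terminating period is dropped.
    """
    records, headers, bases = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line[0] in ">*":
            headers.append(line[1:].strip())
            continue
        for ch in line:
            if ch == ".":  # a period ends the current sequence
                if headers:
                    name = clean_name(headers[0], multipart)
                    # no species line: reuse the name (lets FASTA work)
                    species = headers[1] if len(headers) > 1 else name
                else:
                    name = species = "nameste"
                records.append((name, species, "".join(bases)))
                headers, bases = [], []
            elif ch.lower() in "acgtn":
                bases.append(ch)  # keep case: capitals mark features
            # anything else (spaces, digits, other characters) is skipped
    return records


print(parse_sequ("* T7stuff\n* Bacteriophage T7\naacgTTAc 10\n.\n"))
# [('T7stuff', 'Bacteriophage T7', 'aacgTTAc')]
```

   Case is preserved in the returned sequence because the capital
   letters are what later become features.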

description

   People often mark interesting regions of a sequence (exons,
   primers, mutations, etc.) with capital letters.  This program
   reads such raw sequences and creates simple flat-file GenBank
   style entries with a feature for each capitalized region.  Long
   features are called 'exons' while short ones are called
   'primers'; a feature is an exon only if it is longer than the
   exoncutoff parameter.
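   The exoncutoff rule can be illustrated with a short Python sketch.
   The function name capitalized_features is hypothetical.  It reports
   1-based inclusive coordinates, as in a GenBank FEATURES table.  A run
   of capitals counts as an exon only when it is strictly longer than
   exoncutoff, which follows the parameter description above.

```python
import re


def capitalized_features(seq, exoncutoff):
    """Return (kind, start, end) for each run of capital letters in seq.

    Coordinates are 1-based and inclusive.  A run longer than exoncutoff
    bases is an 'exon'; any other run is a 'primer'.
    """
    features = []
    for m in re.finditer(r"[A-Z]+", seq):
        length = m.end() - m.start()
        kind = "exon" if length > exoncutoff else "primer"
        features.append((kind, m.start() + 1, m.end()))
    return features


# Fourth line of the example sequ file; positions are relative to the line,
# so add 150 to get whole-sequence coordinates (8..24 becomes 158..174).
line = "agctaacCGAACTAAATCAGGCACttgagcatcaagattggtggactacc"
print(capitalized_features(line, 20))  # [('primer', 8, 24)]
```

   In the example, the capitalized runs are 17, 112 and 17 bases long.
   Any exoncutoff from 17 to 111 therefore gives the primer, exon,
   primer pattern shown in the entry.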

Example

If the sequ file contains:

* T7stuff
* Bacteriophage T7
aacataaaggacacaatgcaatgaacattaccgacatcatgaacgctatc
gacgcaatcaaagcactgccaatctgtgaacttgacaagcgtcaaggtat
gcttatcgacttactggtcgagatggtcaacagcgagacgtgtgatggcg
agctaacCGAACTAAATCAGGCACttgagcatcaagattggtggactacc
ttgaagtgtctcacggctgacgcagggttcaagATGCTCGGTAATGGTCA
CTTCTCGGCTGCTTATAGTCACCCGCTGCTACCTAACAGAGTGATTAAGG
TGGGCTTTAAGAAAGAGGATTCAGGCGCAGCCTATACCGCATTCTgccgc
atgtatcagggtcgtcctggtatccctaacgtctacgatgtacagcgcca
cgctggatgctatacggtggtacttgacgcacttaaggattgcgagcgtt
tcaacaatgatgccCATTATAAATACGCTGAgattgcaagcgacatcatt
gattgcaattcggatgagcatgatgagttaactggatgggatggtgagtt
tgttgaaacttgtaaactaatccgcaagttctttgagggcatcgcctcat
.

The entries file will contain:

LOCUS T7stuff 600 bp  DNA * mkdb 1.21
DEFINITION  Bacteriophage T7
ACCESSION   T7stuff
VERSION     T7stuff.1
SOURCE   Bacteriophage T7
  ORGANISM   Bacteriophage T7
FEATURES
     primer 158..174
     exon   234..345
     primer 465..481
BASE COUNT     166 a     133 c     151 g     150 t       0 n
ORIGIN
         1 aacataaagg acacaatgca atgaacatta ccgacatcat gaacgctatc gacgcaatca
        61 aagcactgcc aatctgtgaa cttgacaagc gtcaaggtat gcttatcgac ttactggtcg
       121 agatggtcaa cagcgagacg tgtgatggcg agctaaccga actaaatcag gcacttgagc
       181 atcaagattg gtggactacc ttgaagtgtc tcacggctga cgcagggttc aagatgctcg
       241 gtaatggtca cttctcggct gcttatagtc acccgctgct acctaacaga gtgattaagg
       301 tgggctttaa gaaagaggat tcaggcgcag cctataccgc attctgccgc atgtatcagg
       361 gtcgtcctgg tatccctaac gtctacgatg tacagcgcca cgctggatgc tatacggtgg
       421 tacttgacgc acttaaggat tgcgagcgtt tcaacaatga tgcccattat aaatacgctg
       481 agattgcaag cgacatcatt gattgcaatt cggatgagca tgatgagtta actggatggg
       541 atggtgagtt tgttgaaact tgtaaactaa tccgcaagtt ctttgagggc atcgcctcat
//
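The BASE COUNT and ORIGIN lines above follow a fixed layout.  First
come the counts of a, c, g, t and n.  Then the sequence is listed in
lower case, 60 bases per line in groups of 10, with each line led by
the position of its first base.  The Python sketch below mimics the
spacing seen in this example.  The function names are hypothetical,
and the column widths are read off the example output rather than
taken from mkdb's source.

```python
def base_count_line(seq):
    """BASE COUNT line: each count right-justified in 8 columns."""
    s = seq.lower()
    return "BASE COUNT" + "".join(f"{s.count(b):>8} {b}" for b in "acgtn")


def origin_lines(seq):
    """ORIGIN block: 60 lower-case bases per line, in groups of 10."""
    s = seq.lower()
    out = ["ORIGIN"]
    for start in range(0, len(s), 60):
        chunk = s[start:start + 60]
        groups = " ".join(chunk[k:k + 10] for k in range(0, len(chunk), 10))
        out.append(f"{start + 1:>10} {groups}")  # 1-based first position
    out.append("//")  # end-of-entry marker
    return out


print(base_count_line("aaCGt"))
print("\n".join(origin_lines("aacataaaggACACAATGCA")))
```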

documentation

see also

   example parameter file: mkdbp
   example sequ file: mkdb.sequ (copy or rename it to 'sequ' to use it)

   Program for listing the sequences: lister.p
   Program for generating a search for capitalized sequence: capsmark.p

author
   Thomas Dana Schneider

bugs

technical notes

   Capitalization that abuts either end of the sequence will be
   indicated in the entry as extending beyond that end.  This way the
   ends of the sequence will not be marked as donors or acceptors.
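   One way to express "beyond the end" is GenBank's standard
   partial-location notation, where '<' or '>' marks a feature that
   may continue past the sequence boundary.  The Python sketch below
   illustrates this notation.  It is an assumption that mkdb writes
   exactly this notation, and the function name feature_location is
   hypothetical.

```python
def feature_location(start, end, seqlen):
    """Return a GenBank location string for 1-based inclusive start..end.

    Assumption: an end that abuts the sequence boundary gets GenBank's
    partial-feature marker ('<' on the left, '>' on the right), meaning
    the feature may continue beyond the sequence.
    """
    left = f"<{start}" if start == 1 else str(start)
    right = f">{end}" if end == seqlen else str(end)
    return f"{left}..{right}"


print(feature_location(158, 174, 600))  # 158..174
print(feature_location(1, 40, 600))     # <1..40
print(feature_location(560, 600, 600))  # 560..>600
```

   The marked location signals that the sequence end is not a real
   boundary of the feature, which is the purpose of the note above.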

   The maximum name and sequence lengths are set by the constants
   maxnamelength and maxsequencelength, respectively.

*)
(* end module describe.mkdb *)
(* begin module describe.const *)
   maxnamelength = 100; (* maximum name length *)
   maxsequencelength = 6000000; (* maximum sequence length *)
   (*                  253256583 human chromosome 2 length *)
   (* Set the length to the maximum your computer can handle *)
   debugging = false; (* set true to get debugging output *)
(* end module describe.const *)
{This manual page was created by makman 1.44}
{created by htmlink 1.55}