How to build Delila Libraries

The Delila system of programs allows one to manipulate DNA sequences. The unique feature of this system is that the sequences always have a coordinate system attached to them. As a result various output files can be compared directly. One builds a Delila database and then extracts the pieces of DNA needed for an analysis. By using a set of programs to do this automatically, one gains a lot of power and control.

This page describes how to set up Delila databases. There are two steps to this process. First one must get the sequences from their original format into Delila format. Then one must build the Delila library.

Step 1: Converting Sequences to Delila Format

See each program for details.

rawbk is designed to take a single raw sequence and convert it to a Delila book.

makebk takes many sequences and creates a Delila book containing all of them. The sequences end with periods. Makebk asks the user for details of the names and coordinate systems for each piece. Alternatively, the user can have the book created automatically with 1 to n numbering and pieces named p1, p2, ... pn. It is convenient to avoid user input by creating a makebkp file. See the program for further details.

dbbk will take a large set of GenBank entries and convert all of them to a single Delila book. To use it, first concatenate the GenBank entries you are interested in into a single file called db. Be sure each entry ends with "//"; restore that line if it has been removed. Then run the dbbk program to create the l1 file, which is in book format. (Note: l1 is a lower case L followed by the numeral 1. Some fonts do not distinguish these!) A convenient way to obtain GenBank entries is to use the wgetac script; see the page on wget for further information.
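As a minimal sketch of the check described above, the following shell function compares the number of entries in a GenBank flatfile with the number of "//" terminator lines. The function name check_genbank_terminators is hypothetical and not part of the Delila system; it assumes that every entry begins with a LOCUS line, as in standard GenBank flatfiles.

```shell
# check_genbank_terminators: hypothetical helper, not part of Delila.
# Reports whether every GenBank entry in a file has a "//" terminator.
check_genbank_terminators() {
    file=$1
    # Each GenBank entry begins with a LOCUS line ...
    entries=$(grep -c '^LOCUS' "$file" || true)
    # ... and should end with a line containing only "//".
    terminators=$(grep -c '^//[[:space:]]*$' "$file" || true)
    if [ "$entries" -eq "$terminators" ]; then
        echo "ok: $entries entries, all terminated"
    else
        echo "mismatch: $entries LOCUS lines, $terminators // lines" >&2
        return 1
    fi
}
```

A possible use, with entry1.gb and entry2.gb as placeholder file names, is to run "cat entry1.gb entry2.gb > db", then "check_genbank_terminators db", and only then run dbbk. This assumes dbbk is on your path and reads the file named db from the current directory; see the program for its actual requirements.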

mkdb will take a file in Fasta format and create a GenBank flatfile. Then one can use dbbk. The nice feature of mkdb is that one can mark sections of the sequence as 'exons' and 'primers' and get corresponding features in the GenBank format.

Step 2: Building a Delila Library

Delila libraries are created by the catal program, which can use up to three input books named l1, l2 and l3. The dbbk program creates an l1 automatically, so if you use dbbk you are ready to go. If you have a book then you can copy the book to the l1 file or move it:

cp book l1
This will make a copy of book in l1 and leave the book as is.
mv book l1
This will move the book to l1; there will no longer be a file named 'book'.

The next step is to make three empty files. Under Unix use touch:

touch l2
touch l3
touch catalp

Then run the catal program. If all goes well (it should) you will have 6 new files:

lib1 lib2 lib3 cat1 cat2 cat3

Only lib1 and cat1 will contain data. (The catal program will also create some output files telling you whether the database is OK.) You can now delete l1 if you want to save space. Keep db around for other purposes, such as automatically generating delila instructions with dbinst or exon.
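The steps of Step 2 can be collected into a small script. This is only a sketch: the function names prepare_catal_inputs and check_catal_outputs are hypothetical helpers, not part of the Delila system, and the sketch assumes that catal is on your path and works on files in the current directory.

```shell
# prepare_catal_inputs: hypothetical helper, not part of Delila.
# Copies a book to l1 and creates l2, l3 and catalp if they do not exist.
prepare_catal_inputs() {
    book=$1
    [ -f "$book" ] || { echo "no such book: $book" >&2; return 1; }
    cp "$book" l1     # keeps the original book as is
    touch l2          # second input book, left empty
    touch l3          # third input book, left empty
    touch catalp      # parameter file, left empty
}

# check_catal_outputs: hypothetical helper, not part of Delila.
# Succeeds only if lib1 and cat1 both exist and contain data.
check_catal_outputs() {
    for f in lib1 cat1; do
        [ -s "$f" ] || { echo "$f is missing or empty" >&2; return 1; }
    done
    echo "lib1 and cat1 contain data"
}
```

One way to use these is "prepare_catal_inputs book", then run catal, then "check_catal_outputs". Note that touch does not empty a file that already exists, so remove any old l2, l3 or catalp first if you want them empty.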

Now here's the reason for delila: you can write instructions for grabbing exactly the sequences you want. This will make a small subset of the database (thus saving space during analyses). Since the database is called a library and the librarian follows your instructions, the result is called a 'book'. There is a short tutorial on how to write delila instructions:

Delila instruction tutorial

Once one has the subset one can make a sequence logo. For binding site analyses, the request format that you should use is:

get from 63 -25 to same +7 direction +;
See the Logo Programs page for the programs you will need to set up to create logos.

Finding Sequences on a Genome

The best way to use the Delila system is to extract all of your sequences from a complete genome. In many cases you may already have some sequences, but you don't know where they are on the genome yet. Although it may seem simplest to convert the sequences into a book and proceed from there, this has strong disadvantages. It would require that each sequence have its own name in the book, and this makes later work complex. Also, the description of the dataset for publication is messy. It is better to extract the fragments from the genome.

One way to locate sequences is to use the search program. I plan to make a page to describe how to do this.

An easier way to locate the sequences on the genome is to use the sebo program. This program will generate Delila instructions that you can use to extract the fragments. Once you have the instructions, you can refine them to create exactly the dataset you need.


Schneider Lab

origin: 2000 February 17
updated: 2000 Nov 8