Automated Mapping between RNA Sequence and Structure
Senior Project: 1997-1998
The workings of the living cell are being deciphered in great detail, due in
large part to new sequencing technology. It is now possible to determine the
nucleotide sequence of an organism's entire genetic blueprint, its genome,
which is stored in the organism's chromosomes. The genomes of the smaller
bacteria contain roughly 500,000 nucleotides, while the genomes of
multicellular organisms are considerably larger.
Researchers in the Department of Chemistry seek to understand the biological
meaning of this large amount of nucleotide sequence information. The
structure and evolution of a few select RNA molecules are being studied; in
this work, thousands of sequences are aligned and analyzed. A better
understanding of the "structural building blocks" found in these molecules
will greatly improve the understanding of RNA structure and conformation, and
of how a sequence (primary structure) folds up into its biologically
functional secondary and tertiary structure.
To date, most of the mappings between sequence and higher-order structure have
been identified by visual inspection of the sequence and structure data. This
manual method can at best identify only the strongest and most obvious
sequence/structure relationships. Software was developed approximately ten
years ago with the goal of finding more of these relationships. Central to
this program is an "RNA structure language", in which a user writes a
descriptor for a specific structural motif. The program then searches
a sequence database for cases where a part of a sequence can form the structure
defined in the user-specified descriptor.
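The descriptor-driven search described above can be illustrated with a
minimal C++ sketch. It is not the project's descriptor language or search
code: the HairpinDescriptor and Match types and the find_hairpins function
are hypothetical names chosen for this example, and the motif is limited to
a single hairpin (a helix of fixed length enclosing a loop of variable
length). The sketch reports every position in one sequence where the two
strands of the helix could base-pair.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical descriptor for a simple hairpin motif (illustration only):
// a helix of stem_len base pairs enclosing a loop of min_loop..max_loop
// unpaired nucleotides.
struct HairpinDescriptor {
    std::size_t stem_len;
    std::size_t min_loop;
    std::size_t max_loop;
    bool allow_gu;  // also accept G-U wobble pairs
};

// One place in the sequence where the motif can form.
struct Match {
    std::size_t start;     // index of the first nucleotide of the 5' strand
    std::size_t loop_len;  // loop length used for this match
};

// True if nucleotides a and b can form a base pair.
bool can_pair(char a, char b, bool allow_gu) {
    if ((a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
        (a == 'G' && b == 'C') || (a == 'C' && b == 'G')) {
        return true;
    }
    return allow_gu && ((a == 'G' && b == 'U') || (a == 'U' && b == 'G'));
}

// Scan seq and collect every (start, loop length) at which the 5' strand
// pairs with the 3' strand read in the opposite direction.
std::vector<Match> find_hairpins(const std::string& seq,
                                 const HairpinDescriptor& d) {
    std::vector<Match> hits;
    for (std::size_t start = 0;
         start + 2 * d.stem_len + d.min_loop <= seq.size(); ++start) {
        for (std::size_t loop = d.min_loop; loop <= d.max_loop; ++loop) {
            const std::size_t total = 2 * d.stem_len + loop;
            if (start + total > seq.size()) break;  // motif would run off the end
            bool ok = true;
            for (std::size_t k = 0; k < d.stem_len && ok; ++k) {
                ok = can_pair(seq[start + k], seq[start + total - 1 - k],
                              d.allow_gu);
            }
            if (ok) hits.push_back({start, loop});
        }
    }
    return hits;
}
```

For example, with a descriptor of a four-pair helix and a loop of three to
five nucleotides, the sequence GGGCAAAAGCCC matches once, at position 0 with
a four-nucleotide loop. Searching a database would amount to calling
find_hairpins on each sequence in turn; a realistic descriptor language
would also need to express mismatches, multiple helices, and sequence
constraints, which this sketch leaves out.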
The RNA structure language and the software both had limitations. The goal of
the project was to develop an improved descriptor language, able to describe
more complex structural motifs, along with software to implement the
corresponding search of the databases. The project was implemented in C++ with
an object-oriented approach and runs in a UNIX environment.
