Recent news items :

  1. There is a sample final exam.

    Posted: Tuesday, December 10, 2002

 

Various topics

Tuesday, November 26


Previous page


Hash tables

These allow fast lookups without even sorting the data. The idea is to compute an array subscript as a function of the key that we are storing or looking up. This is explained in the textbook in Sections 12.2 and 12.3.

Importantly, these tables work well only when sparsely populated.

In “open-address” hashing (Section 12.2) collisions of keys (more than one key wanting to be stored in the same slot) are handled my using alternate slots. In “chained” (also called “closed”) hashing, each array element is some kind of “bucket” that can hold any number of elements.

If the hash functions are good and the table is sparse, this is better than the O(log n) performance of trees or skiplists: the insertions and deletions can be done in “O(1),” i.e., constant, time.

One way to try to reduce clustering in open hashing is to replace “linear probing”, where an element that is denied a slot tries the next slot down, with “double hashing”, where a second hash function is used to determine how far down in the array the element looks for an open slot: for example, if the first hash function returns a 15 and the second one a 7, then the sequence of slots in which the element tries to find an open one is 15, 22, 29, ... .

Trees stored in arrays, tree traversals, stacks vs queues

There are two natural ways to store trees in arrays. Heaps have their root stored in the leftmost array location, search trees have their root stored in the middle of the array (the exact middle if the search tree was of a perfect shape).

These two methods correspond to two kinds of “traversal” of the trees: breadth-first vs depth-first, which in turn correspond to two different ways to remeber which nodes need yet to be visited: queues vs stacks.

Iterators

Iterators provide a way to loop through the elements of a container class without needing to know how they are stored. They can be implemented as objects that contain a single pointer.

Here is an example of an iterator for the CatalogBrowser of Assignment 3:

[I edited this to have a correct postfix ++ operator. The printed handout erroneously had the postfix and prefix ++ operators programmed to act the same.]

 // File: CatalogIterator.h 
 
 #ifndef CATALOGITERATOR
 #define CATALOGITERATOR
 
 #include "Catalog.h"
 #include "CatalogNode.h"
 #include "CatalogEntry.h"
 
 class CatalogIterator 
 {
     public: 
 
         CatalogIterator (const CatalogIterator& catI);
         CatalogIterator (CatalogNode* p = NULL);
 
         CatalogEntry& operator * () const;
         CatalogEntry* operator -> () const;
         CatalogIterator& operator ++ (); // prefix
         CatalogIterator operator ++ (int); // postfix
 
         friend bool operator == (
             const CatalogIterator& one, 
             const CatalogIterator& two);
 
         friend bool operator != (
             const CatalogIterator& one, 
             const CatalogIterator& two);
 
     private:
 
         CatalogNode* current;
 };
 
 #endif

 // File: CatalogIterator.cxx
 
 #include "CatalogIterator.h"
 #include "CatalogNode.h"
 #include "CatalogEntry.h"
 
 CatalogIterator::CatalogIterator (const CatalogIterator& catI)
 {
     current = catI.current;
 }
 
 CatalogIterator::CatalogIterator (CatalogNode* p)
 {
     current = p;
 }
 
 CatalogEntry& CatalogIterator::operator * () const
 {
     return current->entry;
 }
 
 CatalogEntry* CatalogIterator::operator -> () const
 {
     return &(current->entry);
 }
 
 CatalogIterator& CatalogIterator::operator ++ () // prefix
 {
     if (current != NULL)
         current = current -> next;
 
     return *this;
 }
 
 CatalogIterator CatalogIterator::operator ++ (int) // postfix
 {
     CatalogIterator copy (current);

     if (current != NULL)
         current = current -> next;
 
     return copy;
 }
 
 bool operator == (
     const CatalogIterator& one, 
     const CatalogIterator& two)
 {
     return one.current == two.current;
 }
 
 bool operator != (
     const CatalogIterator& one, 
     const CatalogIterator& two)
 {
     return one.current != two.current;
 }
With the above header and implementation of the iterator class the programmer of the main program can now write a loop through a container without needing to know any details of the implementation of the container class (the Catalog class in this example).
 // File: CatalogBrowser.cxx
 
 [ ... ]
 
 #include "Catalog.h"
 #include "CatalogIterator.h"
 
 [ ... ]
 
     Catalog results;
     CatalogIterator catI;
 
 [ ... ]

        case 'r': // show results
 
                 lineNumber = 0;

                 for (catI = results.begin (); 
                          catI != results.end (); 
                              catI++)

                     cerr << ++lineNumber 
                          << ". " 
                          << *catI;
For this to work, the Catalog needs these two functions:
 // File: Catalog.cxx
 
 [ ... ]
 
 #include "Catalog.h"
 #include "CatalogIterator.h"
 
 [ ... ]
 
 CatalogIterator Catalog::begin () const
 {
     CatalogIterator result (head);
     return result;
 }
 
 CatalogIterator Catalog::end () const
 {
     CatalogIterator result (NULL);
     return result;
 }

Secondary indices

If your data need to be accessed by more than one key use multiple search trees (or skiplists or hashtables) that point to (as opposed to containing) the data.

Moving pointers instead of data

When sorting, avoid moving large objects around. Instead move pointers in the sort algorithm and, if needed, move the objects only once at the end.

Programming style

The bottom line is, your programs need not only be correct, they need to be obviously correct. They need to be easy to read. Key ingredients are formatting that reflects the syntax and meaningful (and not misleading or information-free) names of classes, functions, variables, constants, files, ... .


Previous page | Back to top

3:40 PM, Thursday, December 12, 2002