CSCI 2270 Computer Science 2: Data Structures
Fall 2003
Karl Winklmann
 

Recent news item(s) :

  1. The winners of the project competition are ...

    In first place ...

        D. Football Game

    and a tie for second place between

        E. 3D Graph Path Game, and
        K. Distributed Genetic Algorithms.

    Congratulations, and thanks to everybody who showed a project. Winners can pick up their book prizes in ECOT 721, now or after the break.

    The votes were counted by Nan and Rika.

    Posted: Monday, December 15, 2003

 

Trying to summarize ...

December 2, 2003

 


Previous page | Next page | Latest page | Schedule and syllabus | Home Page | Programs | Table of contents | News archive | Dora

Organizing data

Sorting

Anything works on small amounts of data. Use QuickSort or HeapSort or MergeSort on larger amounts when you don't know anything useful about the data or when it's not worth your time trying to beat these general-purpose methods. Better yet, just use whatever sort function comes with the environment you are programming in.

But know how to exploit special properties of the data when that is worth doing. It does not take much in terms of special properties: short strings or integers you can sort one column (digit, character position, or groups of digits or characters) at a time, least significant column first (that's RadixSort), using a stable linear-time algorithm on each column (CountingSort will do). That's a very potent approach.

Building a search tree is another way of sorting and has the added benefit that you end up with a plant that can grow: insertions in trees are fast, insertions in sorted arrays are painfully slow.

The fastest approach to sorting is to refuse to sort: know when to say no to sorting and instead use ...

Hash tables

Very fast. Plug in a hash function for the data at hand. The function should make use of all parts of the key (not "take the first three digits of the ID") and spread keys across the whole table. If you are not confident about the size of the table, use dynamic tables: switch to one twice as big when the current one gets too crowded, half as big when the current one gets too sparsely populated. Better yet, use an existing implementation, but use your own hash function since you know the data.

Trees

Use search trees when hash tables don't let you do what you want (find the largest, browse alphabetically ... ). Search trees need care to keep their balance (because good search trees are shallow and wide, while deep and narrow trees — list look-alikes — can hurt you).

If you need to do something with the whole tree (copy, print, determine its size, ... ), do it recursively. Work on a single path you can do iteratively. Sophisticated search tree algorithms do insertions and deletions by working on a single path but in a way that maintains an overall good balance of the tree. The one example we looked at were randomized binary search trees. There are other approaches, covered in CSCI 3104, Algorithms.

Trees and recursion

Trees go hand-in-hand with recursion much like arrays go with loops. If the task at hand is by its nature recursive (e.g., searching and/or displaying directory trees) recursive functions are your friend.

Combining trees and hash tables

You can have your hash table and your tree, all pointing to the same data. Multiple hash tables and trees and lists, for that matter, no limit. The trick is not to enter the data items themselves in those structures, enter pointers to the items.

Priority queues

Array-based way to implement repeated selection of the biggest. Superb for that task.

Programming issues

Dynamically allocated memory

Whenever you create a class that involves dynamically allocated memory (i.e., uses the new operator) you need to program a copy constructor, an assignment operator, and a destructor or else your objects will behave in unituitive ways (with the default “shallow” copies and assignments) and crash your program in mysterious ways (by memory leaks). And of course, always initialize pointers.

Arrays and pointers

There are only two ways to organize data in computer memory: by arrays and/or by pointers. (Ultimately, arrays and pointers are very similar: memory is one large array, pointers are subscripts to that array).

Making it all work

You have to decide first on the class definitions — what information is to be kept and what functions there are. Then you need to implement the functions slowly, one or a small number of them at a time.

And some miscellaneous advice ...

Choose the right data structure to fit the task

Don't waste your programming expertise on trying to improve a poor choice of algorithm or data structure. E.g., dynamic arrays (vector-s in the STL) aren't good for maintaining sorted data.

Special cases, optimizing ...

Be suspicious of handling special cases a little faster. That's usually worthless if not counterproductive: constantly testing for special cases may be a net loss.

Use existing software

Apply your expertise to choosing the right data structures and algorithms, not re-programming them.

Running times

Understand that a program with a quadratic running time will be seen as crashing, on large data.  


© 2003 Karl Winklmann 3:08 PM, Tuesday, December 16, 2003