CSCI 2270
Computer Science 2:
Data Structures
Fall 2003
Karl Winklmann
Recent news item(s) :
-
The winners of the project competition are ...
In first place ...
D. Football Game
and a tie for second place between
E. 3D Graph Path Game, and
K. Distributed Genetic Algorithms.
Congratulations, and thanks to everybody who
showed a project. Winners can pick up their
book prizes in ECOT 721, now or after the break.
The votes were counted by Nan and Rika.
Posted: Monday, December 15, 2003
Trying to summarize ...
December 2, 2003
Organizing data
Sorting
Anything works on small amounts of data. Use QuickSort or HeapSort
or MergeSort on larger amounts when you don't know anything useful
about the data or when it's not worth your time trying to beat
these general-purpose methods. Better yet, just use whatever
sort function comes with the environment you are
programming in.
But know how to exploit special
properties of the data when that
is worth doing. It does not take much in terms of special properties:
short strings or integers you can sort one column (digit, character
position, or groups of digits or characters) at a time, least
significant column first (that's RadixSort), using a stable linear-time
algorithm on each column (CountingSort will do). That's a very
potent approach.
Building a search tree is another way of sorting and has the
added benefit that you end up with a plant that can grow: insertions
in trees are fast, insertions in sorted arrays are painfully slow.
The fastest approach to sorting
is to refuse to sort: know when to say no to sorting and instead
use ...
Hash tables
Very fast. Plug in a hash function for the data at hand. The function
should make use of all parts of the key (not "take the first
three digits of the ID") and spread keys across the whole
table. If you are not confident about the size of the table, use
dynamic tables: switch to one twice as big when the current one
gets too crowded, half as big when the current one gets too sparsely
populated. Better yet, use an existing implementation, but
use your own hash function since you know the data.
Trees
Use search trees when hash tables don't let you do
what you want (find
the largest, browse alphabetically ... ). Search trees need care to
keep their balance (because good search trees are shallow and wide,
while deep and narrow trees list look-alikes can hurt you).
If you need to do something with the whole tree (copy, print,
determine its size, ... ), do it recursively. Work on a single
path you can do iteratively. Sophisticated search tree algorithms
do insertions and deletions by working on a single path but
in a way that maintains an overall good balance of the tree.
The one example we looked at were randomized binary search trees.
There are other approaches, covered in CSCI 3104, Algorithms.
Trees and recursion
Trees go hand-in-hand with recursion much like arrays go with
loops. If the task at hand is by its nature recursive (e.g., searching
and/or displaying directory trees) recursive functions are your friend.
Combining trees and hash tables
You can have your hash table and your tree,
all pointing to the same data. Multiple hash tables and trees
and lists, for that matter, no limit. The trick is not to enter the
data items themselves in those structures, enter pointers to the items.
Priority queues
Array-based way to implement repeated selection of the biggest.
Superb for that task.
Programming issues
Dynamically allocated memory
Whenever you create a class that involves dynamically allocated
memory (i.e., uses the new operator) you need
to program a copy constructor, an assignment operator, and
a destructor or else your objects will behave in unituitive ways
(with the default “shallow” copies and assignments)
and crash your program in mysterious ways (by memory leaks).
And of course, always initialize pointers.
Arrays and pointers
There are only two ways to organize data in computer memory:
by arrays and/or by pointers. (Ultimately, arrays and pointers
are very similar: memory is one large array, pointers are subscripts
to that array).
Making it all work
You have to decide first on the class definitions what information
is to be kept and what functions there are. Then you need to implement
the functions slowly, one or a small number of them at a time.
And some miscellaneous advice ...
Choose the right data structure to fit the task
Don't waste your programming expertise on trying to improve a
poor choice of algorithm or data structure. E.g.,
dynamic arrays (vector-s in the STL)
aren't good for maintaining sorted data.
Special cases, optimizing ...
Be suspicious of handling special cases a little faster. That's
usually worthless if not counterproductive: constantly testing
for special cases may be a net loss.
Use existing software
Apply your expertise to choosing
the right data structures and algorithms, not re-programming them.
Running times
Understand that a program with a quadratic running time
will be seen as crashing, on large data.
|
© 2003 Karl Winklmann
|
3:08 PM, Tuesday, December 16, 2003
|