I am keeping this blog while I create my own solution to homework 10/11 for CSCI 2270, Fall 2009. The goal is to give you an idea of my thought process while solving this problem, as well as the concrete steps I break the problem into.
You may use any of the code posted here (with appropriate acknowledgment).
I started off with the following steps which are probably familiar to you. I often see many of you start your homework in a similar way:
At this point I also changed MINIMUM to 2 in vset.h so that small trees would still have multiple levels.
Finally, before starting any coding, I wrote out my class invariant. I based this on the six B-Tree rules on pages 542-543 of the textbook. The difference between what I wrote and what is in the book is that I used the member variables of the set class to define my invariant. To give you an idea of what I mean and to get you started, here is what I wrote for the first rule:
// CLASS INVARIANT:
// 1) data.size() >= MINIMUM except possibly at the root.
Phew! That was a lot of work and I haven't even written any code yet! I am
now ready to start implementing the member functions tomorrow.
After several hours, I have my insert and clear functions working. Even though insert is the hardest function to write, I thought it was the best place to start because you can't really test anything until insert is working. If it took me a few hours, it will probably take you almost as long! I hope my description here helps you with your implementation.
Actually, the first thing that I did was write the print function and a short interactive test program so that I could see what the tree was doing while my insert function flailed about. My test program gives the following options:
Select from the following: I: Insert M: Multiple Insert P: Print Q: QuitI thought that multiple insert was especially useful for testing, because you need to insert a lot of numbers into the tree to get it to create new levels even with min=2, max=4. Specifically, you need to insert the numbers 1 through 17 to get 3 levels. My multiple insert just runs through a loop and inserts a range of numbers, here is the code:
...
case 'M':
cout<<"Enter (min, max): "<<endl;
cin>>x>>y;
for(i = x; i <= y; i++)
s.insert(i);
break;
...
// SUGGESTED FUNCTION FOR DEBUGGING
void set::print(size_t indent) const
{
const int width = 3;
size_t i = 0;
// print indent spaces
for(i = 0; i < indent; i++)
cout<<" ";
// print the data from this node
for(i = 0; i < data.size(); i++)
cout<<setw(width)<<data[i];
// print endl at end of the root data
cout<<endl;
// recursively print children
for(i = 0; i < child.size(); i++)
child[i]->print(indent + 4);
}
And here is what the output looks like after doing my multiple insert
from 1 to 17:
9
3 6
1 2
4 5
7 8
12 15
10 11
13 14
16 17
You can interpret this as: The root is 9 and it has two children, one
containing the numbers 3 and 6, the other containing the numbers 12 and
15. The child containing 3 and 6 has three children: the first contains
1 and 2, the second contains 4 and 5, the third contains 7 and 8. Etc.
At last I arrived at the dreaded insert function. In my mind, this is the meat of homework 10. I followed the textbook very closely in my implementation of insert. The textbook's discussion of insert starts on page 551 and continues all the way through page 556. (Note that in the text, the array of pointers to children is called 'subset' instead of 'child'. In the text, subset[i] refers to the child at index i.) I wrote my insert function in the following steps:
std::vector<value_type>::size_type find_index(const value_type& entry) const;
It returns the index of the first element greater than or equal to
entry. You will need this in several member functions, so even if you
have a simple implementation of find_index (e.g., using lower_bound),
it is still nice to have this as a separate helper function.
bool loose_insert(const value_type& entry);
By the way, in step 2c where the text says "if so, then fix that
problem", it is referring to the next function that we are about to
write:
void fix_excess(std::vector<value_type>::size_type i);
This function is responsible for splitting a child with too many
entries and bringing the median entry up to the parent. I wrote this
function in the following steps:
There were several problems with my first attempt. I got a segfault the first time that my fix_excess function was called. I used gdb to find the line that my code was crashing on. I eventually figured out that the bug was that I needed to handle leaves slightly differently in my fix_excess function, as described above. Then, I thought that my insert function was working but when I inserted the 17th node (which creates the 3rd level) the new tree did not look correct. In this case gdb wasn't very much help because the problem wasn't a crash, it was just that there was a mistake in my algorithm somewhere. You might not make the exact same mistake that I did, but you may find my debugging technique useful. I added calls to the print() function at the beginning and end of fix_excess (where I suspected there may be a bug). This showed me what the tree looked like immediately before and after every call to fix_excess. This gave me enough information to realize what the problem was and my insert function is finished now (as far as I know).
I also wrote my clear function for good measure, because it's always important to deallocate dynamic memory. At first my program was crashing every time I exited. Any time this happens it's almost certain that you have a bug in your destructor. In my case the error was very simple although it took me awhile to realize what was happening (and I finally figured it out by watching the call stack in gdb). Here is what I saw by typing 'bt' at the gdb prompt shortly before my program crashed:
#0 csci2270::set::clear () at vset.cxx:90 #1 0x080491fc in csci2270::set::~set (this=0x804e068) at vset.h:57 #2 0x08049ef1 in csci2270::set::clear () at vset.cxx:95 #3 0x08049ec6 in csci2270::set::clear () at vset.cxx:94 #4 0x080491fc in csci2270::set::~set (this=0xbffff948) at vset.h:57 #5 0x08048e88 in main () at vsettest.cxx:55The way my destructor was written it called clear recursively and then called delete on each child to free the dynamic memory. Of course, I overlooked a major problem with this. Looking at the call stack above, I thought, "Wait a second. The destructor is calling clear, which is calling clear recursively, and then when I use delete to free dynamic memory it is calling clear AGAIN!" Oops! This actually results in clear getting called multiple times on the same node. In fact, there is no need to call clear() from clear(), instead you can just use delete which not only frees the dynamic memory, but also calls the destructor of the child node, which ends up making more calls to clear() as well. I deleted the recursive call to clear(), keeping only the delete, and all was well.
Once you have insert done, this part should seem relatively easy.
For contains I used the find_index function that I had already written yesterday. Based on the return you know which recursive call to make. I must have been thinking about break because I messed this up multiple times while writing it. Don't forget to check that the node actually has children before attempting to make a recursive call to them.
Technically this isn't the most efficient thing to do, but I just implemented this by calling the assignment operator.
I passed all of my initial tests and was feeling pretty clever, so I tried running vsetexam expecting that I would pass all the tests except for the erase function (which is homework 11). Not quite:
frohardt@csel-04:~/CSCI-2270/vset$ ./vsetexam Running set tests: TEST 1: A. Testing empty and contains for an empty set. Segmentation fault
Oops! Looks like I got caught not checking a boundary case. Even worse, I got caught not using assert to check the preconditions of my helper functions. I added the asserts and found out that:
Running set tests: TEST 1: A. Testing empty and contains for an empty set. vsetexam: vset.cxx:174: size_t csci2270::set::find_index(const int&) const: Assertion `data.size() > 0' failed. Aborted
It turns out that my contains function was violating the precondition of my find_index function, which is that there must be at least one element in data. Otherwise, the return from find_index would be nonsense. After fixing up some things I passed all the tests in vsetexam except for the tests related to the erase function. Even better, I was informed that:
If you submit the set now, you will have 90 points out of the possible 75 points.I guess we better fix this (maybe you would prefer that we didn't). Stay tuned for more info on the scoring.
This is harder than insert because there are more cases, but on the other hand the description in the book is even more explicit than for insert. Also, you may find this part to be easier than insert in some ways because now you are familiar with using the vector member functions to do things like move values from one vector to another and erase values. I actually haven't finished debugging my implementation of erase and still have one bug I have to track down, so I will provide more information about this once I finish. Here are some of the issues I ran into while debugging and some tips for testing:
if(loose_erase(target)) Return zero since loose_erase did not remove an entry from the set.This is incorrect, because we want to return false if loose_erase fails. This should be:
if(!loose_erase(target)) Return zero since loose_erase did not remove an entry from the set.
My last bug was hard to track down, and I eventually did it by stepping through erase line by line and drawing a picture of what my tree looked like as I went. It was very useful here that I had an interactive test program and knew of a specific case that my erase() function always failed on. This is much easier to work with than just knowing that vsetexam fails, and I would encourage you to have an interactive test program and to try to find specific cases where your functions fail.
Since my bug was pretty hard to track down I'll let you know what I did in my forgetful programming (a common problem for me). In cases 3 and 4 of fix_shortage (which are both done using my combine() helper function), there is a step where we have to move all of the children from one child to the other. The mistake I made is that right after you copy the child pointers over, there are now two sets of pointers to the same children. I then wrote 'delete child[i+1]' but that causes the whole subtree to be deleted recursively. Make sure that you empty out child[i+1]'s child vector before calling delete in this case.
I have confirmed with Prof. Main that we will use MINIMUM = 2 to run vsetexam and determine your homework 11 grade. With MINIMUM=200 the tree never grows more than 2 levels in vsetexam, so not all cases are considered. Make sure that you test with MINIMUM = 2 since this is how you will be graded.