WAYS TO THINK ABOUT PARALLELIZING AN ALGORITHM: (your mileage may vary)
- what is the data dependency structure of the problem?
- what is each processing unit (PU) doing?
- what data are in each PU?
- how many time clicks are required?
- how many PUs are involved at each time click?
- how can I keep all the PUs busy all the time?
- how can I usefully throw more hardware at the problem?
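
As one way to make these questions concrete, here is a minimal sketch that applies them to a problem chosen purely for illustration (it is not taken from these notes): summing a list with a pairwise, tree-shaped reduction. The function name `tree_reduce_schedule` and the one-PU-per-pair model are assumptions. The simulation runs sequentially; it only counts how many time clicks the tree needs and how many PUs would be doing an addition at each click.

```python
def tree_reduce_schedule(values):
    """Simulate a pairwise (tree) sum.

    Returns (total, active_pus_per_click), where active_pus_per_click[k]
    is the number of hypothetical PUs doing an addition at time click k.
    """
    level = list(values)
    if not level:
        raise ValueError("need at least one value")

    active_pus_per_click = []
    while len(level) > 1:
        pairs = len(level) // 2
        # Data dependency: each sum needs only two values from the previous
        # level, so all pairs within one click are independent of each other.
        next_level = [level[2 * i] + level[2 * i + 1] for i in range(pairs)]
        if len(level) % 2:
            # An unpaired last element is carried forward without any work.
            next_level.append(level[-1])
        active_pus_per_click.append(pairs)
        level = next_level
    return level[0], active_pus_per_click


total, schedule = tree_reduce_schedule(range(1, 9))
print(total, schedule)  # 36 [4, 2, 1]
```

For 8 values the sum takes 3 time clicks instead of the 7 a single PU would need, but the number of busy PUs drops from 4 to 2 to 1. That is the "keep all the PUs busy" question in miniature: adding more than 4 PUs would not help here, and most of them sit idle after the first click.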