
Colloquium - Maddison

Mesquite, a Programming System for Evolutionary Biology: Modularity, Idealism, and the Economy of Imagination in Scientific Computing
Department of Ecology and Evolutionary Biology, University of Arizona

Since the 1960s, quantitative methods to summarize and analyze the diversity of species characteristics have played an increasingly important role in comparative biology. Analysis of this diversity must consider its broad-scale genetic history, called phylogeny: the evolutionary tree of branching species lineages. Fundamental to these quantitative methods, therefore, is a model of a branching process: species dividing and evolving through time. Specialized software for phylogenetic analysis has been vital to evolutionary biology, and some of the big packages, each written by one or two biologists, have become citation classics.

The increasing number of small, specialized programs, each idiosyncratic in interface and file format, and the inertia of the big packages motivated us to develop the system "Mesquite". Its emphatically modular architecture allows new calculations to be added easily, and composite analyses to be performed by combining modules written by different programmers. Its idealistic goal is to stimulate the economy of imagination in evolutionary analyses: to encourage distribution of new analytical methods by facilitating their programming, to enable users to invent new composite analyses by linking analytical modules in new ways, and to allow biologists to discover patterns in their data in an environment emphasizing visualization and exploration.

The design of a GUI-intense, interactive system with unpredictable components (depending on modules installed) has offered many challenges, at least to this biologist: What are the bureaucratic rules of module interaction? If individual modules require their own GUI controls and menus, where should these be placed? If the GUI is unpredictable, then how to provide documentation? Scientists need to know what analysis they performed, and whom to cite for the method -- how to inform the user of the basis of publishable results, and how to reproduce an analysis? Scientists vary in their imagination and ability -- how to balance the value of an abundance of choices against the danger of overwhelming the user? Except for some vexing issues in the last two of these challenges, more or less satisfactory solutions have been achieved. Now that the architecture is stabilizing, the most urgent task is to solidify the core analytical calculations, and to see whether idealism can turn into reality.

Hosted by Elizabeth Bradley.
Refreshments will be served immediately following the talk in ECOT 831.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA