skip to main content
Department of Computer Science University of Colorado Boulder
cu: home | engineering | mycuinfo | about | cu a-z | search cu | contact cu cs: about | calendar | directory | catalog | schedules | mobile | contact cs
home · events · colloquia · 2001-2002 · 

Colloquium - Tufo

ECCR 200

Terascale Spectral Element Algorithms and Implementations
Argonne National Laboratory/University of Chicago

The spectral element method is a weighted residual approach that employs Nth-order tensor-product polynomial bases (typ., N = 5-15) within K deformable subdomains. This approach combines the accuracy and efficiency of spectral methods with the geometric flexibility of finite elements. Because of its low numerical dispersion and dissipation, the method is ideal for the simulation of transitional and weakly turbulent flows.

We begin with an overview of the method and then focus on the algorithms and implementations which have resulted in terascale performance (0.376 Tflops on 2048 nodes of ASCI Option Red). The key ingredients to achieving this level of performance are the local tensor-product structure, a high degree of parallelism at the inter-element level, and scalable multilevel solvers.

Henry Tufo photo

Locally, matrix-vector products involving O(KN3) degrees-of-freedom and full stiffness matrices can be recast as matrix-matrix products requiring only O(KN4) work -- a substantial savings over the O(KN6) work and storage required without tensor-product forms.

Globally, parallelism derives from the usual SPMD model in which clusters of elements are distributed to separate processors. Communication required to update interface vertices during the matrix-vector product phase is nearest neighbor and is handled asynchronously by our stand-alone gather-scatter package, GS.

Incompressibility is enforced through iterative solution of an elliptic problem at each time step. Efficiency of the preconditioned CG scheme rests upon a multi-level preconditioner that includes a coarse grid solver to rapidly effect the transfer of low-wavenumber information throughout the domain. Because of its sparsity and distributed nature (n/P = O(1)), the coarse grid solve is communication intensive and frequently a bottleneck for P ~> 64. We have developed a fast parallel coarse grid solver that will scale to tens of thousands of processors. The method is based on projection onto a sparse basis for Rn and, for appropriately ordered FEM or FD meshes with n degrees of freedom, leads to communication costs of O(n1/2 log P) for two-dimensional meshes and O(n2/3 log P) for three-dimensional meshes. This is a substantial improvement over previous coarse-grid solve strategies, which require O(n log P) or O(n) communication. This complexity analysis is supported by experiments for P=1-4096.

We close with results from a series of challenging problems in fluid mechanics and heat transfer.

Hosted by Richard Byrd.
Refreshments will be served immediately following the talk in ECOT 831.

The Department holds colloquia throughout the Fall and Spring semesters. These colloquia, open to the public, are typically held on Thursday afternoons, but sometimes occur at other times as well. If you would like to receive email notification of upcoming colloquia, subscribe to our Colloquia Mailing List. If you would like to schedule a colloquium, see Colloquium Scheduling.

Sign language interpreters are available upon request. Please contact Stephanie Morris at least five days prior to the colloquium.

See also:
Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
Send email to

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844
XHTML 1.0/CSS2 ©2012 Regents of the University of Colorado
Privacy · Legal · Trademarks
May 5, 2012 (13:29)