Argonne National Laboratory/University of Chicago

4/29/2002

2:00pm-3:00pm

The spectral element method is a weighted residual approach that employs Nth-order tensor-product polynomial bases (typ., N = 5-15) within K deformable subdomains. This approach combines the accuracy and efficiency of spectral methods with the geometric flexibility of finite elements. Because of its low numerical dispersion and dissipation, the method is ideal for the simulation of transitional and weakly turbulent flows.

We begin with an overview of the method and then focus on the algorithms and implementations which have resulted in terascale performance (0.376 Tflops on 2048 nodes of ASCI Option Red). The key ingredients to achieving this level of performance are the local tensor-product structure, a high degree of parallelism at the inter-element level, and scalable multilevel solvers.

Locally, matrix-vector products involving O(KN^{3}) degrees-of-freedom
and full stiffness matrices can be recast as matrix-matrix products requiring
only O(KN^{4}) work -- a substantial savings over the O(KN^{6})
work and storage required without tensor-product forms.

Globally, parallelism derives from the usual SPMD model in which clusters of elements are distributed to separate processors. Communication required to update interface vertices during the matrix-vector product phase is nearest neighbor and is handled asynchronously by our stand-alone gather-scatter package, GS.

Incompressibility is enforced through iterative solution of an elliptic problem
at each time step. Efficiency of the preconditioned CG scheme rests upon a
multi-level preconditioner that includes a coarse grid solver to rapidly effect
the transfer of low-wavenumber information throughout the domain. Because of
its sparsity and distributed nature (n/P = O(1)), the coarse grid solve is
communication intensive and frequently a bottleneck for P ~> 64. We have
developed a fast parallel coarse grid solver that will scale to tens of
thousands of processors. The method is based on projection onto a sparse basis
for R^{n} and, for appropriately ordered FEM or FD meshes with n
degrees of freedom, leads to communication costs of O(n^{1/2} log P)
for two-dimensional meshes and O(n^{2/3} log P) for three-dimensional
meshes. This is a substantial improvement over previous coarse-grid solve
strategies, which require O(n log P) or O(n) communication. This complexity
analysis is supported by experiments for P=1-4096.

We close with results from a series of challenging problems in fluid mechanics and heat transfer.

*Hosted by Richard Byrd.Refreshments will be served immediately following the talk in ECOT 831.*

Department of Computer Science

University of Colorado Boulder

Boulder, CO 80309-0430 USA

webmaster@cs.colorado.edu

University of Colorado Boulder

Boulder, CO 80309-0430 USA

webmaster@cs.colorado.edu