home · mobile · calendar · colloquia · 2001-2002 · 

Colloquium - Tufo

Terascale Spectral Element Algorithms and Implementations
Argonne National Laboratory/University of Chicago
4/29/2002
2:00pm-3:00pm

The spectral element method is a weighted residual approach that employs Nth-order tensor-product polynomial bases (typ., N = 5-15) within K deformable subdomains. This approach combines the accuracy and efficiency of spectral methods with the geometric flexibility of finite elements. Because of its low numerical dispersion and dissipation, the method is ideal for the simulation of transitional and weakly turbulent flows.

We begin with an overview of the method and then focus on the algorithms and implementations which have resulted in terascale performance (0.376 Tflops on 2048 nodes of ASCI Option Red). The key ingredients to achieving this level of performance are the local tensor-product structure, a high degree of parallelism at the inter-element level, and scalable multilevel solvers.

Locally, matrix-vector products involving O(KN3) degrees-of-freedom and full stiffness matrices can be recast as matrix-matrix products requiring only O(KN4) work -- a substantial savings over the O(KN6) work and storage required without tensor-product forms.

Globally, parallelism derives from the usual SPMD model in which clusters of elements are distributed to separate processors. Communication required to update interface vertices during the matrix-vector product phase is nearest neighbor and is handled asynchronously by our stand-alone gather-scatter package, GS.

Incompressibility is enforced through iterative solution of an elliptic problem at each time step. Efficiency of the preconditioned CG scheme rests upon a multi-level preconditioner that includes a coarse grid solver to rapidly effect the transfer of low-wavenumber information throughout the domain. Because of its sparsity and distributed nature (n/P = O(1)), the coarse grid solve is communication intensive and frequently a bottleneck for P ~> 64. We have developed a fast parallel coarse grid solver that will scale to tens of thousands of processors. The method is based on projection onto a sparse basis for Rn and, for appropriately ordered FEM or FD meshes with n degrees of freedom, leads to communication costs of O(n1/2 log P) for two-dimensional meshes and O(n2/3 log P) for three-dimensional meshes. This is a substantial improvement over previous coarse-grid solve strategies, which require O(n log P) or O(n) communication. This complexity analysis is supported by experiments for P=1-4096.

We close with results from a series of challenging problems in fluid mechanics and heat transfer.

Hosted by Richard Byrd.
Refreshments will be served immediately following the talk in ECOT 831.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
webmaster@cs.colorado.edu
www.cs.colorado.edu
May 5, 2012 (14:13)
XHTML 1.0/CSS2
©2012