|
Terascale Spectral Element Algorithms and Implementations
Argonne National Laboratory/University of Chicago
The spectral element method is a weighted residual approach that employs
Nth-order tensor-product polynomial bases (typ., N = 5-15) within K deformable
subdomains. This approach combines the accuracy and efficiency of spectral
methods with the geometric flexibility of finite elements. Because of its low
numerical dispersion and dissipation, the method is ideal for the simulation of
transitional and weakly turbulent flows.
We begin with an overview of the method and then focus on the algorithms and
implementations which have resulted in terascale performance (0.376 Tflops on
2048 nodes of ASCI Option Red). The key ingredients to achieving this level of
performance are the local tensor-product structure, a high degree of
parallelism at the inter-element level, and scalable multilevel solvers.
Locally, matrix-vector products involving O(KN3) degrees-of-freedom
and full stiffness matrices can be recast as matrix-matrix products requiring
only O(KN4) work -- a substantial savings over the O(KN6)
work and storage required without tensor-product forms.
Globally, parallelism derives from the usual SPMD model in which clusters of
elements are distributed to separate processors. Communication required to
update interface vertices during the matrix-vector product phase is nearest
neighbor and is handled asynchronously by our stand-alone gather-scatter
package, GS.
Incompressibility is enforced through iterative solution of an elliptic problem
at each time step. Efficiency of the preconditioned CG scheme rests upon a
multi-level preconditioner that includes a coarse grid solver to rapidly effect
the transfer of low-wavenumber information throughout the domain. Because of
its sparsity and distributed nature (n/P = O(1)), the coarse grid solve is
communication intensive and frequently a bottleneck for P ~> 64. We have
developed a fast parallel coarse grid solver that will scale to tens of
thousands of processors. The method is based on projection onto a sparse basis
for Rn and, for appropriately ordered FEM or FD meshes with n
degrees of freedom, leads to communication costs of O(n1/2 log P)
for two-dimensional meshes and O(n2/3 log P) for three-dimensional
meshes. This is a substantial improvement over previous coarse-grid solve
strategies, which require O(n log P) or O(n) communication. This complexity
analysis is supported by experiments for P=1-4096.
We close with results from a series of challenging problems in fluid
mechanics and heat transfer.
Hosted by Richard Byrd. Refreshments will be served immediately following the talk in ECOT 831.
|