Feature Recognition in Scientific Data

Project Description:

Currently, the rate at which simulation data can be generated far outstrips the rate at which scientists can inspect and analyze it. 3D visualization techniques provide a partial solution to this problem, allowing an expert to scan large data sets, identifying and classifying important features and zeroing in on areas that require a closer look. Proficiency in this type of analysis, however, requires significant training in a variety of disciplines. An expert analyst must be familiar with domain science, numerical simulation, visualization methods, data formats, and the details of how to move data across heterogeneous computation and memory networks, among other things. At the same time, the sheer volume of these data sets makes this task not only arduous, but also highly repetitive. One logical next step is to automate the feature recognition and characterization process so scientists can spend their time analyzing the science behind promising or unusual regions in their data, rather than wading through the mechanistic details of the data analysis. The goal of this project was to develop a tool that does so.

General definitions of features are remarkably hard to phrase; most of those in the literature fall back upon ill-defined words like ``unusual'' or ``interesting'' or ``coherent.'' Features are often far easier to recognize than to describe, and they are also highly domain-dependent. The structures on which an expert analyst chooses to focus --- as well as the manner in which he or she reasons about them --- necessarily depend upon the physics that is involved, as well as upon the nature of the investigation. Meteorologists and oceanographers are interested in storms and gyres, while astrophysicists search for galaxies and pulsars, and molecular biologists classify parts of molecules as alpha-helices and beta-sheets. Data types vary --- pressure, temperature, velocity, vorticity, etc. --- and a critical part of the analyst's expert knowledge is knowing how different features manifest in different data fields.

Our goal was to create a general-purpose feature characterization system and to validate it with a variety of specific instances of problems in different fields. As a first step, we focus on finite element analysis data from computer simulations of solid mechanics problems. Since we want to produce a practical, useful tool, we are working with data from deployed simulators, in a real-world format: ASCI's DMF (Data Models & Formats), a lingua franca used by several of the US national labs to read and write data files for large simulation projects. This choice raised some interesting interoperability issues that are described in the papers cited below. A DMF data snapshot consists of a geometric description of a mesh (generally 2D or 3D) and some information about the physics at each mesh point. Here is an example of such a snapshot-- a simple meshed surface in 3D:

Given such a snapshot, our goal is to characterize the features therein and generate a meaningful report. In this case, the surface is basically smooth with the exception of a single "spike." Spikes are interesting for both numerical and physical reasons, and our algorithms use patterns in the normals to adjacent mesh elements in order to find them. The image below shows the same surface, but with each mesh element rendered in a color that indicates how much its normal vector deviates from the average of the normal vectors of its neighbors:

In order to understand what makes a feature, we began by working closely with domain scientists to identify a simple ontology of distinctive coherent structures that help them understand and evaluate the dynamics of the problem at hand. (Formally, an ontology seeks to distill the most basic concepts of a system down into a set of well defined nouns and verbs -- objects and operators -- that support effective reasoning about the system.) In finite-element applications, as in many others, there are two kinds of features that are of particular interest to us:

those that violate the continuity and smoothness assumptions that are inherent in both the laws of physics and of numerical simulation: spikes, cracks, tears, wrinkles, etc. --- either in the mesh geometry or in the physics variables.
those that violate higher-level physical laws, such as the requirement for normal forces to be equal and opposite when two surfaces meet (such violations are referred to as ``contact problems'').

Note that we are assuming that expert users can describe these features mathematically; many approaches to automated feature detection do not make this assumption. The knowledge engineering process and the algorithms that we use to encapsulate the resulting characterizations, which rely on fairly basic mathematics, are described in the papers cited below, as are the test results.

People:

Nancy Collins, MS Student, Computer Science
Stephanie Boyles, MS Student, Computer Science
Andre Smirnov, MS Student, Computer Science.
Prof. Liz Bradley

Papers:

V. Robins, J. Abernethy, N. Rooney, and E. Bradley, "Topology and Intelligent Data Analysis", Intelligent Data Analysis 8:505-515 (2004)
E. Bradley, N. Collins, and W. Kegelmeyer, "Feature Characterization in Scientific Datasets," IDA-01 (International Symposium on Intelligent Data Analysis), Lisbon; September 2001.

Support:

The DOE ASCI program through a Level 3 grant from Sandia National Laboratories
a Packard Fellowship in Science and Engineering.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these organizations.