Data Mining for IVHM and the Space Vehicle Life Cycle
In a successful Phase 1 project for NASA 2007 Small Business Innovation Research (SBIR)
topic A1.05, "Data Mining for Integrated Vehicle Health Management," Michigan Aerospace Corporation (MAC) demonstrated its
SPADE anomaly detection software to key personnel in NASA’s Intelligent Systems Division (ISD). The feedback from these
demonstrations was used to establish future development directions for Phase 2.
Phase 2 will consist of three major efforts: 1) the design and implementation of the Taiga system, a next-generation
enhancement of the SPADE software, 2) an investigation into combining complementary functionality of Taiga with existing
code at ISD, including the Inductive Monitoring System (IMS), Mariana and others, and 3) the implementation of a prototype
automatic parallelizer, in cooperation with subcontractor Optillel Solutions, for a subset of C++ solutions useful for hardware
acceleration of machine learning applications.
The interaction with researchers in NASA ISD will explore the relationships between IMS and Taiga and gauge the benefits
in areas such as Data Handling, Feature Reduction, Visualization and Explainability. We will also investigate
heterogeneous ensemble methods by analyzing the Mariana system.
Optillel’s C++ Parallelizer will reduce MAC’s development costs for parallelizing C++ code for multi-core chips and clusters.
This effort will build on Optillel’s existing body of work that supports graphical programming languages, and will extend their
technology to the analysis and parallelization of C++ code.
Both the Taiga system and Optillel’s prototype have significant commercialization potential in industries as diverse as
Chemical, Pharmaceutical, Manufacturing and Aerospace.
High-Level Layout of the Taiga System
Taiga is ultimately based on two fundamental principles:
- Distill all input features into highly-quantized distributions that maintain
roughly the same discriminating power as the original data while allowing more efficient storage and
more rapid learning cycles.
- Leverage advanced Decision Trees and Ensembles to perform regression, classification,
semi-supervised learning and anomaly detection, while also providing a rich stream of ancillary information that can yield
explanations, similarity relationships amongst variables and samples, and opportunities for visualizing relationships not afforded
by other machine-learning paradigms.
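The first principle can be illustrated with a minimal sketch. The source does not say which quantization scheme Taiga uses, so the equal-frequency binning below is an assumption, and the names `quantile_edges` and `quantize` are hypothetical. The point is only that a real-valued feature can be reduced to a few small integer codes, one byte each, that still preserve the ordering of the readings.

```python
from bisect import bisect_right

def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points that split `values` into roughly equal-count bins.

    Equal-frequency binning is an assumption made for this sketch only.
    """
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_bins] for k in range(1, n_bins)]

def quantize(values, edges):
    """Map each value to its bin index and pack the indices one per byte."""
    return bytes(bisect_right(edges, v) for v in values)

# Hypothetical sensor readings with a long right tail.
readings = [0.1, 0.4, 0.35, 2.7, 0.2, 9.5, 0.3, 1.1]
edges = quantile_edges(readings, 4)   # [0.3, 0.4, 2.7]
codes = quantize(readings, edges)

print(list(codes))  # [0, 2, 1, 3, 0, 3, 1, 2]
```

Each reading is now stored in a single byte (two bits would suffice for four bins), and a tree learner only has to consider three candidate split points for this feature instead of one per distinct value.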
1) Source Data: input data can come from sources such as satellite telemetry, direct sensor measures, and other means of collection that must be formatted into expected input streams
2) Vectorization: the process of converting source data records into fixed-length vectors
3) Table-Oriented Data: the input format for Taiga – text files delimited by white space/punctuation
4) Data Handling Modules: examination of data to determine distributions, assess datatypes and map into suitable form for training
5) Binary-Formatted Files: binary files of byte- or bit-valued data structured for fast, efficient tree growing
6) Training Modules: a variety of methods for creating high-performance ensembles of trees
7) Data Models and Associated Products: tree models of the training data for analysis and evaluation
8) Models from Other Paradigms: models produced by other paradigms that may be combined with trees in a heterogeneous ensemble
9) Evaluation: the process of assessing unknown records
10) Visualization: examination of models and results
11) Deployment: modifications to run on a platform other than the one on which it was implemented and tested
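Stages 6 and 9 in the list above (training an ensemble of trees, then assessing unknown records) can be sketched as follows. The source does not describe Taiga's training methods, so this sketch swaps in a simplified isolation-style ensemble, a well-known tree-based approach to anomaly detection. All function names and the sample data are hypothetical. Records that random cuts separate from the rest after only a few splits receive a short average path length, which is read here as a sign of anomaly.

```python
import random

def grow_tree(rows, rng, depth=0, max_depth=8):
    """Grow one tree with random axis-aligned cuts; each leaf stores its depth."""
    if len(rows) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within this node can be split on.
    varying = [f for f in range(len(rows[0]))
               if min(r[f] for r in rows) != max(r[f] for r in rows)]
    if not varying:
        return depth
    feature = rng.choice(varying)
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < cut]
    right = [r for r in rows if r[feature] >= cut]
    return (feature, cut,
            grow_tree(left, rng, depth + 1, max_depth),
            grow_tree(right, rng, depth + 1, max_depth))

def path_length(tree, record):
    """Follow the cuts from the root and return the depth of the leaf reached."""
    while isinstance(tree, tuple):
        feature, cut, left, right = tree
        tree = left if record[feature] < cut else right
    return tree

def train(rows, n_trees=100, seed=0):
    """Training stage: build an ensemble of independently randomized trees."""
    rng = random.Random(seed)
    return [grow_tree(rows, rng) for _ in range(n_trees)]

def evaluate(ensemble, record):
    """Evaluation stage: mean path length; shorter means easier to isolate."""
    return sum(path_length(t, record) for t in ensemble) / len(ensemble)

# Hypothetical training table: a tight cluster of nominal records plus one excursion.
training = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)] + [(3.0, 3.0)]
ensemble = train(training)

nominal_score = evaluate(ensemble, (0.15, 0.25))    # unknown record near the cluster
anomalous_score = evaluate(ensemble, (3.5, 3.5))    # unknown record far from it
print(nominal_score, anomalous_score)
```

The far-away record should reach a leaf after fewer cuts than the record inside the cluster. The per-tree paths also name the features and cut values that separated a record, which is the kind of ancillary information that could feed explanation and visualization.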
Examples
Layout of Typical Drill-Down Analysis
Explanation of “Spectogram” in Anomaly Detection Context
Analysis of a Signal from NASA’s ADAPT system