\begin{figure}[b]
\centering
\includegraphics[width=78mm]{sampleArmor.eps}
\vspace*{-.5in}
\caption{Sample ARMOR consisting of multiple Elements}
\label{fig:samplearmor}
\end{figure}

While embedded VLAs provide a lightweight, adaptive layer of fault mitigation within Level 1, the University of Illinois is developing software components that run as multithreaded processes responsible for monitoring and fault mitigation at the process and application layers.

Adaptive, Reconfigurable, and Mobile Objects for Reliability (ARMORs) \cite{zk:armorover02} are multithreaded processes composed of replaceable building blocks called \textit{Elements} that communicate through a messaging system. The architecture is designed so that distinct modules, each responsible for a specific set of tasks, can be plugged into the system. Separate Elements handle problem detection, error analysis, and recovery actions, and each can be developed and configured independently. ARMORs are arranged in a hierarchy that spans multiple nodes of the system. A sample ARMOR is shown in Figure~\ref{fig:samplearmor}. In this example, a primary ARMOR daemon watches over the node and reports to higher-level ARMORs elsewhere on the network \cite{jk:hardwarefailure03}. Elements within node-level ARMORs communicate to ensure that all nodes are operating properly.
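
The plug-in structure described above can be illustrated with a minimal Python sketch. It is not the ARMOR implementation: the class \texttt{Armor}, the message types, the three handler functions, and the five-second heartbeat threshold are all hypothetical names and values chosen for illustration. The sketch only shows how independently written detection, analysis, and recovery Elements might cooperate purely by exchanging messages.

```python
from collections import defaultdict, deque


class Armor:
    """Hypothetical container that routes messages between pluggable Elements."""

    def __init__(self):
        self._handlers = defaultdict(list)  # message type -> list of Elements
        self._queue = deque()               # pending (type, payload) messages
        self.log = []                       # recovery actions taken so far

    def plug_in(self, message_type, element):
        """Register an Element to receive one type of message."""
        self._handlers[message_type].append(element)

    def send(self, message_type, payload):
        """Queue a message for delivery."""
        self._queue.append((message_type, payload))

    def run(self):
        """Deliver queued messages until none remain."""
        while self._queue:
            message_type, payload = self._queue.popleft()
            for element in self._handlers[message_type]:
                element(self, payload)


def detection_element(armor, payload):
    # Problem detection: flag a heartbeat gap above an assumed 5 s threshold.
    if payload["seconds_since_heartbeat"] > 5:
        armor.send("error", {"process": payload["process"], "kind": "hang"})


def analysis_element(armor, payload):
    # Error analysis: decide which recovery action fits the reported error.
    action = "restart" if payload["kind"] == "hang" else "ignore"
    armor.send("recover", {"process": payload["process"], "action": action})


def recovery_element(armor, payload):
    # Recovery: record the action (a real Element would carry it out).
    armor.log.append((payload["process"], payload["action"]))


armor = Armor()
armor.plug_in("heartbeat", detection_element)
armor.plug_in("error", analysis_element)
armor.plug_in("recover", recovery_element)

armor.send("heartbeat", {"process": "filter", "seconds_since_heartbeat": 9})
armor.run()
print(armor.log)
```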

The \textit{Execution ARMOR} is responsible for monitoring and ensuring the integrity of a single application, without requiring any modifications to the program itself. It watches the program to ensure that it continues to run, and it can restart the application when necessary. While monitoring, it may generate messages, based on what it observes, for other Elements to analyze and act on. The Execution ARMOR can also trigger specific recovery actions based on the pattern of return codes that it receives from the application. Another distinct ARMOR, known as the \textit{Recovery ARMOR}, consists of Elements that can automatically migrate processes from one machine to another when the workload across machines is not balanced.
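
As a rough illustration of restart and return-code handling, the following Python sketch supervises an unmodified child process. The policy is an assumption made for this example, not the Execution ARMOR's actual rule set: a zero return code ends supervision, a non-zero code triggers a restart, and three identical failures in a row are escalated. The names \texttt{choose\_action} and \texttt{supervise} and the limits \texttt{repeat\_limit} and \texttt{max\_runs} are hypothetical.

```python
import subprocess
import sys


def choose_action(return_codes, repeat_limit=3):
    """Map the history of return codes to an action (assumed policy)."""
    if return_codes[-1] == 0:
        return "done"
    recent = return_codes[-repeat_limit:]
    if len(recent) == repeat_limit and len(set(recent)) == 1:
        return "escalate"  # the same failure keeps recurring
    return "restart"


def supervise(command, run=subprocess.call, max_runs=10):
    """Run the command unmodified, restarting it according to choose_action."""
    return_codes = []
    for _ in range(max_runs):
        return_codes.append(run(command))
        action = choose_action(return_codes)
        if action != "restart":
            return action, return_codes
    return "escalate", return_codes  # give up after max_runs attempts


# Example: a child process that always exits with return code 2.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]
result = supervise(failing)
print(result)
```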

Within the trigger, ARMORs provide error detection and recovery services to the trigger system and to any other processes running on Levels 2 and 3. They may also detect hardware failures. ARMOR components are designed to run under an operating system such as Linux or Windows, not within low-level embedded systems that impose real-time constraints on memory and processing time.

An ARMOR API also allows trigger applications to proactively send specific error information directly to an Element. Data processing and quality rates can likewise be sent directly to the ARMOR, where they may be distributed to the corresponding Elements for analysis \cite{jk:hardwarefailure03}.
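
The two reporting paths can be sketched as follows. The interface shown is hypothetical and does not reproduce the real ARMOR API: the class \texttt{ArmorEndpoint}, its methods, the Element names, and the rate names are assumptions introduced only to contrast a message addressed directly to one Element with rates that the ARMOR distributes itself.

```python
class ArmorEndpoint:
    """Hypothetical stand-in for the ARMOR side of the API."""

    def __init__(self):
        self.elements = {}     # Element name -> list of received messages
        self.rate_routes = {}  # rate name -> Element that analyzes it

    def add_element(self, name, rates=()):
        """Register an Element and the rates it is responsible for."""
        self.elements[name] = []
        for rate in rates:
            self.rate_routes[rate] = name

    def send_error(self, element, info):
        # Error information goes directly to the Element the application names.
        self.elements[element].append(("error", info))

    def send_rates(self, rates):
        # Rates go to the ARMOR, which forwards each to its corresponding Element.
        for rate, value in rates.items():
            element = self.rate_routes.get(rate)
            if element is not None:
                self.elements[element].append(("rate", rate, value))


endpoint = ArmorEndpoint()
endpoint.add_element("error_analysis")
endpoint.add_element("throughput_monitor", rates=["events_per_second"])
endpoint.add_element("quality_monitor", rates=["accepted_fraction"])

endpoint.send_error("error_analysis", {"code": 17, "detail": "corrupt event header"})
endpoint.send_rates({"events_per_second": 950.0, "accepted_fraction": 0.02})
```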