Adaptive Reconfigurable Mobile Objects of Reliability (ARMOR)

High Availability Infrastructure

Armors are multithreaded processes internally structured around objects, called elements, which contain their own private data and provide elementary functions or services. Every armor process contains a basic set of elements that provide core functionality, including reliable point-to-point messaging between armors, response to heartbeat messages (which indicate a given armor process’s liveness), and the ability to checkpoint armor state.

Armor processes communicate via message passing: the microkernel present in each armor distributes messages between elements within that armor and between the armors in a system. Every incoming message causes the microkernel to spawn a new thread to process the message, and the execution of each thread invokes one or more elements within the armor process. A thread terminates when the armor finishes processing the message or when it sends an outgoing message in response to the original.
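The thread-per-message behavior described above can be sketched as follows. This is only an illustration in Python, not ARMOR's actual code: the `Microkernel` class, its `deliver` and `wait_all` methods, and the message layout are all hypothetical names chosen for the example.

```python
import threading


class Microkernel:
    """Hypothetical stand-in for an armor's microkernel."""

    def __init__(self, handler):
        self.handler = handler  # callable that processes one message
        self.threads = []

    def deliver(self, message):
        # Spawn one new thread per incoming message.
        t = threading.Thread(target=self.handler, args=(message,))
        self.threads.append(t)
        t.start()
        return t

    def wait_all(self):
        # Helper for the example only: wait until every thread has ended.
        for t in self.threads:
            t.join()


processed = []
lock = threading.Lock()


def handle(message):
    # The thread terminates when this function returns,
    # that is, when the message has been fully processed.
    with lock:
        processed.append(message["id"])


kernel = Microkernel(handle)
for i in range(3):
    kernel.deliver({"id": i})
kernel.wait_all()
```

After `wait_all()` returns, every message has been handled by its own thread and none of those threads is still running.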

Structurally, messages consist of two primary parts: the microoperation sequence and the payload. Every message carries a series of microoperations to be executed by armors. The microkernel delivers each microoperation in sequence to the elements that have subscribed to that operation. During initialization, each armor establishes a subscription list, which maps each element to the microoperations it can process. The microkernel is responsible for maintaining the list at runtime. Each message also contains a general payload area for storing data. Elements can read from and write to the payload fields while processing the microoperations in a message, so elements can exchange information with one another within a single execution thread. This information exchange does not interfere with other execution threads because each thread manipulates its own payload.
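A minimal sketch of the subscription list and payload exchange may make this concrete. Everything named here is an assumption made for illustration: the `Element` and `Microkernel` classes, the `READ` and `SCALE` microoperations, and the dictionary-based message are not taken from the ARMOR implementation.

```python
class Element:
    """Hypothetical base class: an element lists the microoperations it handles."""

    handles = ()

    def process(self, op, payload):
        raise NotImplementedError


class ReadSensor(Element):
    handles = ("READ",)

    def process(self, op, payload):
        payload["reading"] = 21  # write a field into the message payload


class Scale(Element):
    handles = ("SCALE",)

    def process(self, op, payload):
        # Read the field written by the previous element, write a new one.
        payload["scaled"] = payload["reading"] * 2


class Microkernel:
    def __init__(self):
        self.subscriptions = {}  # microoperation name -> subscribed elements

    def register(self, element):
        # Build the subscription list during initialization.
        for op in element.handles:
            self.subscriptions.setdefault(op, []).append(element)

    def dispatch(self, message):
        payload = message["payload"]
        # Deliver each microoperation, in order, to its subscribers.
        for op in message["ops"]:
            for element in self.subscriptions.get(op, ()):
                element.process(op, payload)
        return payload


kernel = Microkernel()
kernel.register(ReadSensor())
kernel.register(Scale())
result = kernel.dispatch({"ops": ["READ", "SCALE"], "payload": {}})
```

Here the two elements never call each other directly; the second one picks up the value the first one left in the payload, which is the kind of exchange the paragraph describes.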

While processing a microoperation, an element can update its local state or the state of the payload fields; it can also change an armor process’s control flow by adding new microoperations to the current sequence. This modular, event-driven architecture permits developers to customize an armor process’s functionality and fault-tolerance services (detection and recovery) according to the application’s needs.
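As a rough illustration of how an element might alter control flow by adding microoperations to the current sequence, consider the sketch below. The `run` function, the handler signatures, and the `CHECK`, `REPORT`, and `RECOVER` microoperation names are hypothetical, and appending to the end of the sequence is an assumption; the text does not say where new microoperations are placed.

```python
from collections import deque


def run(ops, payload, handlers):
    """Process microoperations in order; handlers may extend the sequence."""
    queue = deque(ops)
    trace = []
    while queue:
        op = queue.popleft()
        trace.append(op)
        handler = handlers.get(op)
        if handler is not None:
            handler(payload, queue)
    return trace


def check(payload, queue):
    # Detection: on an error, add a recovery step to the current sequence.
    if payload.get("error"):
        queue.append("RECOVER")


def recover(payload, queue):
    # Recovery: clear the error flag and record that recovery ran.
    payload["error"] = False
    payload["recovered"] = True


handlers = {"CHECK": check, "RECOVER": recover}

ok_trace = run(["CHECK", "REPORT"], {"error": False}, handlers)
bad_payload = {"error": True}
bad_trace = run(["CHECK", "REPORT"], bad_payload, handlers)
```

In the error-free run the sequence executes as sent, while in the faulty run the `check` handler extends it with a recovery step. Swapping in different handlers is one way to picture customizing detection and recovery per application.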

The figure above shows a simple armor configuration with the runtime environment scaled to two nodes. It has four basic components:

  • The fault-tolerance manager (FTM) initializes the working configuration of an armor-based environment, maintains registration information on all armor objects and application processes, and initiates recovery from armor and node failures.
  • The heartbeat armor runs on a node that is separate from the FTM. It detects FTM failures by periodically polling the FTM for “liveness” and initiates FTM recovery when a failure is detected.
  • A daemon armor runs on each node in the network, serving as a gateway for armor-to-armor communication and detecting runtime failures of local armors (those running on the same node as the daemon).
  • The execution armor launches local application processes, detects their failures, and performs recovery.
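
The liveness polling performed by the heartbeat armor could look roughly like the sketch below. It is an illustration under stated assumptions, not the ARMOR implementation: the `HeartbeatMonitor` class, the `probe` and `on_failure` callbacks, and the `max_missed` threshold of two consecutive missed replies are all hypothetical. The polling interval is left out so the example runs without timers.

```python
class HeartbeatMonitor:
    """Hypothetical poller: declares failure after consecutive missed replies."""

    def __init__(self, probe, on_failure, max_missed=2):
        self.probe = probe            # returns True if the target replied
        self.on_failure = on_failure  # called to initiate recovery
        self.max_missed = max_missed  # assumed threshold, not from the source
        self.missed = 0

    def poll_once(self):
        if self.probe():
            self.missed = 0
            return True
        self.missed += 1
        if self.missed >= self.max_missed:
            self.on_failure()
            self.missed = 0  # start counting again after recovery is triggered
        return False


# Simulated replies from the monitored process: alive, silent twice, alive.
replies = iter([True, False, False, True])
recoveries = []

monitor = HeartbeatMonitor(
    probe=lambda: next(replies),
    on_failure=lambda: recoveries.append("restart FTM"),
)
results = [monitor.poll_once() for _ in range(4)]
```

With these simulated replies, recovery is triggered once, after the second consecutive missed reply. A real deployment would call `poll_once` on a timer and send the probe over the armor messaging layer.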

©2006 Armored Computing Inc.: 60 Hazelwood Drive, Champaign, IL 61820 | (630) 347-7235
The Armored Computing Logo is the trademark of Armored Computing Inc.