What is Redundancy?

Redundancy is an advanced feature of Ignition that provides a higher degree of fault-tolerance and protection from downtime due to machine failure. Using redundancy, two Ignition installations can be linked together, so that when one fails, the other takes over and continues executing. All of the clients connected will be redirected to the backup machine, and historical data will continue to be logged.

 

There are a variety of design decisions that come into play when setting up redundant systems, so it is important to understand the available options, and how the pieces of the system function in a redundant setting. This chapter will start with key terminology that will be used heavily, and will then proceed to explain how the main parts of the system function. It will then explain the various settings available, and will finish up with an examination of a few common setups.

Clustering vs. Redundancy, and previous versions of Ignition

Previous versions of Ignition contained a feature called clustering that was similar to redundancy in that it linked multiple systems, but it had different goals. The primary goal of clustering was to provide a seamless platform for balancing many client connections across multiple servers. In practice, client load was rarely a cause for concern. Ease of configuration and greater flexibility in creating redundant fail-over systems were larger concerns, and they led to the switch to "redundancy".

Terminology

Here are some of the most common terms used in relation to redundancy.

 

Activity Level

The activity level describes what the Ignition installation is currently "doing". A node in a redundant pair operates at one of three levels: Cold, Warm, or Active. At the Cold level, the system does a minimal amount of work. At the Warm level, the system runs at nearly full level so that it can switch over quickly. Both of these levels imply that the other node is currently active. At the Active level, the system is the primary system, responsible for running all sub-systems.
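As a minimal sketch of the three levels, the following Python models them as an enumeration. The names `ActivityLevel` and `peer_must_be_active` are hypothetical and used only for illustration; they are not part of Ignition's scripting API.

```python
from enum import Enum


class ActivityLevel(Enum):
    """Illustrative model of the three activity levels (not an Ignition API)."""
    COLD = "cold"      # minimal work
    WARM = "warm"      # nearly full level, ready to switch over quickly
    ACTIVE = "active"  # primary system, runs all sub-systems


def peer_must_be_active(level):
    """Cold and Warm both imply that the other node is currently active."""
    return level in (ActivityLevel.COLD, ActivityLevel.WARM)
```

For example, `peer_must_be_active(ActivityLevel.WARM)` is true, while the same call with `ActivityLevel.ACTIVE` is false, because an active node makes no assumption about its peer.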

Node

A node is an Ignition installation that is set to be part of the redundant pair. A pair consists of a master node and a backup node.

Active Node

The active node is the Ignition installation that is currently at the "active" level, and is responsible for running all sub-systems. It is also occasionally described as the "responsible node". It can be either the master or the backup node, even when both are available. For example, if the backup node becomes active after the master node fails, and the master comes back up but is set to manual recovery mode, the backup will continue to be active until it fails or the user switches responsibility back to the master.

Master Node

The node that is responsible for managing the configuration state. It is also generally expected to be the active node when available, though this is dependent on settings. It is therefore important to separate the ideas of the master node and the active node.

Backup Node

The node that communicates with the master and takes over when that node is no longer available.
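The distinction between the master node and the active node can be illustrated with a small model of a redundant pair. This is only a sketch under stated assumptions: the class `RedundantPair`, its fields, and its methods are hypothetical names and not part of Ignition, and the behavior when `manual_recovery` is off (the master takes responsibility back as soon as it returns) is an assumption based on the statement that the master is generally expected to be the active node when available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedundantPair:
    """Toy model of which node is active (hypothetical, not an Ignition API)."""
    manual_recovery: bool = True       # models the "manual recovery mode" setting
    master_up: bool = True
    backup_up: bool = True
    active: Optional[str] = "master"   # "master", "backup", or None if both are down

    def master_fails(self):
        self.master_up = False
        if self.active == "master":
            # The backup takes over if it is available.
            self.active = "backup" if self.backup_up else None

    def backup_fails(self):
        self.backup_up = False
        if self.active == "backup":
            self.active = "master" if self.master_up else None

    def master_recovers(self):
        self.master_up = True
        # In manual recovery mode the backup keeps responsibility.
        # Assumption: otherwise the master becomes active again right away.
        if self.active is None or not self.manual_recovery:
            self.active = "master"

    def switch_to_master(self):
        """The user switches responsibility back to the master."""
        if self.master_up:
            self.active = "master"


pair = RedundantPair(manual_recovery=True)
pair.master_fails()       # backup becomes the active node
pair.master_recovers()    # master is back, but the backup stays active
print(pair.active)
pair.switch_to_master()   # user hands responsibility back to the master
print(pair.active)
```

In this model the master remains the master throughout. Only the active role moves between the two nodes, which is why the two terms are kept separate.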