Understanding the High Availability System and the Switchover and Switchback Procedures
A High Availability scenario incorporates all the functionality and workflow of a replication scenario, but it adds three important new elements: pre-run verification, monitoring of the Master and the application running on it, and the switchover process itself.
- Pre-run verification
- Many things can go wrong during a switchover: there might be problems with permissions, with the application configuration, or even with the settings within the HA scenario itself. For this reason, when an HA scenario is created and initiated, Arcserve RHA performs an extensive list of checks. These checks are designed to determine whether any of the common issues that are known to cause problems during switchover are present. When the pre-run verification finds such issues, it presents errors and warnings, prompting you to resolve them before running the HA scenario.
- Automatic monitoring of the Master and the application running on it
- As soon as the scenario is initiated, the Replica checks the Master on a regular basis, by default every 30 seconds. There are three types of monitoring checks: a ping request that is sent to the Master to verify that it is accessible and alive; a database check that verifies that the appropriate services are running and the data is in a good state; and a user-defined check that can be tailored to monitor specific applications.
- If an error occurs with any part of the set, the entire check is considered to have failed. If all checks fail throughout a configured timeout period (by default 5 minutes), the Master server is considered to be down. Depending on the HA scenario configuration, this will cause Arcserve RHA to send you an alert or to automatically initiate a switchover.
- Switchover and switchback workflow
- In an initial HA scenario, the Master is the active computer, and the Replica is the standby computer. The standby computer is continuously checking the state of the active one, to determine whether it is alive and to decide whether to assume the active role.
- A switchover can be triggered automatically or with the push of a button. The first time a switchover occurs, the Replica that was on standby becomes the active computer, and the Master reverts to a standby mode (assuming it is still operational). When the Master (now the 'standby') is ready, a switchback process can be initiated, either automatically or manually. Following the switchback, the Master again becomes active, and the Replica returns to its previous standby and monitoring role.
- Note: After a connection loss, during the attempt to reconnect, each node (Master or Replica) tries to determine its role. If both nodes establish themselves as Masters, upon reconnection the newest active Master will continue to act as the Master, while the older one will turn into the standby Replica.
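
The role changes described above can be pictured with a small sketch. This is an illustration only, not Arcserve RHA code: the `Node` class, the `active_since` timestamp, and both function names are hypothetical, and the sketch assumes that "newest active Master" means the node that most recently took the active role.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str            # "active" or "standby"
    active_since: float  # hypothetical: time at which the node last became active


def switch_roles(active: Node, standby: Node, now: float) -> None:
    """Swap roles between two nodes; models both switchover and switchback."""
    active.role = "standby"
    standby.role = "active"
    standby.active_since = now


def resolve_after_reconnect(a: Node, b: Node) -> Node:
    """Return the node that stays active after a reconnection.

    If both nodes consider themselves active, the one that became active
    most recently keeps the role and the other is demoted to standby.
    """
    if a.role == "active" and b.role == "active":
        newest, oldest = (a, b) if a.active_since >= b.active_since else (b, a)
        oldest.role = "standby"
        return newest
    return a if a.role == "active" else b
```

For example, if the Replica takes over while the link is down and the original Master still considers itself active, `resolve_after_reconnect` keeps the Replica active and demotes the original Master until a switchback is performed.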
Important! After switchover, the "Server" service on the standby server, which supports file, print, and named-pipe sharing, becomes inaccessible for ten minutes. See the HASharesAccessTimeout option in the ws_rep.cfg file.
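
The monitoring rule described earlier (a check set fails if any part of it fails, and the Master is considered down once checks have failed throughout the timeout period) can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function names and the `history` structure are hypothetical, the 30-second and 5-minute values are the defaults quoted in the text, and "throughout the timeout period" is interpreted here as an unbroken run of failed check rounds lasting at least that long.

```python
from typing import Callable, Iterable, List, Tuple

CHECK_INTERVAL_S = 30   # default interval between check rounds, per the text
TIMEOUT_S = 300         # default timeout period (5 minutes), per the text


def run_check_set(checks: Iterable[Callable[[], bool]]) -> bool:
    """Run one round of checks (for example ping, database, user-defined).

    The round passes only if every check passes; a False result or an
    exception from any single check fails the whole round.
    """
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            return False
    return True


def master_is_down(history: List[Tuple[float, bool]], now: float,
                   timeout_s: float = TIMEOUT_S) -> bool:
    """Decide whether the Master should be considered down.

    history holds (timestamp, passed) pairs, oldest first. The Master is
    considered down when the most recent rounds form an unbroken run of
    failures that started at least timeout_s seconds before now.
    """
    first_failure = None
    for timestamp, passed in reversed(history):
        if passed:
            break
        first_failure = timestamp
    return first_failure is not None and now - first_failure >= timeout_s
```

A caller would append one `(timestamp, run_check_set(checks))` entry every `CHECK_INTERVAL_S` seconds and then, depending on the scenario configuration, raise an alert or start a switchover when `master_is_down` returns true.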