System Fault Tolerant III (SFT III)
Novell defines three levels of System Fault Tolerance (its strategy for protecting network data), of which only the third is implemented at the moment of writing.
SFT I includes Hot Fix, read-after-write verify, and duplicate directory entry tables (DETs). If a program tries to write data to a bad sector on the disk, the Hot Fix feature automatically redirects the output to a special storage area (Hot Fix mode is the default in NetWare). In read-after-write verify, NetWare compares the written material on disk with the material in memory before reusing the memory. The DETs contain information about the server's files and directories, so duplicating them ensures that this important information remains available even if one table becomes corrupted.
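The interaction of the two write-path mechanisms can be illustrated with a minimal sketch. This is not Novell code: the `Disk` class, the sector model and the constant `HOT_FIX_AREA_START` are all hypothetical; the sketch only shows the idea of verifying a write by reading it back and redirecting to a reserved Hot Fix area on a mismatch.

```python
HOT_FIX_AREA_START = 10_000  # hypothetical start of the redirection area

class Disk:
    """Toy disk: sector -> data, where bad sectors silently corrupt writes."""
    def __init__(self, bad_sectors=()):
        self.sectors = {}
        self.bad = set(bad_sectors)
        self.redirects = {}            # logical sector -> Hot Fix sector
        self.next_hot_fix = HOT_FIX_AREA_START

    def _raw_write(self, sector, data):
        # A bad sector corrupts whatever is written to it.
        self.sectors[sector] = b"\x00" * len(data) if sector in self.bad else data

    def _raw_read(self, sector):
        # Reads transparently follow any Hot Fix redirection.
        return self.sectors.get(self.redirects.get(sector, sector))

    def write(self, sector, data):
        """Write with read-after-write verify; redirect to Hot Fix on mismatch."""
        self._raw_write(sector, data)
        if self.sectors.get(sector) != data:   # verify failed: sector is bad
            hot = self.next_hot_fix
            self.next_hot_fix += 1
            self._raw_write(hot, data)         # rewrite into the Hot Fix area
            self.redirects[sector] = hot       # remember the redirection
```

A write to a bad sector fails verification and ends up in the Hot Fix area, but later reads of that logical sector still return the correct data.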
SFT II includes disk mirroring or duplexing. In disk mirroring, data is written to two different hard disks, but over the same channel. Mirroring duplicates data in case one hard disk fails, but doesn't provide any protection if the channel itself fails. This is in contrast with Windows NT, where a mirrored primary disk failure can cause significant downtime (set by a failover-time parameter). Disk duplexing, on the other hand, uses two separate channels to write identical data to two disks. If one disk fails, the duplicate disk takes over seamlessly and automatically, thus providing security against either hard disk or disk channel failure.
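The difference between the two schemes can be sketched in a few lines. The `Channel` object and function names below are illustrative, not a real driver API; the point is that mirroring funnels both copies through one channel, while duplexing survives the failure of either channel.

```python
class Channel:
    """Toy disk channel that can fail as a unit (controller, cable, etc.)."""
    def __init__(self):
        self.failed = False
    def write(self, disk, block, data):
        if self.failed:
            raise IOError("channel failure")
        disk[block] = data

def mirrored_write(channel, disk_a, disk_b, block, data):
    # Mirroring: one channel, two disks. A channel failure loses both copies.
    channel.write(disk_a, block, data)
    channel.write(disk_b, block, data)

def duplexed_write(chan_a, disk_a, chan_b, disk_b, block, data):
    # Duplexing: two independent channels. Either one alone is enough.
    written = 0
    for chan, disk in ((chan_a, disk_a), (chan_b, disk_b)):
        try:
            chan.write(disk, block, data)
            written += 1
        except IOError:
            pass                      # that channel is down; try the other
    if written == 0:
        raise IOError("both channels failed")
```

With one channel marked failed, `duplexed_write` still lands the block on the disk behind the surviving channel, whereas `mirrored_write` on a failed channel would lose both copies.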
Further, there are two stages preceding SFT III: Dynamic Load/Unload and Client Autoreconnect. Dynamic Load/Unload includes Hot Plug PCI, I2O and the Host Bus Adapter device drivers, which allow upgrades without server downtime, as well as the Hot Swap Driver capability (for dismounting volumes). Client Autoreconnect is a more basic implementation of the smart client (see below).
SFT III uses duplicate servers (full server mirroring), so that all transactions are recorded on both servers. If one server fails, the other has an identical state and can therefore take over automatically. When the failed server is restarted, it automatically re-synchronizes with the other server. The SFT III solution also makes it possible to take down and upgrade the hardware of one server at a time with no disruption in network service. Server memory, caching and state are preserved in a server failover. An SFT III pair appears as a single server image to users and the administrator, so client connections and other server activities are also taken over by the redundant server without any disruption.
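The full-server-mirroring idea can be sketched as follows. The classes and method names are invented for illustration (SFT III's actual Mirrored Server Link protocol is not shown); the sketch only captures that every transaction is applied to both servers, so the survivor's state is identical and takeover is immediate.

```python
class MirroredServer:
    """One half of a hypothetical SFT III pair."""
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True
    def apply(self, key, value):
        self.state[key] = value

class SFTPair:
    """Presents two mirrored servers as a single server image."""
    def __init__(self, primary, secondary):
        self.servers = [primary, secondary]
    def write(self, key, value):
        # Record every transaction on each live server.
        for s in self.servers:
            if s.alive:
                s.apply(key, value)
    def active(self):
        # Clients see one image; any live server can answer for the pair.
        return next(s for s in self.servers if s.alive)
```

After the primary is marked dead, `active()` simply returns the secondary, which already holds an identical state, and writes keep flowing uninterrupted.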
Other benefits of SFT III are: dual-processor support, TCP/IP support, an alternate Mirrored Server Link (MSL) channel, improved printer support through uninterrupted print services, and ManageWise v2.5 (providing improved manageability). Another function included in SFT III is Kernel Fault Recovery, or ABEND Recovery, which can either isolate a software failure or terminate and restart the server. Failure information is logged so the sysadmin can analyze the nature of the fault. See the following pictures for a graphical representation of the SFT III configuration: the first picture shows the situation in a normal state, the second the situation after a failure.
Figure 1. SFT III configuration during normal operation
Figure 2. SFT III after a failure
More information about SFT III can be found on the following pages:
Links to Novell's High Availability Platforms
intraNetWare and SFT III
StandbyServer is installed on a standby machine that is connected both to the network and, over a dedicated link, to the primary server. Through intraNetWare and NetWare disk mirroring, an active copy of all system data is kept on the standby machine. If the primary server fails, the standby server automatically reboots and becomes the primary server; the switchover and reboot cause a few moments of delay in the event of a failure. However, this solution protects not only against hardware failures but also against software failures, and it is possible to implement symmetric multiprocessing (SMP). Further, StandbyServer can mirror many intraNetWare servers to one standby.
Figure 3. One-to-one StandbyServer
StandbyServer for NetWare allows the use of a shared disk device that both the primary server and the standby machine can use. In this scenario, the shared disk device must have two physical ports, or at least be usable through a switch (SCSI switch), so that it can connect to both the primary server and the standby machine. When StandbyServer uses a shared disk device, only the acting primary server makes use of the shared disk system; the standby machine simply maintains a passive connection to the shared device.
Figure 4. Model of the Shared Disk Device
The shared disk device is used in an active/passive role, whereby only one machine has control of the device at any one time. When a server failover occurs, the active role is switched to the standby machine as it takes over for the failed primary server. There are other papers available on Vinca's web site that deal specifically with using shared disk devices with StandbyServer.
Some considerations for choosing StandbyServer: your organization can tolerate a few moments of delay for switchover to the secondary server in the event of a failure, you use multiple NLMs, and you need to protect against both hardware and software failures.
More information about StandbyServer can be found on the following pages:
StandbyServer for NetWare, white paper
StandbyServer product flyer
Many-To-One StandbyServer (MTO)
StandbyServer Many-to-One allows a single standby machine to act as a real-time automatic failover machine for multiple primary servers. The standby machine detects hardware and software failures in the primary machines and automatically 'stands in' for a failed server. While 'standing in', the standby machine continues to perform server mirroring for the remaining primary servers. No proprietary disk systems, monitoring devices, or special operating system software are required. It takes advantage of the economies and performance of industry-standard hardware to extend the configuration capabilities and flexibility of implementing automatic server failover to a standby machine. The following figure shows the basic configuration of the MTO StandbyServer (however, the dedicated link is not obligatory).
Figure 5. Dedicated High-Speed Link
*The dedicated link hub can be replaced with individual cards in the standby machine
Novell's High Availability products such as StandbyServer Many-to-One protect against server downtime without the trade-offs mentioned above. MTO provides the following features and benefits:
- Downtime is essentially eliminated.
- Comprehensive data reliability and high availability in one solution.
- Real-time mirroring provides the most complete and reliable solution to server availability.
- Fault tolerance for the disk channel.
- High availability for the entire server.
- Uses the same startup information to ensure the standby machine functions (looks and feels) just like the server to its clients.
- All server components are protected, not just the disk drives: a complete, redundant server.
- Automatic server failover; no manual intervention required.
- Compatible with industry-standard hardware: uses standard communications cards with IPX protocols.
- Does not require identical servers: MTO allows mismatched machines to be used.
- Using the Utility Server feature, the standby machine can perform other server functions such as print server, gateway, or router.
The MTO is a software solution. Under normal operation, no clients are logged into the standby machine, and therefore a second full user count of NetWare is not necessary. The standby machine uses a run-time license copy of NetWare (included with MTO). The standby machine loads a suite of NetWare Loadable Modules (NLMs) that monitor the status of all primary servers and provide communications paths between the machines. In the event that any of the primary servers fail, the standby machine downs and exits the run-time version of NetWare. Then the standby machine boots up automatically as the failed primary server, using the same license of NetWare that was running on the failed primary server. Since all disk devices are mirrored, the same boot-up sequence, login scripts, bindery, NDS data, and primary server services are restored. In other words, failover involves a lag phase, which in general takes a few minutes.
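The failover sequence just described can be sketched as a simple monitor loop. All names here are illustrative, not Vinca's NLM API: the standby polls the primaries, and on the first detected failure it 'stands in' for that server (here by adopting its name and mirrored state) while it keeps mirroring the primaries that are still up.

```python
class Primary:
    """Toy primary server with a liveness flag and some data to mirror."""
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

class Standby:
    """One standby machine mirroring several primaries (many-to-one)."""
    def __init__(self, primaries):
        self.primaries = primaries
        self.mirrors = {p.name: dict(p.data) for p in primaries}
        self.standing_in_for = None

    def mirror(self, primary, key, value):
        # Real-time mirroring: every primary write is copied to the standby.
        primary.data[key] = value
        self.mirrors[primary.name][key] = value

    def poll(self):
        for p in self.primaries:
            if not p.alive and self.standing_in_for is None:
                # Down the run-time NetWare and reboot as the failed primary,
                # using its mirrored volumes (bindery, NDS data, services).
                self.standing_in_for = p.name
                self.data = self.mirrors[p.name]
        # Mirroring continues for the primaries that are still up.
```

After one primary dies and `poll()` runs, the standby is serving that primary's mirrored data while the other primaries remain mirrored as before.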
The behaviour of the client machines depends on the version of NetWare being used, the network protocol and the client networking software; for example, with NETX, VLM or NetWare 3.12, the clients must reconnect, or even log in again, after a failure.
Most of the features are already mentioned in the comparison table, but some are worth discussing here. With the previously mentioned dedicated link, it is even possible to configure MTO over a WAN link (in conjunction with the Read Blocker Option, which decreases the network load). See the figure below for the layout of this configuration.
Figure 6. The dedicated link hub can be replaced with individual cards in the standby machine
The Utility Server feature allows the standby machine to function as an independent file server while still maintaining a real-time mirror for the protected primary servers. Normally, the standby machine is dedicated to mirroring the disk devices of the primary servers and monitoring their status to ensure they are still functioning correctly. However, using the Utility Server feature, the standby machine can contain its own SYS volume on a disk device that is not mirrored over the dedicated links to the primary servers. With its own SYS volume, the standby machine can have network clients of its own or run processes that are independent of the primary servers. The only caution is that, should any of the primary servers fail, the standby machine's clients are disconnected when it fails over. This feature allows the standby machine to share the load of network services by functioning as a backup server, print server, CD-ROM server, or some other necessary process.
Novell's SnapShotServer product makes use of this feature to 'freeze' data as it appeared at specific moments in time and keep these 'frozen' images on the standby machine. This allows backup engines to use the standby machine to perform backups at any time, without encumbering the primary servers with the additional overhead normally associated with tape backups. Other features are the Throttling Mechanism (for improving client network performance), Sharing Devices and the Automatic Disk Integrity Check.
The system requirements mentioned in the table are graphically presented in the following two pictures.
A valid, licensed copy of NetWare 3.12, 4.x or IntranetWare must be installed and running on the primary servers.
Figure 7. The Primary Servers
The standby machine should be configured much like the most fully configured primary machine, with as many disk devices as necessary to mirror all of the primary servers' disk devices. In other words, every mirrored primary server drive must have a corresponding drive on the standby machine. One dedicated link for each primary server can be used, or a single dedicated link card can be used with a hub to connect to each of the primary servers.
Figure 8. The StandbyServer
The standby machine must be bootable as a NetWare server, with NetWare 3.12, 4.x, or IntranetWare fully installed and tested. If different versions of NetWare are used on the primary servers, then the highest version used on the primary servers should be used on the standby machine. In the event of a failure, the standby comes up with the version of NetWare previously running on the failed primary.
More information about StandbyServer can be found on the following pages:
The White Paper about MTO
Some general information on MTO
Novell High Availability Server (NHAS)
Novell's high availability solutions preceding the clustering strategy are delivered in three stages: Moab, Park City and Escalante.
Moab: High Availability. Moab builds on Novell's existing base of solutions to make sure network resources or servers share data and application resources seamlessly. Components of Moab include Novell Replication Services (NRS), BorderManager, Novell Storage Services (NSS), the MultiProcessing Kernel (MPK), Virtual Memory and Memory Protection.
Park City: Data High Availability and Data Clustering. Park City concentrates on data high availability and data clustering: users will not perceive data as residing on a specific server, but on a virtual volume accessible by multiple servers. This creates multiple paths to the same data, while replication provides redundant copies of the data. Client requests are dynamically distributed to the server with the lowest workload (i.e., load balancing), providing more efficient use of the resources. A vital element in this model is the smart client. Without the smart client, users need to reconnect to a duplicated resource if the resource they are accessing becomes unavailable; the smart client handles this for the client machines.
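The routing idea behind this model can be sketched minimally. The `Server` class and `route` function are assumptions for illustration (Park City's actual mechanism is not documented here); the sketch only shows requests being steered to the least-loaded of several servers that all front the same virtual volume.

```python
class Server:
    """One of several servers providing a path to the same virtual volume."""
    def __init__(self, name):
        self.name = name
        self.load = 0
    def handle(self, request):
        self.load += 1
        return f"{self.name} served {request}"

def route(servers, request):
    # User load balancing: pick the least-loaded path to the shared data.
    target = min(servers, key=lambda s: s.load)
    return target.handle(request)
```

Routing a stream of requests this way spreads them evenly across the servers, and if one server drops out of the list, the remaining paths to the virtual volume keep serving clients.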
Some of the features shipping along with Park City are: Flexible Mirroring-Phase II, Cluster Volume, Dual Path Volume, NSS-D, Smart Client and User Load Balancing.
Escalante: Resource Distribution. Escalante aims at providing availability and scalability by distributing resources across a cluster of servers. The application and processing load is dynamically distributed across the network, and data, application and service requests are effectively routed and processed for consistent quality of service. The network components appear to the users as a single-system image (SSI). Some of the components are: NSS-V, Application Load Balancing, SSI, Wide Area Clustering and 64-bit hardware support.
More information about NHAS can be found on the following pages:
Technical FAQs about NHAS
NHAS for NetWare 4.2