EnterpriseSMS® Means High Availability

Availability is a well-defined engineering value, and EnterpriseSMS® is designed to enable robust high availability. Analysis of highly available systems shows that they satisfy at least four criteria:

1. They are built from robust components that have the largest possible mean time between failures (MTBF).

2. The mean time to repair (MTTR) of the system’s components is low.

3. They are designed so that problem determination takes the least possible time.

4. Single points of failure are eliminated, possibly with multiple alternatives.
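
The first two criteria can be tied together with the standard steady-state relation, availability = MTBF / (MTBF + MTTR). The following is a minimal sketch of that arithmetic in Python. The function names and all figures are illustrative assumptions, not ESMS measurements.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a component is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Expected downtime per 365-day year, in minutes."""
    return (1.0 - avail) * 365 * 24 * 60


# Illustrative values only (assumptions, not ESMS data).
# Same MTBF, but one component needs an hour to repair and the other
# recovers with a two-second automatic restart.
slow_repair = availability(mtbf_hours=1000.0, mttr_hours=1.0)
fast_repair = availability(mtbf_hours=1000.0, mttr_hours=2.0 / 3600.0)
```

With the MTBF held constant, shortening the repair time is what moves the result, which is why criteria 2 and 3 matter as much as criterion 1.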

EnterpriseSMS (ESMS) software is designed to meet these criteria. It is composed of many different modules, and each module is composed of four components:

Communications Interfaces – The program components needed to read from and write to queues, networks and other communications interfaces. These are highly standardized and found in time-tested libraries that are used by all programs.

Database Interfaces – The program components that implement access to ESMS databases. These are also highly standardized and found in time-tested libraries that are used by all programs. Notably, they can communicate with multiple instances of ESMS databases.

Application Capabilities – These program components provide the function that is unique to the individual module. They are wrapped in the protective cover of the database and communications interfaces and incorporate ESMS standard diagnostic routines.

Diagnostic Functions – These functions are responsible for detecting, fixing and recording problems that arise during the operation of the module. They provide the real-time diagnostic interface for the module and monitor the status of the module and the resources it needs to operate. A module restarts itself if a critical problem occurs, and it corrects many of the problems that years of use have shown can occur. Thus, the MTTR for one of these problems is one or two seconds, and because the problem is already recorded, problem determination is essentially instantaneous. Problem diagnosis is automated where possible; where it is not, the recorded footprints are available to any remote service person, which expedites correction of the problem.
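
The record-then-restart behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ESMS diagnostic code: the `Supervisor` class, its parameters and the `make_flaky_task` demo helper are all hypothetical names introduced here.

```python
import time
from typing import Callable, List


class Supervisor:
    """Hypothetical wrapper that records a failure and then restarts the task."""

    def __init__(self, task: Callable[[], str], max_restarts: int = 3,
                 restart_delay_s: float = 0.0) -> None:
        self.task = task
        self.max_restarts = max_restarts
        self.restart_delay_s = restart_delay_s
        self.problem_log: List[str] = []  # the recorded "footprints"

    def run(self) -> str:
        attempts = 0
        while True:
            try:
                return self.task()
            except Exception as exc:
                # Record the problem first, so diagnosis data exists
                # even if the restart also fails.
                self.problem_log.append(f"{type(exc).__name__}: {exc}")
                attempts += 1
                if attempts > self.max_restarts:
                    raise  # give up and surface the last error
                time.sleep(self.restart_delay_s)


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Demo task that fails a fixed number of times, then succeeds."""
    remaining = {"count": failures_before_success}

    def task() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("queue unavailable")
        return "ok"

    return task


supervisor = Supervisor(make_flaky_task(2))
result = supervisor.run()  # succeeds on the third attempt
```

After the run, `supervisor.problem_log` holds one entry per failed attempt, which is the kind of footprint a remote service person could inspect.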

Elimination of single points of failure is an essential strategy for increasing availability. Within ESMS, this is supported in depth. The ESMS architecture separates data distribution (the distribution of database instances) from applications, and each implements a high level of redundancy in its own way. Various replication strategies are appropriate for database redundancy, while duplication of functions, possibly several times over, is used for applications.
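
The benefit of duplication can be estimated with the usual parallel-redundancy relation: a duplicated function is unavailable only when every copy is down at the same time. The sketch below assumes that copies fail independently of one another, and the figures are illustrative assumptions, not ESMS data.

```python
from math import prod
from typing import Sequence


def redundant_availability(copies: Sequence[float]) -> float:
    """Availability of a function served by independent redundant copies.

    Each entry is the availability of one copy, between 0 and 1.
    The function is down only if all copies are down simultaneously.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    for a in copies:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
    return 1.0 - prod(1.0 - a for a in copies)


# Illustrative: copies that are each available 99% of the time.
single = redundant_availability([0.99])
paired = redundant_availability([0.99, 0.99])
tripled = redundant_availability([0.99, 0.99, 0.99])
```

The independence assumption is the important one. Copies that share a computer, network segment or site can fail together, and the estimate no longer holds.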

The standard for duplication is that the replicated elements must be locatable in different physical locations anywhere in the world. Our design standards specify practices that make distributed redundancy achievable at any level. We don’t count running in a VM or some similar environment as true redundancy, since there are several risk events that would invalidate such an approach. Although this lesser redundancy model is supported, true full redundancy requires fully separated systems at separate locations. ESMS redundancy also deals with what we call partitioning events, in which a system is separated into two or more segments that can no longer communicate with one another. This requires a parallel operation capability and automatic synchronization upon restoration. Because functions are fully encapsulated, fully capable of sustaining their own operation, and aware of their role and their environment, applications have many means of recovering from the loss of a critical connection, in addition to maintaining independent operation during network partitioning events.
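
One simple way to picture synchronization after a partitioning event is a merge of the records each segment wrote while it operated alone. The sketch below is an assumption made for illustration and does not describe the ESMS synchronization mechanism: it supposes append-only records that are never modified, and the `Record` fields and `synchronize` function are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical immutable record written by one segment."""
    timestamp: float
    origin: str     # which segment wrote the record
    sequence: int   # per-origin counter
    payload: str


def synchronize(*segment_logs: Iterable[Record]) -> List[Record]:
    """Merge the logs of formerly separated segments.

    Records that both segments already held are kept once, and the result
    is sorted so that every segment arrives at the same order.
    """
    merged = set()
    for log in segment_logs:
        merged.update(log)
    return sorted(merged)


# One record predates the partition; each site then wrote one on its own.
shared = Record(1.0, "site-a", 1, "written before the partition")
a_only = Record(2.0, "site-a", 2, "written at site A")
b_only = Record(2.5, "site-b", 1, "written at site B")

restored = synchronize([shared, a_only], [shared, b_only])
```

Because the merge is a set union followed by a deterministic sort, the order in which the segment logs are supplied does not change the result. Data that can be updated in place would need a conflict-resolution rule that this sketch does not attempt.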

The diagram at the right illustrates how distributed ESMS capabilities may be paired within each level to provide layer redundancy. Note that layers are generally distributed, and redundant elements should never be located in the same operating environment (computer, network segment, physical environment, etc.).

Each layer application is aware of all available communications paths (as allowed by administrators) and knows how to navigate to the services it needs to maintain a fully operating solution. EnterpriseSMS® is a powerful solution that can be configured for the level of availability a deployment requires.
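
The path navigation described above can be pictured as an ordered failover across the paths an administrator has allowed. The following is a minimal sketch under stated assumptions, not ESMS code: `call_with_failover`, `demo_send` and the `.example` path names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


def call_with_failover(paths: Sequence[str],
                       send: Callable[[str], str]) -> Tuple[str, str]:
    """Try each allowed path in order and return (path used, reply)."""
    last_error: Optional[Exception] = None
    for path in paths:
        try:
            return path, send(path)
        except OSError as exc:  # ConnectionError is a subclass of OSError
            last_error = exc    # remember the failure and try the next path
    raise ConnectionError("no configured path is reachable") from last_error


def demo_send(path: str) -> str:
    """Stand-in for a real network call; the primary path is down."""
    if path == "primary.example":
        raise ConnectionError("primary unreachable")
    return f"served by {path}"


used_path, reply = call_with_failover(
    ["primary.example", "backup.example"], demo_send
)
```

In this run the first path fails, so the call is served through the second. If every path fails, the caller receives a single error that carries the last underlying failure as its cause.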