Fault Tolerance

Fault Tolerance is refers to a system’s ability to allow for failures or malfunctions, and this ability may be provided by software, hardware or a combination of both. It is a setup or configuration that prevent a computer or network device from failing in the event of an unexpected problem or error. Fault Tolerance can be achieved by anticipating exceptional conditions and building the system to cope with them, and, in general, aiming for self-stabilization so that the system converges towards an error-free state. However, if the consequences of a system failure are catastrophic, or the cost of making it sufficiently reliable is very high, a better solution may be to use some form of duplication.