You can enter the MTBF (mean time between failures) and MTTR (mean time to repair) for 2 system components in the calculator above. By combining components two at a time, the reliability of arbitrarily complex systems can be determined. MTBF values are usually provided by hardware manufacturers, while MTTR is determined by the processes you have in place for your system.

For example, given a motherboard with an MTBF of 50000 hours, adding a hard disk with an MTBF of 20000 hours gives a combined (or series) MTBF for the system of 14286 hours. Note that a "failure" need not be unexpected; it could be planned, as in the case of a software upgrade.

If someone wants "5 nines" reliability, they need the service to be available 99.999% of the time, i.e. an availability of 0.99999, which corresponds to the service being unavailable for about 5 minutes per year on average. "High availability" refers to the processes involved in maximising this availability. Continuing the example above, if one can replace the hard disk and motherboard within 2 and 5 hours respectively, then one can expect an average system availability of 99.98%, which corresponds to a downtime of around 1 hour 45 minutes per year.

A common method for increasing the availability of a system is to have redundant components in parallel, either of which can keep the system running while the failed one is replaced. Note that the parallel MTBF value above assumes that no repairs are made at all. A common example of redundant components in parallel is RAID for hard disks. Taking the above example again, a single hard disk has 4 "nines" availability, while just 2 in parallel in a RAID 1 configuration have an availability of 8 "nines".

Note that RAID is not enough to ensure the integrity of data on hard disks; additional data duplication (backup) is required to protect against the more common issues that are not specific to disks.
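The figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch rather than the calculator's actual code: the function names are hypothetical, and it assumes constant, independent failure rates, steady-state availability of MTBF / (MTBF + MTTR), and a 365-day year.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: leap years ignored


def series_mtbf(*mtbfs):
    """MTBF of components in series: failure rates (1/MTBF) add up."""
    return 1 / sum(1 / m for m in mtbfs)


def parallel_mtbf_no_repair(mtbf_a, mtbf_b):
    """MTBF of two redundant components when nothing is ever repaired.

    Assumes exponentially distributed lifetimes, so the expected time
    until both have failed is E[a] + E[b] - E[first failure].
    """
    return mtbf_a + mtbf_b - series_mtbf(mtbf_a, mtbf_b)


def availability(mtbf, mttr):
    """Steady-state availability of a single repairable component."""
    return mtbf / (mtbf + mttr)


def series_availability(*avails):
    """The system is up only when every component is up."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel_availability(*avails):
    """The system is down only when every redundant component is down."""
    unavailable = 1.0
    for a in avails:
        unavailable *= 1 - a
    return 1 - unavailable


def downtime_hours_per_year(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    disk = availability(20000, 2)    # hard disk, replaced within 2 hours
    board = availability(50000, 5)   # motherboard, replaced within 5 hours
    system = series_availability(disk, board)

    print(round(series_mtbf(50000, 20000)))          # 14286 hours
    print(f"{system:.2%}")                           # 99.98%
    print(f"{downtime_hours_per_year(system):.2f}")  # 1.75 hours per year
    print(f"{parallel_availability(disk, disk):.10f}")  # two disks in RAID 1
```

The series and parallel functions take any number of components, so a larger system can be modelled by nesting calls, for example a series of subsystems that are each a parallel pair.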
© Oct 22 2007