Health Check Setup
A simple web server setup consists of a router and a single web server, as shown in the figure below.
When the server is overloaded, responses become very slow. If the web server goes down for any reason, the entire web site becomes unavailable.
The solution is to use a WebMux setup, as shown in the animated figure below. A WebMux controls a group of web servers called a web array or web farm. To the outside world, the entire web farm acts as one web server.
Operation
- During normal operation, WebMux distributes incoming web traffic across the farm according to its configuration.
- If one of the web servers within the farm is down for any reason, WebMux bypasses the failed server and pages the operator. The load is shared among the remaining servers. As long as at least one server within the farm is up and running, the web site remains operational to the outside world.
- When the failed server recovers, WebMux detects it automatically, and the server shares the workload again with the rest of the farm.
- If false alarms occur and WebMux incorrectly marks servers as dead, try the following troubleshooting steps:
- First, if a Web Application Firewall or other security process blocks repetitive queries (which is how WebMux health checks can appear), whitelist the WebMux IP addresses so those queries are not subject to any filtering.
- Second, if you have not done so already, check your server logs for the times when WebMux reported the server unreachable. There may be an event or condition on the server that legitimately prevents it from responding to the WebMux health checks.
- Third, if you know for sure that the servers were responsive during those times, try increasing the health check timeout value for the service. In the WebMux GUI under Health -> Timeouts, increase the timeout value of the HTTPS service until you find a value that does not trigger false positives.
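The behavior described in the list above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not WebMux's actual implementation: the names Server, Farm, is_healthy and pick are hypothetical, latencies are simulated rather than measured, and round-robin stands in for whatever scheduling method is configured. It shows how a server that is slow but alive gets marked dead when the timeout is too short, and how raising the timeout removes that false alarm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    # Simulated health check response time in seconds (an assumption for
    # this sketch); None means the server does not respond at all.
    latency_s: Optional[float]


def is_healthy(server: Server, timeout_s: float) -> bool:
    """A check passes only if the server answers within the timeout."""
    return server.latency_s is not None and server.latency_s <= timeout_s


class Farm:
    """Hypothetical model of a web farm behind a load balancer."""

    def __init__(self, servers: List[Server], timeout_s: float) -> None:
        self.servers = servers
        self.timeout_s = timeout_s
        self._next = 0  # round-robin counter

    def healthy(self) -> List[Server]:
        """Servers that currently pass the health check."""
        return [s for s in self.servers if is_healthy(s, self.timeout_s)]

    def pick(self) -> Optional[str]:
        """Choose the next healthy server, bypassing failed ones.

        Returns None when no server passes the check, which corresponds
        to the site being unavailable.
        """
        alive = self.healthy()
        if not alive:
            return None
        server = alive[self._next % len(alive)]
        self._next += 1
        return server.name


if __name__ == "__main__":
    farm = Farm(
        [Server("web1", 0.2), Server("web2", None), Server("web3", 3.5)],
        timeout_s=2.0,
    )
    # web2 is down; web3 is alive but slower than the 2-second timeout,
    # so it is treated as dead (a false alarm).
    print([s.name for s in farm.healthy()])

    # Raising the timeout lets the slow server pass its check again.
    farm.timeout_s = 5.0
    print([s.name for s in farm.healthy()])
```

The timeout is a trade-off: a larger value avoids false alarms for slow servers, but it also delays detection of a server that has really failed.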
A Secondary WebMux
The figure below shows a configuration of two WebMux units.
- One of the WebMux units is set up as the primary and the other as the secondary. During normal operation, the primary directs the web request traffic.
- The secondary checks the primary periodically. If the primary goes down and does not respond, the secondary takes over.
- Even after the failover, the secondary continues to check the primary. Should the primary recover, the secondary relinquishes control back to the primary.
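The takeover and hand-back logic above can be sketched as a small state machine. This is an illustrative model only, not the WebMux failover protocol: the class SecondaryMonitor, the method observe, and the max_missed threshold (how many consecutive missed checks trigger a takeover) are all assumptions introduced for this example.

```python
class SecondaryMonitor:
    """Hypothetical model of the secondary unit's view of the primary."""

    def __init__(self, max_missed: int = 3) -> None:
        # Assumed threshold: consecutive missed checks before takeover.
        self.max_missed = max_missed
        self.missed = 0
        self.active = False  # True while the secondary directs traffic

    def observe(self, primary_responded: bool) -> bool:
        """Record one periodic check and return whether the secondary is active."""
        if primary_responded:
            # Primary is up (or has recovered): relinquish control.
            self.missed = 0
            self.active = False
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Primary is considered down: take over.
                self.active = True
        return self.active


if __name__ == "__main__":
    monitor = SecondaryMonitor(max_missed=2)
    # Primary answers once, misses three checks, then recovers.
    for responded in [True, False, False, False, True]:
        print(responded, monitor.observe(responded))
```

Note that the secondary keeps checking while it is active, which is what allows it to hand control back as soon as the primary responds again.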