I am surprised to hear that so many people are not using all of the available methods for load balancing and redundancy in their E1 systems. This is not the first time I have read someone say that Network Deployment (ND) is not a good solution. I think ND is the only really feasible application-level load balancing and failover mechanism available. I try to introduce redundancy into my system at every level in the following ways:
1) Network - Accomplished using DNS round-robin, NLB, or a hardware device.
I haven't worked with NLB yet but apparently it offers improvements over DNS like fault tolerance and intelligent load balancing.
- DNS will continue to send network traffic to a down server and is rather dumb when balancing.
Note: DNS round-robin will not recognize when just an HTTP server is down, so it will not route around it to the other HTTP server.
- NLB will stop sending diners to the dead waiter's table : - )
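- NLB also supposedly supports session affinity, meaning that it keeps sending a client's packets to the server holding the active session. As I understand it, the affinity is based on the client's IP address rather than on the application session itself.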
Another possibility is using IBM's Edge Components that are included with the Tech Foundation.
If you have a lot of money, a hardware load balancing device is the way to go.
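To make the difference concrete, here is a minimal Python sketch contrasting a blind rotation (how plain DNS round-robin behaves) with a balancer that skips servers marked down (roughly what NLB or a hardware device adds). The class name, the server names, and the `mark_down`/`mark_up` calls are made up for illustration; a real balancer would drive them from its own health checks.

```python
from itertools import cycle


def dns_round_robin(servers):
    """Hand out servers in a fixed rotation with no knowledge of their health."""
    return cycle(servers)


class HealthAwareBalancer:
    """Rotate over servers, skipping any that are currently marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._next = 0

    def mark_down(self, server):
        # Hypothetical hook: a real balancer would call this from a health check.
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def pick(self):
        # Try each server at most once per call, starting from the rotation point.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy servers available")
```

With `web1` marked down, the rotation from `dns_round_robin` still hands out `web1` every other request, while `HealthAwareBalancer.pick()` returns only `web2` until `web1` is marked up again.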
2) HTTP - Multiple HTTP servers (Apache or IIS), preferably on separate servers from WebSphere.
It is not always economically feasible to separate these from WAS, but to ensure high availability at the HTTP level one must have more than one HTTP server, fronted by one of the methods mentioned above to balance the load across them.
3) WebSphere - On WAS 4 we cloned JAS servers in server groups (vertical and horizontal) to ensure high availability at the WAS level. In WAS 5 and WAS 6 all we do is create a cluster of JAS servers and manage them with Network Deployment. There are a couple of items that need to be done with ND, but it works and the load balancing works. I am not sure why we would not take advantage of this, and heck, it is easy to create a cluster.
A vertical cluster makes efficient use of the memory on the box, and several small JVMs garbage collect much more efficiently than one large JVM. I am not sure how others are scaling their web systems without WAS clustering. If you use a single instance and increase the JVM heap size to accommodate additional users, the system will pause too long during garbage collection. I create 3 or 4 instances using clustering and give each one a 512 or 768 MB heap.
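The sizing arithmetic behind a vertical cluster can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up, and the per-JVM overhead and reserved memory are placeholder parameters you would have to measure on your own box, not figures from WebSphere.

```python
def total_heap_mb(instances, heap_mb):
    """Combined maximum heap for a vertical cluster of identical JVMs."""
    return instances * heap_mb


def fits_in_ram(instances, heap_mb, ram_mb, per_jvm_overhead_mb=0, reserved_mb=0):
    """Rough check that the cluster fits in physical memory.

    per_jvm_overhead_mb (non-heap memory per JVM) and reserved_mb (OS and
    other processes) are assumptions supplied by the caller.
    """
    needed = instances * (heap_mb + per_jvm_overhead_mb) + reserved_mb
    return needed <= ram_mb


print(total_heap_mb(4, 768))  # 4 instances at 768 MB each
print(total_heap_mb(3, 512))  # 3 instances at 512 MB each
```

Four instances at 768 MB come to 3072 MB of heap in total, and three at 512 MB come to 1536 MB, so the larger configuration needs a box with comfortably more than 3 GB of physical memory once JVM overhead and the OS are counted.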
As for the comment about how NLB load balances: if you have sticky sessions configured, you should get the same server as last time. NLB cannot determine that a JAS session is not responding and will continue to send network traffic to that box. Only if the server itself (hardware, OS, NIC, etc.) stops responding will NLB redirect. You have to count on WAS clustering (using Network Deployment) to direct traffic away from failed JAS instances.
You have to use the right tool for the right job: NLB will redirect away from failed hardware, the HTTP WAS plug-in will redirect away from failed WebSphere servers, and WAS clustering will redirect away from failed JAS instances.
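As a rough illustration of sticky sessions combined with application-level failover, here is a simplified Python model. It is not the actual algorithm used by the WebSphere plug-in or by ND; the `StickyRouter` class, the member names, and the `mark_failed` hook are all hypothetical.

```python
class StickyRouter:
    """Keep each session on one cluster member; fail over if that member dies."""

    def __init__(self, members):
        self.members = list(members)
        self.healthy = set(self.members)
        self.affinity = {}  # session id -> cluster member
        self._rr = 0

    def mark_failed(self, member):
        # Hypothetical hook: something above the network layer has to notice
        # that an instance stopped responding even though the box is still up.
        self.healthy.discard(member)

    def mark_recovered(self, member):
        if member in self.members:
            self.healthy.add(member)

    def _next_healthy(self):
        # Plain round-robin over the members that are still healthy.
        for _ in range(len(self.members)):
            member = self.members[self._rr]
            self._rr = (self._rr + 1) % len(self.members)
            if member in self.healthy:
                return member
        raise RuntimeError("no healthy cluster members")

    def route(self, session_id):
        member = self.affinity.get(session_id)
        if member not in self.healthy:
            # New session, or its member failed: pick another and remember it.
            member = self._next_healthy()
            self.affinity[session_id] = member
        return member
```

A session keeps landing on the same member until `mark_failed` is called for it, after which the next request moves to a surviving member and sticks there. A network-level balancer on its own has no equivalent of `mark_failed` for an instance that hangs while the box stays up, which is why the application-level layer is needed.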