Austria’s most-visited website and the Software Load Balancer

Surely running the most-visited website in Austria on a software load balancer is impossible? No, with HAProxy it’s entirely doable.

Background

Websites of a certain size need to distribute their load across multiple server instances. Incoming requests are spread by a load balancer, which makes it the most critical part of the network infrastructure.

The best-known hardware products for this are probably F5’s BIG-IP series. Unfortunately, if (like us!) you’re constantly changing pool members, creating new pools, managing IP quotas and kicking unwanted or abusive traffic off your servers, then hardware load balancer solutions just don’t have the necessary flexibility.

Plus, switching entirely from HTTP to HTTPS will make performance an issue, too. Sigh.

At Willhaben.at we decided to implement our own solution. Mostly we were looking for open source, speed, and scalability, so we chose HAProxy and haven’t once regretted it.

Installation

Begin with your favourite Linux flavour and install HAProxy. We used Red Hat 7.1 on an “old” server with two Intel Xeon X5560 CPUs @ 2.8 GHz, giving us 16 logical cores. Our disks were ordinary HDDs and the server had 6×1-Gigabit network cards on board.

Once installed, we changed the HAProxy configuration to something like this:
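
The original file is not reproduced here, but a minimal sketch of an HAProxy configuration in that spirit could look roughly like this (the server names, addresses and timeouts are placeholders, not our production values):

global
    daemon
    maxconn 100000

defaults
    mode http
    option httplog
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# listen for plain HTTP on all local addresses
frontend fe_www
    bind *:80
    default_backend be_www

# the pool of application servers; health checks remove dead members
backend be_www
    balance roundrobin
    server web1 10.0.0.11:8080 check
    server web2 10.0.0.12:8080 check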

Congratulations! You’ve just created your very own high-performance load balancer, as used at willhaben.at! Our story, however, didn’t end there – we had some boring installation procedures to tackle before getting productive with our solution.

But let’s skip to the interesting part…

Traffic Volume

Clearly a single 1GBit link was insufficient, so the obvious next step would have been 10GBit interfaces.

Do you know how ridiculously expensive 10GBit interfaces are?

And do you know how ridiculously expensive FOUR 10GBit interfaces are, including a fiber-linked switch behind them?!

So, no 10GBit solution.

On a typical day, our platform dishes out 800-900 megabits per second, not counting traffic generated by mobile devices. On good days, we scratch the 1Gb/s mark.

Instead, we use four of our six 1Gb interfaces to create an 802.3ad-conformant network bond. The configuration from /etc/sysconfig/network-scripts/ifcfg-bond0 looks like this:

DEVICE=bond0
NAME=bond0
TYPE=Ethernet
BONDING_MASTER=yes
….
BONDING_OPTS="miimon=100 mode=4 xmit_hash_policy=2"

That last line is the key part: it sets mode 4 (802.3ad) and also defines the algorithm that decides which interface to use for outgoing traffic. The xmit_hash_policy determines how a hash is computed for each connection; that hash is then taken modulo the number of interfaces (four in our case) to pick the outgoing link.

By default the fastest hash type is used, and that is layer 2 only. We had to switch this to layer 2+3, because the MAC addresses we see on our VLANs do not change: we would always get the MAC of the switch/router and therefore always end up on the same interface.
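
If you want to double-check what the kernel actually negotiated, the bonding driver exposes its state under /proc; a quick sanity check (assuming the bond is named bond0) looks like this:

# show the negotiated bonding mode and the transmit hash policy
grep -E "Bonding Mode|Transmit Hash Policy" /proc/net/bonding/bond0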

The same hash setting also needs to be configured on the switch for incoming traffic.

With this setup we achieve both network redundancy and decent traffic distribution, and the bond regularly carries more than 1Gb/s in total.

What? More than 1Gb/s over 1Gb links?! Nice catch. But note that this is overall traffic, counted in both directions.

When the balancer uploads an image to the client, for example, it needs to download it from the server first. And since all interfaces are full duplex, a byte can be sent and received at the same time.

All right. First problem solved. We have enough bandwidth, yay!

Redundancy

Now that we have a high-bandwidth load balancer, we don’t want it to become the single point of failure that takes down the whole platform, so we need redundancy.

Basically, there are two solutions to achieve this:

  1. Two active nodes: each node gets its own public IP address, and both addresses are published via DNS records. The advantage is the additional balancing; the disadvantage is that client-side mechanics have to be considered when debugging a problem.
  2. One active node with a hot standby: simply clone the existing machine, and if the master fails, the slave takes over.

We decided on solution 2, with a master/slave setup. When designing a load balancer, another consideration is guaranteeing normal operation in case of a failure: if one server breaks, the remaining one must still be able to handle all the traffic.

We implemented the master/slave design with Keepalived. This daemon uses VRRP to synchronise IP addresses between hosts: if the master goes down, all IPs are simply transferred to the slave.

What we have now is a master node that holds all the public IP addresses, with an identical slave node standing by to take over.


The keepalived configuration is pretty straightforward:
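
The actual file is not shown here, but a minimal keepalived.conf sketch along these lines could look as follows (interface name, router ID, priorities and the addresses are placeholders; check_integrity.sh is the health-check script mentioned below):

# health check: if this script fails, the node gives up its MASTER role
vrrp_script check_integrity {
    script "/etc/keepalived/check_integrity.sh"
    interval 2
    fall 2
}

vrrp_instance VI_1 {
    state MASTER                 # BACKUP on the standby node
    interface bond0
    virtual_router_id 51
    priority 150                 # lower value on the standby node
    advert_int 1

    # all public IPs of the balancer; they move to the slave on failover
    virtual_ipaddress {
        203.0.113.10/24
        203.0.113.11/24
    }

    track_script {
        check_integrity
    }
}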


In a nutshell: all IP addresses from the virtual_ipaddress section are transferred to the slave if the script check_integrity.sh fails.

The only thing left to do is add net.ipv4.ip_nonlocal_bind=1 to /etc/sysctl.conf. It allows a service to bind to a socket (an IP+port combination) whose IP address is not currently configured on the host. This is necessary so that HAProxy can start on the slave even though the shared IPs are not assigned to it.
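
Applying it is the usual sysctl routine, nothing specific to our setup:

# allow services to bind to IPs that are not (yet) assigned to this host
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
sysctl -p    # reload /etc/sysctl.conf so the setting takes effect immediately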

Another job well done. Redundancy: check! 🙂

Connection Management

The last issue to tackle is connection management.

When a client communicates with a server via TCP, the client uses its IP address and a (random) source port to connect to the server’s IP address on its destination port (e.g. 80 for HTTP). Since each quadruple of source IP, source port, destination IP and destination port can only be used once, there is a limit on how many simultaneous connections are possible.

Let’s consider incoming connections to the load balancer.

The load balancer’s destination IP (www.willhaben.at) and port (80) are fixed, and the client’s IP is fixed as well. That leaves only the client’s source port to vary. Since port numbers are 16-bit fields, the maximum number of connections a single client can produce is 2^16, roughly 64k.

That’s more than enough for a single machine.

Internally, however, these requests need to be forwarded to the appropriate server. Now the load balancer itself acts as the client, and the same 64k limit applies. The difference is that the limit now covers all client connections combined, since they all originate from the load balancer’s own IP. Running into this limit is called port exhaustion, and it is something we desperately want to avoid.
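
Whether a machine is getting close to that limit is easy to keep an eye on with standard Linux tooling (nothing specific to our setup):

# the ephemeral source-port range the kernel hands out (typically around 28k ports)
sysctl net.ipv4.ip_local_port_range

# rough count of currently established TCP connections
ss -tan state established | wc -l
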
We avoid it by splitting our balancer setup into two parts:

  1. The actual frontend, where IPs are bound and SSL offloading happens
  2. Multiple backend processes, where the pools are managed and requests are forwarded to the actual server.

Both components are connected via the built-in PROXY protocol v2. Taking the example from above, the configuration now looks something like this:
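
Our production configuration is more elaborate, but a stripped-down sketch of this split could look like the following (addresses, ports and the certificate path are placeholders; the important pieces are the ssl bind, send-proxy-v2 and accept-proxy):

# ----- frontend process: binds the public IPs and offloads SSL -----
frontend fe_public
    bind 203.0.113.10:80
    bind 203.0.113.10:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend be_handover

# hand every request over to one of the backend processes,
# preserving the original client address via the PROXY protocol v2
backend be_handover
    balance roundrobin
    server proc1 127.0.0.1:10001 send-proxy-v2
    server proc2 127.0.0.2:10001 send-proxy-v2

# ----- backend processes: manage the pools and talk to the servers -----
# (in our setup these binds are split across several HAProxy processes;
# shown here in one place for brevity)
frontend fe_internal
    bind 127.0.0.1:10001 accept-proxy
    bind 127.0.0.2:10001 accept-proxy
    default_backend be_webservers

backend be_webservers
    balance roundrobin
    server web1 10.0.0.11:8080 check
    server web2 10.0.0.12:8080 check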


Depending on the number of connections, we can just spawn additional backend processes.

The separate frontend also has the advantage of separating SSL offloading from the backend connections. You may have noticed the additional bind on port 443: all HTTPS connections on this IP are offloaded by the frontend as well, and what reaches the backends is a plain HTTP connection.

Third time’s the charm and that’s essentially it.

Summary

The three sections above are a very coarse-grained overview of what we actually did (we fought multiple sub-problems before finding a really reliable solution). However, the details are out of scope and not very interesting for most people.

To wrap it up, here are some numbers you might find interesting.

  • We are running the frontend on one process spread across eight cores
  • All our traffic is SSL encrypted and offloaded by HAProxy
  • For 1Gbit/second of SSL traffic, the frontend cores are 60% loaded on average
  • We use two backend processes for server connections
  • Each backend process has four localhost-binds (yes, we didn’t mention that trick, sorry)
  • We have around 10k-20k connections per second
  • We have around 150k established connections between frontend and backend
  • We manage 120k established client connections at peak times
  • We manage around 40k concurrent SSL connections
  • We have around 600 SSL handshakes per second
  • We use http keep-alive for clients and frontend-backend connections
  • We use http-server-close on the server side, and a connection pool to restrict concurrency; a rough sketch of these options follows below.
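
In HAProxy terms, these last two points map onto a handful of options plus a per-server connection limit; a rough sketch (the option names are real HAProxy keywords, the numbers are placeholders):

defaults
    # keep client and frontend-to-backend connections alive between requests
    option http-keep-alive

backend be_webservers
    # close the connection towards the application servers after each response
    option http-server-close
    # per-server connection cap; excess requests wait in HAProxy's queue
    server web1 10.0.0.11:8080 check maxconn 200
    server web2 10.0.0.12:8080 check maxconn 200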

So, yes, you can absolutely replace a hardware load balancer with a software solution! All you need are some old servers, some time, and a few people who know their TCP.

This article was first published on willhaben.at’s tech blog.
