20 May 2019 • Written by Arnaud Bawol
As you know, we designed a multihomed peering point to secure our nines and ensure a reliable service for our customers.
The following is a run-down of the modifications we made to get the most bandwidth out of a reliable design.
The basic principle of the previously described architecture was to abstract transportation between two hosting providers' networks. We relied on BGP to route packets over this peering link, which seemed like a good idea, but BGP isn't aware of the other routers' load at all and doesn't implement any load-balancing logic.
Quite often, our traffic distribution looked like this:
As you can see, load-sharing is not load-balancing.
BGP picks a channel to send its packets down and sticks to it forever. It can handle a network failure on either side, but when it comes to network performance, you are going to feel left behind.
Let's talk about solutions.
Since we already had a 4-hop interconnection, it didn't hurt to shake its components up a bit. We first chose pfSense for its simplicity, but mostly for the incredible performance of Packet Filter.
Despite pfSense's ability to be highly available and highly reliable, it lacks a key feature for the design we had in mind. It can handle load balancing on an ingress link, but egress is another topic: as a matter of fact, it appeared impossible to do round-robin on an uplink with pfSense.
Since we were already using a BSD-based distro, why not look at FreeBSD? It also has CARP redundancy, ARP proxying and Packet Filter, and the lack of a UI means you can use all the non-essential features that pfSense leaves aside.
To those who already know what feature I'm talking about, here is the configuration that we used:
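A minimal pf.conf sketch of it (interface names, gateways and networks below are placeholders rather than our production values):

    # two uplinks towards the peering link, one LAN-facing interface
    ext_if1 = "em1"
    ext_if2 = "em2"
    int_if  = "em0"
    gw1     = "198.51.100.1"
    gw2     = "203.0.113.1"
    lan_net = "10.0.0.0/24"

    # spread new sessions across both uplinks, alternating gateways
    pass in on $int_if route-to \
        { ($ext_if1 $gw1), ($ext_if2 $gw2) } round-robin \
        from $lan_net to any keep state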
As you might guess, the most important line here is:
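    route-to { ($ext_if1 $gw1), ($ext_if2 $gw2) } round-robin

(Taken from the sketch above rather than our literal rule: the round-robin keyword on the route-to pool is what makes pf alternate new sessions between the two uplinks.)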
Since some BSD distros have their own pf port, it's important to keep in mind that this is the FreeBSD syntax, but the feature also exists in OpenBSD AFAIK.
This way, Packet Filter load-balances by TCP session: once your connection is established, you will keep the same transport route until it ends.
ARP proxying is quite simple: you just have to follow FreeBSD's guidelines to add your custom ARP entries to the table. We will see a bit later how to update the ARP table when a router becomes unavailable.
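A minimal sketch of such a published entry (the IP and MAC are placeholders; for proxy ARP the MAC is typically the router's own interface address):

    # answer ARP who-has requests for 10.0.0.10 on this router's behalf
    arp -s 10.0.0.10 00:a0:98:01:02:03 pub

    # verify: proxied entries show up as "published" in the table
    arp -an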
This is where it gets interesting. Now that we have a router with load-balancing capabilities, we don't want to see it fail. Using CARP, it's relatively straightforward:
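A sketch of what that can look like on FreeBSD 10 and later (interface, addresses and password are placeholders): the carp module is loaded at boot and the shared address is declared as an alias carrying a vhid.

    # /boot/loader.conf
    carp_load="YES"

    # /etc/rc.conf on the master router
    ifconfig_em0="inet 10.0.0.2/24"
    ifconfig_em0_alias0="inet vhid 1 advskew 0 pass mysecret alias 10.0.0.1/32"

The pass phrase has to match on every router sharing the vhid.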
You can check FreeBSD's documentation if this feature is still blurry for you. With this configuration, routers will share a virtual IP address. If a router is seen as down, it will be discarded from the cluster and its peer will handle the rest.
The same config being applied on the backup router, you may want to check the advskew option to ensure that there is some consistency in your failover.
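For instance (same placeholder values as above), the backup router advertises with a higher advskew, so it only takes over the virtual IP when the master stops advertising:

    # /etc/rc.conf on the backup router
    ifconfig_em0="inet 10.0.0.3/24"
    ifconfig_em0_alias0="inet vhid 1 advskew 100 pass mysecret alias 10.0.0.1/32"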
There is now a routing failover mechanism, but we also need to load/unload our ARP entries from the table to enable/disable proxying.
FreeBSD has a device state change daemon called devd, which lets you run commands when a device changes state. It's configured via a simple config file; action scripts are called on MASTER type events, and there are a few other event types that are well documented.
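A sketch of such a rule (the script path is hypothetical; CARP state changes reach devd with the vhid@interface pair as the subsystem), dropped for instance in a file under /usr/local/etc/devd/:

    notify 0 {
        match "system"    "CARP";
        match "subsystem" "[0-9]+@[a-z0-9]+";
        match "type"      "MASTER";
        # publish our proxy-ARP entries when this box becomes master
        action "/usr/local/sbin/carp-master.sh $subsystem";
    };

A matching rule on the BACKUP type can unload the entries again, and devd has to be restarted to pick up the change.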
Since we are able to "hot swap" our routers, we also need to synchronize states. We previously saw that keep state was used in our pf rule. Those states are kept locally, but we can synchronize them to the backup router via pfsync, which is hardly worth a whole section of its own.
Well, it looks like this:
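A sketch of the rc.conf side (the sync interface and addressing are placeholders); states are exchanged over a dedicated link between the two routers:

    # dedicated state-sync link towards the peer router
    ifconfig_em3="inet 192.168.255.1/30"
    pfsync_enable="YES"
    pfsync_syncdev="em3"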
We kept using BGP between our VPN servers, as previously described, to ensure failover on the VPN link. Now our routers are aware of their uplink availability, our load balancers are able to spread packets around, and TCP handles sequence re-ordering within sessions.
Yes, it is.
So far, we've peaked for several hours at 800 Mbps+ without breaking a sweat. Latency is not impacted by the additional hop, since the RTT mostly depends on the distance between our two DCs. Our uplinks are properly used, there is no more ARP flapping between hosts, and we have a rather stable latency between our datacenters.