Get rid of bufferbloat with SQM

Let’s start from a bit further back…
TCP/IP has worked well for decades, from links in the order of kbit/s up to today’s 10 Gbit/s and more. Of course there have been some modifications since the beginning, but the concept is the same and almost unchanged. The fact is that an endpoint has no way to know the speed it can reach (the bandwidth) along the path to the other endpoint. You can have a gigabit adapter, the server you connect to can have a 10 gigabit adapter, but what’s in between? You simply don’t know. So how fast can you send packets without issues?
Let’s look at the issue from another point of view: take a router with 3 gigabit interfaces where traffic can go in all directions. What happens if traffic comes in on 2 interfaces and wants to go out on the same 3rd interface? How can the router handle 2 Gbit/s on a single 1 Gbit/s interface? The answer of course is simple: it can’t. In some way the router must tell the devices to slow down.
The way TCP/IP handles this is by dropping packets: the sender recognizes that a packet has been dropped because no ACK comes back for it, and treats this as a symptom of congestion in the network. Explicit Congestion Notification (ECN) is a “new” way to handle this, but unfortunately it’s not used that much…
Now, congestion is not an issue if it’s limited to something like 1% or 2% of dropped packets, it’s just the way TCP/IP works. The sender starts by sending 1 packet and waits for the ACK, then sends 2 packets, then 3, and so on. When a drop is detected it halves the number of packets, and then goes on again adding 1 packet for each successful transmission. This is the old “Additive Increase, Multiplicative Decrease” (AIMD) mechanism used by Reno. There are several modern and better ways to achieve the same result, but the concept is the same: it’s called congestion avoidance. Linux usually uses the Cubic algorithm, AIX uses New Reno, and recent Windows releases also use Cubic by default.
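To make the AIMD idea a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how a real TCP stack is implemented: it follows the simplified description above (no slow start, no timers), and the `capacity` value is a made-up number of packets the path can carry per round trip before a drop happens.

```python
def aimd(capacity, rounds, start=1):
    """Return the congestion window (in packets) used in each round trip.

    capacity is a hypothetical limit: sending more packets than this in
    one round trip causes a drop.
    """
    cwnd = start
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity:
            # Drop detected: multiplicative decrease (halve, never below 1).
            cwnd = max(1, cwnd // 2)
        else:
            # Everything was acknowledged: additive increase by one packet.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(aimd(capacity=8, rounds=20))
```

The window climbs one packet at a time, overshoots the capacity, gets halved and starts climbing again: this is the classic sawtooth pattern.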
Another issue that adds even more entropy is that TCP/IP traffic is “self regulated”, meaning each device is supposed to be fair to the other devices. That’s why a DoS is so easy to achieve when you are inside a network. Also, having clients with different congestion algorithms may cause issues, since some are more aggressive and some are fairer: AIX usually loses “against” Linux, but if everyone were as fair as AIX, I can assure you network congestion would be less of an issue.
UDP is another story: congestion has to be handled at the application layer, since UDP is connectionless and not aware of packet loss.

Anyway, for some reason network admins hate packet drops (probably the managers are, as always, the problem: “hey, our network has issues, we are experiencing packet loss”), and a way to limit them is to provide the network devices with BUFFERS.
Buffers are great for handling peaks: instead of dropping the packet, let’s queue it in the buffer. Eventually the congestion goes away and the device will be able to transmit it with no drops. Wonderful, isn’t it?

NO! (fuck sake)

Buffers have become way too big. Of course bandwidth will be higher, but there’s a thing called LATENCY: every packet sitting in a queue has to wait for all the packets in front of it. Do you really want an RTT of 100ms instead of 10ms just to gain 10% of bandwidth? Maybe, in some cases, like when you are downloading a movie. Gamers will not agree, they want a fast ping. Several applications need a low RTT, VoIP for example.
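A quick back-of-the-envelope calculation shows why big buffers hurt: a full buffer has to be drained at the speed of the link, so the extra delay is simply the buffered bits divided by the link rate. The buffer sizes and the 20 Mbit/s link below are made-up values just for illustration.

```python
def queue_delay_ms(buffered_bytes, link_bits_per_s):
    """Time needed to drain a full buffer at the link rate, in milliseconds."""
    return buffered_bytes * 8 / link_bits_per_s * 1000


if __name__ == "__main__":
    link = 20_000_000  # hypothetical 20 Mbit/s uplink
    for kib in (64, 256, 1024):
        delay = queue_delay_ms(kib * 1024, link)
        print(f"{kib:5d} KiB buffer: {delay:6.1f} ms of extra latency when full")
```

The slower the link, the worse it gets: the same buffer on a link with half the speed adds twice the delay.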
A solution is Quality of Service (QoS): network devices can be configured to give more priority to certain streams. But this can be complicated to manage.
Smart Queue Management (SQM) uses a different approach. An example of SQM is FQ-CoDel, used by default on recent Linux distros, and its successor: CAKE (Common Applications Kept Enhanced).
I don’t want to go into details since it’s far beyond my knowledge, I will put some links at the bottom of this post.

In Linux you can check what is in place with sysctl:

root@www:~# sysctl net.core.default_qdisc
net.core.default_qdisc = fq_codel

root@www:~# sysctl -a | grep congestion_control
net.ipv4.tcp_allowed_congestion_control = reno cubic bbr
net.ipv4.tcp_available_congestion_control = reno cubic bbr
net.ipv4.tcp_congestion_control = bbr
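
The same values can also be read from a script, since on Linux every sysctl key is exposed as a file under /proc/sys. This is just a small sketch: `read_sysctl` is a helper name I made up, and the simple dot-to-slash conversion is an assumption that holds for the keys above but not for keys containing an interface name with a dot in it.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the value of a sysctl key such as 'net.core.default_qdisc'.

    Returns None if the key cannot be read (not on Linux, key missing, ...).
    """
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in (
        "net.core.default_qdisc",
        "net.ipv4.tcp_congestion_control",
        "net.ipv4.tcp_available_congestion_control",
    ):
        print(f"{key} = {read_sysctl(key)}")
```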

In Windows:

> netsh int tcp show supplemental

The TCP global default template is internet

TCP Supplemental Parameters
------------------------------------------------------
Minimum RTO (msec)                  : 300
Initial Congestion Window (MSS)     : 10
Congestion Control Provider         : cubic
Enable Congestion Window Restart    : disabled
Delayed ACK timeout (msec)          : 40
Delayed ACK frequency               : 2

Let’s see how we can get rid of this issue in a home environment. You will need a router supported by OpenWRT, since the majority of home routers don’t have this kind of feature in their stock firmware. There are alternatives to OpenWRT, just google them.
My router here is a Linksys WRT3200ACM, which is officially supported by OpenWRT and even has a dual boot feature.

I selected CAKE and limited the bandwidth to about 90% of my maximum. You should run several tests to find your optimal values; 95% is probably a good starting point. CAKE by itself will help with the so-called bufferbloat problem, so you could enter 0 (unlimited), but since your local network is most probably faster than your WAN, setting a limit at least for the upload will be a great help. You also don’t know how your ISP handles its buffers, so limiting the bandwidth a bit will improve your latency A LOT.
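If you want to compute the values to enter instead of doing it in your head, something like this is enough. The measured speeds below are made-up numbers (use your own speed test results), and I am assuming the limits are entered in kbit/s, which is what the OpenWRT SQM page asks for as far as I know.

```python
def shaper_rates_kbit(measured_down_mbit, measured_up_mbit, fraction=0.90):
    """Return (download, upload) limits in kbit/s as a fraction of the measured speed."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    down = round(measured_down_mbit * 1000 * fraction)
    up = round(measured_up_mbit * 1000 * fraction)
    return down, up


if __name__ == "__main__":
    # Hypothetical speed test result: 100 Mbit/s down, 20 Mbit/s up.
    for fraction in (0.90, 0.95):
        down, up = shaper_rates_kbit(100, 20, fraction)
        print(f"{fraction:.0%}: download {down} kbit/s, upload {up} kbit/s")
```

Start from one of these values, run a latency test under load, then raise or lower the limit until the ping stays stable.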
I entered 18 bytes as packet overhead: 14 for the Ethernet header and 4 for the VLAN tag my modem uses. I should probably add 4 more bytes for the Frame Check Sequence (FCS), I have to check this. If you are using PPPoE you will need to add 8 more bytes (not 100% sure, too lazy to check now).
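The arithmetic is trivial, but writing it down helps to see which pieces are included. This sketch just adds up the byte counts mentioned above; whether the FCS really has to be counted on a given link is still an open question for me, so it is off by default. The function names are mine, they are not part of any SQM tool.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4     # 802.1Q tag
FCS = 4          # Frame Check Sequence
PPPOE = 8        # 6-byte PPPoE header + 2-byte PPP protocol field


def overhead_bytes(vlan=True, fcs=False, pppoe=False):
    """Per-packet overhead, in bytes, added on top of the IP packet."""
    total = ETH_HEADER
    if vlan:
        total += VLAN_TAG
    if fcs:
        total += FCS
    if pppoe:
        total += PPPOE
    return total


def payload_share(ip_packet_bytes, overhead):
    """Fraction of the bytes on the wire that belongs to the IP packet itself."""
    return ip_packet_bytes / (ip_packet_bytes + overhead)


if __name__ == "__main__":
    mine = overhead_bytes()  # Ethernet + VLAN, the value used in this post
    print("overhead:", mine, "bytes")
    for size in (64, 1500):
        print(f"{size}-byte IP packet: {payload_share(size, mine):.1%} of the wire bytes")
```

The overhead matters much more for small packets than for full-size ones, which is why a wrong value mostly shows up with traffic like VoIP or gaming.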

I have a Sagemcom F@ST 5355 modem/router, which I only use as a modem, branded and limited as fuck by Sunrise (my ISP). That’s a pity, since the hardware is good and the original firmware is a fork of OpenWRT…
I was able to gain root access in some way, but I don’t want to modify it too much, my ISP might not like it 🙂
But at least I found out some useful information about my connection. I am using G.fast with the 106 MHz profile (it’s a sort of successor of VDSL2) over standard copper twisted pair (probably not even twisted LOL) and I am not using PPPoE: I am using “direct” DHCP instead (no user / password provided).

References:
https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/
https://www.coverfire.com/articles/queueing-in-the-linux-network-stack/
https://www.bufferbloat.net/projects/
https://www.bufferbloat.net/projects/codel/wiki/Cake/
