Issue #3033

tcpmss and pmtud

Added by Tom Hsiung over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Category:
configuration
Affected version:
5.6.2
Resolution:
No change required

Description

Hello, sir

I have some concerns.

1) The tcpmss rules in https://wiki.strongswan.org/projects/strongswan/wiki/ForwardingAndSplitTunneling

iptables -t mangle -A FORWARD -m policy --pol ipsec --dir in -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1361:1536 -j TCPMSS --set-mss 1360
iptables -t mangle -A FORWARD -m policy --pol ipsec --dir out -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1361:1536 -j TCPMSS --set-mss 1360

So do these two rules only affect decapsulated packets, i.e. packets that reached the IPsec gateway as UDP-encapsulated IPsec packets? Or do they affect all packets, including original packets that were never encrypted?

Attachment: IMG_0266.JPG (519 KB), Tom Hsiung, 06.05.2019 11:32

History

#1 Updated by Tom Hsiung over 6 years ago

2) What is the purpose of the sysctl setting:

net.ipv4.ip_no_pmtu_disc=1

We have already set up the tcpmss rules. Why do we additionally have to set the sysctl parameter to 1?

Thank you.

Tom

#2 Updated by Tobias Brunner over 6 years ago

  • Status changed from New to Feedback

So do these two rules only affect decapsulated packets, i.e. packets that reached the IPsec gateway as UDP-encapsulated IPsec packets? Or do they affect all packets, including original packets that were never encrypted?

The former (as can be seen by the policy match, see iptables-extensions(8) man page).
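What the two rules match can be sketched as a plain predicate in Python. This is only an illustrative model of the match conditions named on the command line, not how netfilter is implemented; the `Packet` fields and the function name `mangle_forward` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    # Hypothetical fields, chosen only to mirror the rule's match options.
    ipsec_policy: bool      # "-m policy --pol ipsec": decapsulated (in) or to be encapsulated (out)
    proto: str              # "-p tcp"
    syn: bool               # "--tcp-flags SYN,RST SYN": SYN set ...
    rst: bool               # ... and RST not set
    mss: Optional[int]      # value of the TCP MSS option, None if absent


def mangle_forward(pkt: Packet) -> Packet:
    """Return the packet as it would leave the rule (MSS rewritten or untouched)."""
    matches = (
        pkt.ipsec_policy
        and pkt.proto == "tcp"
        and pkt.syn and not pkt.rst
        and pkt.mss is not None
        and 1361 <= pkt.mss <= 1536    # "-m tcpmss --mss 1361:1536"
    )
    # "-j TCPMSS --set-mss 1360" only applies to matching packets.
    return replace(pkt, mss=1360) if matches else pkt
```

A TCP SYN with MSS 1460 that belongs to an IPsec policy comes out with MSS 1360, while the same SYN without an IPsec policy passes through unchanged.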

We have already set up the tcpmss rules. Why do we additionally have to set the sysctl parameter to 1?

It disables PMTUD (see ip-sysctl.txt), e.g. if it's broken with certain servers (whether it's necessary or even has an effect depends on the actual scenario and the protocols used, e.g. UDP).

#3 Updated by Tom Hsiung over 6 years ago

Tobias Brunner wrote:

So do these two rules only affect decapsulated packets, i.e. packets that reached the IPsec gateway as UDP-encapsulated IPsec packets? Or do they affect all packets, including original packets that were never encrypted?

The former (as can be seen by the policy match, see iptables-extensions(8) man page).

We have already set up the tcpmss rules. Why do we additionally have to set the sysctl parameter to 1?

It disables PMTUD (see ip-sysctl.txt), e.g. if it's broken with certain servers (whether it's necessary or even has an effect depends on the actual scenario and the protocols used, e.g. UDP).

So both

net.ipv4.ip_no_pmtu_disc=1
net.ipv4.ip_no_pmtu_disc=0

will set the DF bit to 1. As a result, outgoing packets from the gateway will not be fragmented if they exceed the path MTU.

Tom

#4 Updated by Tobias Brunner over 6 years ago

As a result, outgoing packets from the gateway will not be fragmented if they exceed the path MTU.

No, by setting it to 1 the kernel just ignores the MTU reported in ICMP Fragmentation Needed messages (see RFC 1191) and instead always uses the configured minimum MTU (net.ipv4.min_pmtu) for such paths. It's necessary if the reported MTU is incorrect or won't work for other reasons.
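The behaviour described here can be written down as a small Python function. It is a simplified sketch, not kernel code: the function name is hypothetical, only the values 0 and 1 of the sysctl are modeled, and the default of 552 for the minimum PMTU is an assumption based on the Linux documentation.

```python
def pmtu_after_frag_needed(current_pmtu: int, reported_mtu: int,
                           no_pmtu_disc: int, min_pmtu: int = 552) -> int:
    """PMTU a sender uses for a path after an ICMP Fragmentation Needed message.

    no_pmtu_disc mirrors net.ipv4.ip_no_pmtu_disc; min_pmtu mirrors the
    configured minimum PMTU (552 is an assumed default).
    """
    if no_pmtu_disc == 1:
        # The MTU reported in the ICMP message is ignored; the configured
        # minimum is used for this path instead.
        return min(current_pmtu, min_pmtu)
    if no_pmtu_disc == 0:
        # PMTUD enabled: accept the reported MTU, but never go below the
        # configured minimum and never raise the current value.
        return min(current_pmtu, max(reported_mtu, min_pmtu))
    raise ValueError("only modes 0 and 1 are modeled in this sketch")
```

With a current PMTU of 1500 and a reported MTU of 1400, mode 0 yields 1400 while mode 1 yields 552.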

#5 Updated by Tom Hsiung over 6 years ago

net.ipv4.min_pmtu

I think it should be

net.ipv4.route.min_pmtu

Thanks.

by setting it to 1 the kernel just ignores the MTU reported in ICMP Fragmentation Needed messages

So this means that the router reporting Fragmentation Needed will not fragment the oversized packets and will instead send an ICMP message back to the source host.

Tom

#6 Updated by Tobias Brunner over 6 years ago

I think it should be
[...]

Yes, looks like the kernel documentation is wrong (should be route/min_pmtu there).

So this means that the router reporting Fragmentation Needed will not fragment the oversized packets and will instead send an ICMP message back to the source host.

Only if the DF bit is set. But that's what it does unless such ICMPs are blocked.

#7 Updated by Tom Hsiung over 6 years ago

And what is the difference between:

net.ipv4.ip_no_pmtu_disc=1

and the iptables rule

... TCPMSS --clamp-mss-to-pmtu

Thank you

#8 Updated by Tobias Brunner over 6 years ago

What do you mean? Other than the relation to the PMTU they are totally unrelated.

#9 Updated by Tom Hsiung over 6 years ago

Yep, they are two different mechanisms. I just want to know what each of them is for. I searched the Internet but found no useful information.

The former is clear now: the DF bit in the IP header can be set to 1 to prevent routers along the path from fragmenting, so that the minimal path MTU is reported back to the source host.

But what does the latter option do? It uses some mechanism to set the tcpmss, but based on what? What is the function and rationale under the hood of --clamp-mss-to-pmtu? I found some search results saying --clamp-mss-to-pmtu sets the tcpmss based on the MTU of that machine.

Tom

#10 Updated by Tobias Brunner over 6 years ago

I found some search results saying --clamp-mss-to-pmtu sets the tcpmss based on the MTU of that machine.

Exactly, the PMTU determined for each packet (if PMTUD is disabled, it's either the MTU of the outbound interface/route, or perhaps just the minimum MTU).

#11 Updated by Tom Hsiung over 6 years ago

So the --clamp-mss-to-pmtu function does not involve discovery of the minimal path MTU? --clamp-mss-to-pmtu only sets the packet MTU to the value of the machine that has this iptables --clamp-mss-to-pmtu rule.

For example, suppose the MTU of this machine is 1500, and somewhere between it and the final destination there is another machine with an MTU of 1400 and no iptables --clamp-mss-to-pmtu rule. The --clamp-mss-to-pmtu rule then fails to help, because it sets the tcpmss based on the first machine's MTU of 1500. The second machine's MTU is 1400, so IP fragmentation is not avoided.

Tom

#12 Updated by Tobias Brunner over 6 years ago

So the --clamp-mss-to-pmtu function does not involve discovery of the minimal path MTU?

No.

--clamp-mss-to-pmtu only sets the packet MTU to the value of the machine that has this iptables --clamp-mss-to-pmtu rule.

It sets the MSS in the options field of the TCP header of matching packets to the known PMTU instead of a static value.

For example, suppose the MTU of this machine is 1500, and somewhere between it and the final destination there is another machine with an MTU of 1400 and no iptables --clamp-mss-to-pmtu rule. The --clamp-mss-to-pmtu rule then fails to help, because it sets the tcpmss based on the first machine's MTU of 1500. The second machine's MTU is 1400, so IP fragmentation is not avoided.

Correct, this only works if PMTUD is enabled and working (or the PMTU is lowered by some other means, e.g. route/interface). Otherwise, the iptables rule has to configure a lower value manually.
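The 1500/1400 example can be worked through in a few lines of Python. This is a sketch under stated assumptions: IPv4 with no IP or TCP options, so the clamped MSS is the known PMTU minus 40 bytes (the figure given in the iptables-extensions man page for IPv4). The function names are hypothetical.

```python
IPV4_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def clamp_mss_to_pmtu(syn_mss: int, known_pmtu: int) -> int:
    """MSS after clamping; the option is only ever lowered, never raised."""
    return min(syn_mss, known_pmtu - IPV4_TCP_HEADERS)


def fits(mss: int, link_mtu: int) -> bool:
    """True if a full-sized segment with this MSS crosses the link unfragmented."""
    return mss + IPV4_TCP_HEADERS <= link_mtu


# The gateway only knows its own MTU of 1500, so the MSS stays at 1460,
# which is too large for the 1400-byte link further along the path.
mss_unaware = clamp_mss_to_pmtu(1460, known_pmtu=1500)

# Once the PMTU is known to be 1400 (via PMTUD, route or interface MTU),
# the same rule yields 1360, which fits.
mss_aware = clamp_mss_to_pmtu(1460, known_pmtu=1400)
```

Here `fits(mss_unaware, 1400)` is false and `fits(mss_aware, 1400)` is true, which is why a static lower value has to be configured when the PMTU is not known to the gateway.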

#13 Updated by Tom Hsiung over 6 years ago

And here I use ifconfig to show the virtual network interface on my client.

ipsec0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1400
        inet 10.2.13.2 --> 10.2.13.2 netmask 0xff000000 

Why is the MTU of ipsec0 1400 instead of 1500? And given that the virtual network interface's MTU is 1400, do we still need the iptables rules for the tcpmss configuration (i.e., -j TCPMSS --set-mss 1360)?

Tom

#14 Updated by Tom Hsiung over 6 years ago

It sets the MSS in the options field of the TCP header of matching packets to the known PMTU instead of a static value.

Oh! It sounds like net.ipv4.ip_no_pmtu_disc=0 and --clamp-mss-to-pmtu work together to make path MTU discovery work.

net.ipv4.ip_no_pmtu_disc=0

This option is aimed at discovering the minimal MTU along the whole path and reporting that value back.

--clamp-mss-to-pmtu

This option tells the gateway to change the tcpmss value to that reported minimal MTU.

And if

net.ipv4.ip_no_pmtu_disc=1

then --clamp-mss-to-pmtu is ignored and the kernel uses the value of the net.ipv4.route.min_pmtu parameter to adjust the final packet size.

Tom

#15 Updated by Tom Hsiung over 6 years ago

And could you please explain the PMTUD process? I drew a sketch to show the basic process.

Step 1 - The source sends packets to the destination with the original MTU of 1500 and the DF bit set to 1 (via net.ipv4.ip_no_pmtu_disc=0).

Step 2 - These 1500-byte packets reach router 2. Because the downstream MTU of router 2 is less than 1500 bytes and the DF bit is set, no IP fragmentation happens: the packets are dropped and ICMP messages carrying router 2's downstream MTU are sent back to the source.

Step 3 - The source must have some mechanism to pick up the reported MTU and adjust the size of its later packets.

I want to ask how?

Because the --clamp-mss-to-pmtu rule is set on router 1, how does it interact with the ICMP messages to adjust the size of later (forwarded) packets? I mean, the ICMP message sent by router 2 is addressed to the source host, not to router 1, so how does router 1 learn the minimal path MTU and apply its --clamp-mss-to-pmtu rule to later packets?

Thank you.

#16 Updated by Tobias Brunner over 6 years ago

I mean, the ICMP message sent by router 2 is addressed to the source host, not to router 1, so how does router 1 learn the minimal path MTU and apply its --clamp-mss-to-pmtu rule to later packets?

It probably doesn't at all. PMTUD is something the end-node does, the router/gateway is not directly involved (it only forwards the ICMPs). So using that option on the server is probably useless unless the gateway itself sends packets to an affected host.
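The end-node's part in PMTUD can be illustrated with a toy simulation in Python. It is a sketch of the RFC 1191 idea only, assuming ICMP Fragmentation Needed messages always reach the sender; the function name and the list-of-link-MTUs representation are hypothetical.

```python
from typing import List, Tuple


def discover_pmtu(first_hop_mtu: int, link_mtus: List[int]) -> Tuple[int, int]:
    """Simulate an end-node sending DF packets until one fits the whole path.

    link_mtus lists the MTU of each link in path order. Returns the
    discovered path MTU and the number of ICMP messages the sender received.
    """
    size = first_hop_mtu
    icmp_count = 0
    while True:
        # The first link that is too small drops the DF packet; the router in
        # front of it reports that link's MTU to the sender (not to any
        # intermediate router).
        bottleneck = next((mtu for mtu in link_mtus if mtu < size), None)
        if bottleneck is None:
            return size, icmp_count
        # The sender caches the reported MTU and retries with smaller packets.
        size = bottleneck
        icmp_count += 1
```

For a path with link MTUs of 1500, 1400 and 1500, the sender ends up at 1400 after a single ICMP message. The routers in between keep no state about it, which matches the point that the gateway is not directly involved.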

#17 Updated by Tom Hsiung over 6 years ago

Thanks.

And so the

TCPMSS --set-mss 1360

rule is really necessary for strongSwan. If we don't set the tcpmss manually, PMTUD would result in an original packet that just fits the minimal path MTU. However, our original packet is later encapsulated with another IP header plus the NAT-Traversal UDP header and the ESP header and trailer, which can make the packet larger than the negotiated MTU, so IP fragmentation happens.

In my testing, strongSwan added about 100 bytes on top of the original IP packet. So if the minimal path MTU is 1500 bytes, you have to use the TCPMSS --set-mss option to lower the packet size (to 1400 bytes), so that the encapsulated packet has a final size of 1500 bytes and IP fragmentation is avoided.
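The arithmetic behind these numbers can be spelled out in Python. The 100-byte overhead is the figure measured in this comment, not a general constant (the real overhead depends on the algorithms in use and on whether NAT-T is active), and the function names are hypothetical.

```python
INNER_IPV4_TCP_HEADERS = 40  # 20-byte inner IPv4 header + 20-byte TCP header, no options


def max_inner_packet(path_mtu: int, ipsec_overhead: int) -> int:
    """Largest original IP packet that still fits after encapsulation."""
    return path_mtu - ipsec_overhead


def mss_for_tunnel(path_mtu: int, ipsec_overhead: int) -> int:
    """TCP MSS that keeps full-sized segments within the path MTU after encapsulation."""
    return max_inner_packet(path_mtu, ipsec_overhead) - INNER_IPV4_TCP_HEADERS
```

With a path MTU of 1500 and an overhead of 100 bytes, `max_inner_packet` gives 1400 and `mss_for_tunnel` gives 1360, the value used with --set-mss.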

But this only covers TCP packets: TCPMSS --set-mss changes the segment size negotiated between the two endpoints by modifying the TCP handshake. How about UDP packets?

Tom

#18 Updated by Tobias Brunner over 6 years ago

If we don't set the tcpmss manually, PMTUD would result in an original packet that just fits the minimal path MTU. However, our original packet is later encapsulated with another IP header plus the NAT-Traversal UDP header and the ESP header and trailer, which can make the packet larger than the negotiated MTU, so IP fragmentation happens.

Linux does account for that, i.e. PMTUD works with IPsec, unless ICMPs are blocked by some firewall/router. But that might not be the case for other clients.

How about UDP packets?

This might require setting the PMTU manually (via interface/route). Where depends on the location of the MTU issue and where exactly PMTUD is blocked.

#19 Updated by Tom Hsiung over 6 years ago

Tobias Brunner wrote:

It probably doesn't at all. PMTUD is something the end-node does, the router/gateway is not directly involved (it only forwards the ICMPs). So using that option on the server is probably useless unless the gateway itself sends packets to an affected host.

OK. In this case, for example, if router 1 runs some software that sends packets to the destination, then PMTUD and --clamp-mss-to-pmtu would work, right?

If router 1 sends these packets, does it need both PMTUD and --clamp-mss-to-pmtu to make things work? Or is PMTUD alone enough?

My pppoeconf script adds this rule to my iptables rule list by default. I noticed that the rule is in the FORWARD chain of the mangle table.

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination         
1    TCPMSS     tcp  --  anywhere             anywhere             tcp flags:SYN,RST/SYN tcpmss match 1400:65495 TCPMSS clamp to PMTU

It is not a rule in the OUTPUT chain but in the FORWARD chain, which deals with forwarded traffic that is not generated by the gateway.

Tom

#20 Updated by Tom Hsiung over 6 years ago

Tobias Brunner wrote:

Linux does account for that, i.e. PMTUD works with IPsec, unless ICMPs are blocked by some firewall/router. But that might not be the case for other clients.

So I guess my macOS and iOS clients do not account for that. After setting the tcpmss manually, the latency of loading pictures and videos has improved greatly.

Tom

#21 Updated by Tom Hsiung over 6 years ago

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination         
1    TCPMSS     tcp  --  anywhere             anywhere             tcp flags:SYN,RST/SYN tcpmss match 1400:65495 TCPMSS clamp to PMTU

Oh, I guess it works like this:
This rule (on the gateway machine) changes the TCP MSS value in the handshake packets forwarded by the gateway machine, based on the interface MTU of the gateway machine. For example, the gateway could have an upstream MTU of 1500 and a downstream MTU of 1400 or less. In that case, the --clamp-mss-to-pmtu rule in the FORWARD chain would change the TCP MSS value in the handshake packets of the forwarded traffic based on that downstream MTU of 1400 (or less).

Thank you for your kind help!

Tom

#22 Updated by Tobias Brunner over 6 years ago

OK. In this case, for example, if router 1 runs some software that sends packets to the destination, then PMTUD and --clamp-mss-to-pmtu would work, right?

I suppose, but don't have any experience.

If router 1 sends these packets, does it need both PMTUD and --clamp-mss-to-pmtu to make things work? Or is PMTUD alone enough?

If PMTUD to/on the clients doesn't work, enabling only PMTUD on router 1 would only fix things for traffic that originates from there.

So I guess my macOS and iOS clients do not account for that.

Maybe, but it's also possible that PMTUD simply doesn't work because ICMPs don't reach the clients.

#23 Updated by Tom Hsiung over 6 years ago

And the TCP iptables rules at the gateway are not useful, because the encapsulated packets are UDP (forced NAT-T enabled) and the original packet is encapsulated at the client and at the IPsec server. The TCP iptables rules at the gateway cannot do anything.

I guess the reason the pppoeconf script added this TCP --clamp rule is to adjust the size of regular TCP packets passing through.

#24 Updated by Tobias Brunner over 6 years ago

And the TCP iptables rules at the gateway are not useful, because the encapsulated packets are UDP (forced NAT-T enabled) and the original packet is encapsulated at the client and at the IPsec server. The TCP iptables rules at the gateway cannot do anything.

Nope, the rules very much affect the TCP streams tunneled via IPsec after decryption/before encryption at the gateway.

#25 Updated by Tom Hsiung over 6 years ago

No, I didn't mean the IPsec gateway, Tobias. I meant my home gateway to the WAN. I know that TCP iptables rules on the IPsec server machine affect decrypted packets. My clients generate encapsulated packets; these packets pass through the home gateway, then reach the IPsec gateway and are decrypted. Thanks.

Tom

#26 Updated by Tobias Brunner over 6 years ago

  • Category set to configuration
  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Resolution set to No change required