Issue #310
Problem with source IP selection in multihomed environments
Description
I currently have an IKEv1 transport-mode connection set up on my server. Everything appears to work correctly, except that the connection stalls when the server tries to rekey.
The server has one public interface with multiple public IPs set up as aliases, like this:
eth0 9.0.0.1 - holds the default route
eth0:1 9.0.0.2
eth0:2 9.0.0.3
eth0:3 9.0.0.4
Connection entries were set up for 9.0.0.2 and 9.0.0.3, and they work fine.
When a client establishes a connection to 9.0.0.2, the initial ISAKMP packets and the ESP packets carry the correct source IP address (9.0.0.2). However, when the server tries to rekey with the client, the ISAKMP packets it sends carry the wrong source IP address (9.0.0.1 instead of 9.0.0.2), even though the log says it generated a packet with source 9.0.0.2 (tcpdump shows otherwise).
The log shows this when it happens:
Mar 11 20:45:46 03[ENC] generating QUICK_MODE request 334791759 [ HASH SA No ID ID ]
Mar 11 20:45:46 03[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (220 bytes)
Mar 11 20:45:46 11[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (76 bytes)
Mar 11 20:45:46 11[ENC] parsed INFORMATIONAL_V1 request 152048315 [ HASH N(INVAL_ID) ]
Mar 11 20:45:46 11[IKE] received INVALID_ID_INFORMATION error notify
Mar 11 20:45:48 07[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (220 bytes)
Mar 11 20:45:48 07[ENC] parsed QUICK_MODE request 3295316553 [ HASH SA No ID ID ]
Mar 11 20:45:48 07[IKE] no matching CHILD_SA config found
Mar 11 20:45:48 07[ENC] generating INFORMATIONAL_V1 request 2030090789 [ HASH N(INVAL_ID) ]
The connection then stalls because the rekeying never completes. If the client managed to negotiate NAT-T during the initial negotiation, rekeying isn't an issue, since the ISAKMP packets are then encapsulated in the NAT-T tunnel, which carries the correct source IP address.
Any way this can be resolved?
History
#1 Updated by Martin Willi over 12 years ago
- Status changed from New to Feedback
Hi David,
03[ENC] generating QUICK_MODE request 334791759 [ HASH SA No ID ID ]
03[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (220 bytes)
11[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (76 bytes)
11[ENC] parsed INFORMATIONAL_V1 request 152048315 [ HASH N(INVAL_ID) ]
11[IKE] received INVALID_ID_INFORMATION error notify
Looks like the selection of the ID payloads is wrong when multiple paths to the peer are available.
I'm not sure if it caused this specific issue, but we have been a little too aggressive in updating peer addresses when using IKEv1 or non-MOBIKE IKEv2 connections. I've recently pushed a fix that might help with that problem as well. Please try the patch at:
http://git.strongswan.org/?p=strongswan.git;a=commitdiff;h=21dd4c4b
Regards
Martin
#2 Updated by Davidok ok over 12 years ago
Hey Martin.
Thanks for the suggestion.
I have tried the patch, but to no avail. One thing I noticed is that it is sometimes quite difficult to actually get it to use plain ESP instead of NAT-T. Restarting the daemons a couple of times may get it to use ESP, and the problem only occurs with plain ESP, not when the traffic is inside a NAT-T tunnel.
Perhaps you can see something interesting here?
# ipsec.conf - strongSwan IPsec configuration file

config setup

conn %default
    ikelifetime=60m
    keylife=30m
    rekeymargin=12m
    keyexchange=ikev1
    fragmentation=yes
    authby=secret

conn rw
    left=9.0.0.2
    leftprotoport=tcp/445
    type=transport
    right=%any
    auto=add
Connections:
    rw: 9.0.0.2...%any IKEv1
    rw: local: [9.0.0.2] uses pre-shared key authentication
    rw: remote: uses pre-shared key authentication
    rw: child: dynamic[tcp/445] === dynamic TRANSPORT
Security Associations (1 up, 0 connecting):
    rw[1]: ESTABLISHED 8 minutes ago, 9.0.0.2[9.0.0.2]...8.0.0.100[8.0.0.100]
    rw[1]: IKEv1 SPIs: 596cbbc85c858e62_i 8711a2471bc1a140_r*, pre-shared key reauthentication in 32 minutes
    rw[1]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
    rw{1}: REKEYING, TRANSPORT, expires in 21 minutes
    rw{1}: 9.0.0.2/32[tcp/445] === 8.0.0.100/32
11:40:38.859414 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa5), length 100
11:40:39.888854 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xad), length 100
11:40:44.853424 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa6), length 100
11:40:44.853720 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xae), length 100
11:40:49.073132 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xaf), length 100
11:40:50.860017 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa7), length 100
11:40:54.078826 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xb0), length 100
11:40:56.022805 IP 9.0.0.1.500 > 8.0.0.100.500: isakmp: phase 2/others ? oakley-quick[E]
11:40:56.024114 IP 8.0.0.100.500 > 9.0.0.1.500: isakmp: phase 2/others I inf[E]
11:40:56.852051 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa8), length 100
11:40:59.916228 IP 8.0.0.100.500 > 9.0.0.1.500: isakmp: phase 2/others I oakley-quick[E]
11:41:00.052425 IP 9.0.0.1.500 > 8.0.0.100.500: isakmp: phase 2/others ? inf[E]
11:41:02.855440 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa9), length 100
11:41:08.853759 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xaa), length 100
11:41:13.934110 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xab), length 100
11:41:18.919456 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xac), length 100
11:41:24.884587 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xad), length 100
11:41:29.850175 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xae), length 100
Mar 12 11:41:01 04[KNL] creating rekey job for ESP CHILD_SA with SPI cb4ce47a and reqid {1}
Mar 12 11:41:01 09[ENC] generating QUICK_MODE request 257055226 [ HASH SA No ID ID ]
Mar 12 11:41:01 09[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (220 bytes)
Mar 12 11:41:01 12[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (76 bytes)
Mar 12 11:41:01 12[ENC] parsed INFORMATIONAL_V1 request 1151708611 [ HASH N(INVAL_ID) ]
Mar 12 11:41:01 12[IKE] received INVALID_ID_INFORMATION error notify
Mar 12 11:41:05 07[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (220 bytes)
Mar 12 11:41:05 07[ENC] parsed QUICK_MODE request 2568701466 [ HASH SA No ID ID ]
Mar 12 11:41:05 07[IKE] no matching CHILD_SA config found
Mar 12 11:41:05 07[ENC] generating INFORMATIONAL_V1 request 806514467 [ HASH N(INVAL_ID) ]
Mar 12 11:41:05 07[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (76 bytes)
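For anyone comparing the daemon log against a capture like the one above, the check can be automated. The following is only a rough sketch: the function name `wrong_source_packets` and the regular expression are illustrative assumptions, and it expects tcpdump's default one-line text output in the format shown in this comment.

```python
import re

# Matches tcpdump lines such as
# "11:40:56.022805 IP 9.0.0.1.500 > 8.0.0.100.500: isakmp: ..."
# ESP lines carry no port suffix and therefore do not match.
ISAKMP_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): isakmp"
)


def wrong_source_packets(lines, peer_ip, expected_src):
    """Return ISAKMP lines sent to peer_ip whose source is not expected_src."""
    hits = []
    for line in lines:
        m = ISAKMP_LINE.search(line)
        if m and m["dst"] == peer_ip and m["src"] != expected_src:
            hits.append(line)
    return hits
```

Feeding it the capture lines with `peer_ip="8.0.0.100"` and `expected_src="9.0.0.2"` should return only the outgoing ISAKMP packets that left with 9.0.0.1 as their source.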
#3 Updated by Ole Husgaard over 12 years ago
I have exactly the same problem, in a very simple configuration. The problem seems constant, and easy to reproduce.
I try to connect two LANs with a tunnel between VPN endpoints A and B, using IKEv1.
Endpoint A is an old StrongSwan installation, using Pluto. This has public IP 8.0.0.1. This endpoint has a firewall that blocks all UDP port 500 traffic, except to/from IP 9.0.0.2.
Endpoint B is running StrongSwan 5.0.2. This has several public IPs.
eth0 9.0.0.1 - holds the default route
eth0:1 9.0.0.2
eth0:2 9.0.0.3
eth0:3 9.0.0.4
The tunnel is set up between 9.0.0.2 and 8.0.0.1.
When a main mode negotiation is initiated from endpoint A, everything is fine. But whenever I try to initiate a main mode negotiation from endpoint B, the following happens:
Log files on endpoint B say that it is sending a packet from 9.0.0.2[500] to 8.0.0.1[500]. But no such packet is sent. Instead a packet is sent from 9.0.0.1[500] to 8.0.0.1[500]. This packet is dropped in the firewall on endpoint A, so endpoint A never sees the main mode request from endpoint B, and thus never replies to it.
Thinking that the problem was with the default route on endpoint B, I tried adding an explicit route to endpoint A on endpoint B: "ip route add 8.0.0.1/32 via <upstream-router> dev eth0 src 9.0.0.2". This made no difference, and the main mode request still has source IP 9.0.0.1.
Applying the patch above made no difference for me either. The source IP of the main mode request is still wrong.
#4 Updated by Martin Willi over 12 years ago
Log files on endpoint B say that it is sending a packet from 9.0.0.2[500] to 8.0.0.1[500]. But no such packet is sent. Instead a packet is sent from 9.0.0.1[500] to 8.0.0.1[500]. This packet is dropped in the firewall on endpoint A, so endpoint A never sees the main mode request from endpoint B, and thus never replies to it.
If the log states that it is sending from the correct source address, charon tries to enforce it using IP_PKTINFO or IP_SENDSRCADDR (which of course requires support for one or the other on your system).
Given that the concept of interface aliases has been deprecated on Linux for a few years now, have you tried to install multiple addresses on the real interface (using iproute2's "ip address add")?
Regards
Martin
#5 Updated by Ole Husgaard over 12 years ago
Thanks for your quick reply.
My system is CentOS 6.4 (kernel 2.6.32-358.2.1.el6.x86_64) with the latest patches. StrongSwan 5.0.2 is compiled from source on this system, and the source address is set with IP_PKTINFO.
Turns out that I was wrong about the problem being constant. Sometimes it works correctly for some time, and then a bit later - with no change at all, at the next key renegotiation - it sends with source IP 9.0.0.1 again.
I dropped the eth0:1 interface, and now have both 9.0.0.1 and 9.0.0.2 configured on eth0. But that change made no difference.
Looking at the source, it looks like the packet is sent to the kernel in src/libcharon/plugins/socket_default/socket_default_socket.c. I added some extra debugging here to better see what was happening. This confirms to me that the source IP is set using IP_PKTINFO. It also tells me that the IP in pktinfo->ipi_spec_dst is 9.0.0.2 just before the sendmsg() call, even when the packet coming out has source 9.0.0.1. So this is probably some kernel issue.
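To illustrate the mechanism being discussed, here is a minimal sketch of requesting a source address per datagram with IP_PKTINFO. It is not charon's code (which is C); the helper names `build_pktinfo` and `send_from` are made up for illustration, the struct layout is the Linux `struct in_pktinfo`, and the fallback constant 8 is the Linux value, so this assumes a Linux host.

```python
import socket
import struct

# socket.IP_PKTINFO is only exposed by newer Python versions;
# 8 is the Linux value and is used here as an assumed fallback.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)


def build_pktinfo(src_ip, ifindex=0):
    """Pack a Linux struct in_pktinfo: ipi_ifindex, ipi_spec_dst, ipi_addr."""
    return struct.pack(
        "@i4s4s",
        ifindex,                      # 0 leaves the interface choice to the kernel
        socket.inet_aton(src_ip),     # ipi_spec_dst: requested source address
        socket.inet_aton("0.0.0.0"),  # ipi_addr: not used when sending
    )


def send_from(sock, payload, src_ip, dst):
    """Send one UDP datagram, asking the kernel to use src_ip as its source."""
    ancillary = [(socket.IPPROTO_IP, IP_PKTINFO, build_pktinfo(src_ip))]
    return sock.sendmsg([payload], ancillary, 0, dst)
```

With a UDP socket bound to the wildcard address, `send_from(sock, data, "9.0.0.2", ("8.0.0.1", 500))` hands the kernel the same kind of request described above. Anything that rewrites the packet after that point (NAT rules, for example) can still change the source seen on the wire.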
Any ideas on how to proceed with this would be much appreciated.
#6 Updated by Andreas Steffen over 12 years ago
- Assignee set to Martin Willi
#7 Updated by Ole Husgaard over 11 years ago
Sorry for not replying to #5 above before. The issue there was not because of Strongswan.
(The problem was some special SNAT code we have that would sometimes SNAT packets from 9.0.0.2 so they came out of the host with source 9.0.0.1, even though Strongswan sent them with source 9.0.0.2.)
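For readers who hit the same symptom, a quick check for source-rewriting rules may save time. The sketch below assumes the rules live in iptables and that the output of `iptables-save` is available as text; the ticket does not say how the SNAT here was implemented, and the function name `find_snat_rules` is purely illustrative.

```python
def find_snat_rules(iptables_save_text):
    """Return nat-table rules from iptables-save output that rewrite sources."""
    rules = []
    table = None
    for raw in iptables_save_text.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            table = line[1:]  # a line like "*nat" starts a new table section
        elif table == "nat" and line.startswith("-A "):
            if "-j SNAT" in line or "-j MASQUERADE" in line:
                rules.append(line)
    return rules
```

Any rule it returns that can match traffic from the tunnel address (9.0.0.2 in this thread) is worth a closer look before suspecting the IKE daemon.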
#8 Updated by Martin Willi about 11 years ago
- Tracker changed from Bug to Issue
- Status changed from Feedback to Closed
- Resolution set to No change required
I assume this issue has been resolved, closing the ticket. Feel free to reopen.