Issue #310: Problem with source IP selection in multihomed environments - strongSwan


Added by Davidok ok over 12 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee: Martin Willi
Category:
Affected version: 5.0.2
Resolution: No change required

Description

I currently have an IKEv1 transport connection set up on my server. Everything appears to work correctly, but the connection stalls when the server tries to rekey.

The server has one public interface with multiple public IPs set up as aliases, like this:
eth0 9.0.0.1 - holds the default route
eth0:1 9.0.0.2
eth0:2 9.0.0.3
eth0:3 9.0.0.4

Connection entries were set up for 9.0.0.2 and 9.0.0.3, and they work fine.
When a client establishes a connection to 9.0.0.2, the source IP address of the initial ISAKMP packets and of the ESP packets is correct (9.0.0.2). However, when the server tries to rekey with the client, the ISAKMP packets it sends no longer carry the right source IP address (9.0.0.1 instead of 9.0.0.2), even though the log says the packet was generated with source 9.0.0.2 (tcpdump shows otherwise).

The log contains this when it happened:

Mar 11 20:45:46 03[ENC] generating QUICK_MODE request 334791759 [ HASH SA No ID ID ]
Mar 11 20:45:46 03[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (220 bytes)
Mar 11 20:45:46 11[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (76 bytes)
Mar 11 20:45:46 11[ENC] parsed INFORMATIONAL_V1 request 152048315 [ HASH N(INVAL_ID) ]
Mar 11 20:45:46 11[IKE] received INVALID_ID_INFORMATION error notify
Mar 11 20:45:48 07[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (220 bytes)
Mar 11 20:45:48 07[ENC] parsed QUICK_MODE request 3295316553 [ HASH SA No ID ID ]
Mar 11 20:45:48 07[IKE] no matching CHILD_SA config found
Mar 11 20:45:48 07[ENC] generating INFORMATIONAL_V1 request 2030090789 [ HASH N(INVAL_ID) ]

The connection then stalls during the rekeying process, since it can never rekey properly. If the client managed to negotiate NAT-T during the initial negotiation, rekeying isn't an issue, since the ISAKMP packets are then encapsulated in the NAT-T tunnel with the correct source IP address.

Any way this can be resolved?

History

#1 Updated by Martin Willi over 12 years ago

Status changed from New to Feedback

Hi David,

03[ENC] generating QUICK_MODE request 334791759 [ HASH SA No ID ID ]
03[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (220 bytes)
11[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (76 bytes)
11[ENC] parsed INFORMATIONAL_V1 request 152048315 [ HASH N(INVAL_ID) ]
11[IKE] received INVALID_ID_INFORMATION error notify

Looks like the selection of the ID payloads is wrong when multiple paths to the peer are available.

I'm not sure whether it caused this specific issue, but we have been a little too aggressive in updating peer addresses when using IKEv1 or non-MOBIKE IKEv2 connections. I've recently pushed a fix that might help with that problem as well. Please try the patch at:

http://git.strongswan.org/?p=strongswan.git;a=commitdiff;h=21dd4c4b

Regards
Martin

#2 Updated by Davidok ok over 12 years ago

Hey Martin.
Thanks for the suggestion.
I have tried the patch, but to no avail. One thing I noticed is that it is sometimes quite difficult to get it to use plain ESP instead of NAT-T. Restarting the daemons a couple of times may get it to use ESP, and the problem only occurs with plain ESP, not when the traffic is inside a NAT-T tunnel.
Perhaps you can see something interesting here?

# ipsec.conf - strongSwan IPsec configuration file

config setup

conn %default
        ikelifetime=60m
        keylife=30m
        rekeymargin=12m
        keyexchange=ikev1
        fragmentation=yes
        authby=secret

conn rw
        left=9.0.0.2
        leftprotoport=tcp/445
        type=transport
        right=%any
        auto=add

Connections:
          rw:  9.0.0.2...%any  IKEv1
          rw:   local:  [9.0.0.2] uses pre-shared key authentication
          rw:   remote: uses pre-shared key authentication
          rw:   child:  dynamic[tcp/445] === dynamic TRANSPORT
Security Associations (1 up, 0 connecting):
          rw[1]: ESTABLISHED 8 minutes ago, 9.0.0.2[9.0.0.2]...8.0.0.100[8.0.0.100]
          rw[1]: IKEv1 SPIs: 596cbbc85c858e62_i 8711a2471bc1a140_r*, pre-shared key reauthentication in 32 minutes
          rw[1]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
          rw{1}:  REKEYING, TRANSPORT, expires in 21 minutes
          rw{1}:   9.0.0.2/32[tcp/445] === 8.0.0.100/32

11:40:38.859414 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa5), length 100
11:40:39.888854 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xad), length 100
11:40:44.853424 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa6), length 100
11:40:44.853720 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xae), length 100
11:40:49.073132 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xaf), length 100
11:40:50.860017 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa7), length 100
11:40:54.078826 IP 8.0.0.100 > 9.0.0.2: ESP(spi=0xce91b379,seq=0xb0), length 100
11:40:56.022805 IP 9.0.0.1.500 > 8.0.0.100.500: isakmp: phase 2/others ? oakley-quick[E]
11:40:56.024114 IP 8.0.0.100.500 > 9.0.0.1.500: isakmp: phase 2/others I inf[E]
11:40:56.852051 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa8), length 100
11:40:59.916228 IP 8.0.0.100.500 > 9.0.0.1.500: isakmp: phase 2/others I oakley-quick[E]
11:41:00.052425 IP 9.0.0.1.500 > 8.0.0.100.500: isakmp: phase 2/others ? inf[E]
11:41:02.855440 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xa9), length 100
11:41:08.853759 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xaa), length 100
11:41:13.934110 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xab), length 100
11:41:18.919456 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xac), length 100
11:41:24.884587 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xad), length 100
11:41:29.850175 IP 9.0.0.2 > 8.0.0.100: ESP(spi=0xcb4ce47a,seq=0xae), length 100

Mar 12 11:41:01 04[KNL] creating rekey job for ESP CHILD_SA with SPI cb4ce47a and reqid {1}
Mar 12 11:41:01 09[ENC] generating QUICK_MODE request 257055226 [ HASH SA No ID ID ]
Mar 12 11:41:01 09[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (220 bytes)
Mar 12 11:41:01 12[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (76 bytes)
Mar 12 11:41:01 12[ENC] parsed INFORMATIONAL_V1 request 1151708611 [ HASH N(INVAL_ID) ]
Mar 12 11:41:01 12[IKE] received INVALID_ID_INFORMATION error notify
Mar 12 11:41:05 07[NET] received packet: from 8.0.0.100[500] to 9.0.0.2[500] (220 bytes)
Mar 12 11:41:05 07[ENC] parsed QUICK_MODE request 2568701466 [ HASH SA No ID ID ]
Mar 12 11:41:05 07[IKE] no matching CHILD_SA config found
Mar 12 11:41:05 07[ENC] generating INFORMATIONAL_V1 request 806514467 [ HASH N(INVAL_ID) ]
Mar 12 11:41:05 07[NET] sending packet: from 9.0.0.2[500] to 8.0.0.100[500] (76 bytes)

#3 Updated by Ole Husgaard over 12 years ago

I have exactly the same problem, in a very simple configuration. The problem seems constant and easy to reproduce.

I am trying to connect two LANs with a tunnel between VPN endpoints A and B, using IKEv1.

Endpoint A is an old strongSwan installation, using Pluto. It has public IP 8.0.0.1. This endpoint has a firewall that blocks all UDP port 500 traffic, except to/from IP 9.0.0.2.

Endpoint B is running strongSwan 5.0.2. It has several public IPs:
eth0 9.0.0.1 - holds the default route
eth0:1 9.0.0.2
eth0:2 9.0.0.3
eth0:3 9.0.0.4
The tunnel is set up between 9.0.0.2 and 8.0.0.1.

When a main mode negotiation is initiated from endpoint A, everything is fine. But whenever I try to initiate a main mode negotiation from endpoint B, the following happens:

Log files on endpoint B say that it is sending a packet from 9.0.0.2[500] to 8.0.0.1[500]. But no such packet is sent. Instead a packet is sent from 9.0.0.1[500] to 8.0.0.1[500]. This packet is dropped in the firewall on endpoint A, so endpoint A never sees the main mode request from endpoint B, and thus never replies to it.

Thinking that the problem was with the default route on endpoint B, I tried adding an explicit route to endpoint A on endpoint B: "ip route add 8.0.0.1/32 via <upstream-router> dev eth0 src 9.0.0.2". This made no difference, and the main mode request still has source IP 9.0.0.1.

Applying the patch above made no difference for me either. The source IP of the main mode request is still wrong.

#4 Updated by Martin Willi over 12 years ago

Log files on endpoint B say that it is sending a packet from 9.0.0.2[500] to 8.0.0.1[500]. But no such packet is sent. Instead a packet is sent from 9.0.0.1[500] to 8.0.0.1[500]. This packet is dropped in the firewall on endpoint A, so endpoint A never sees the main mode request from endpoint B, and thus never replies to it.

If the log states that it is sending from the correct source address, charon tries to enforce it using IP_PKTINFO or IP_SENDSRCADDR (which, of course, requires support for one or the other on your system).

Given that the concept of interface aliases has been deprecated on Linux for a few years now, have you tried installing multiple addresses on the real interface (using iproute2's "ip address add")?

Regards
Martin

#5 Updated by Ole Husgaard over 12 years ago

Thanks for your quick reply.

My system is CentOS 6.4 (kernel 2.6.32-358.2.1.el6.x86_64) with the latest patches. strongSwan 5.0.2 is compiled from source on this system, and the source address is set with IP_PKTINFO.

It turns out that I was wrong about the problem being constant. Sometimes it works correctly for a while, and then a bit later, with no change at all, at the next key renegotiation it sends with source IP 9.0.0.1 again.

I dropped the eth0:1 interface, and now have both 9.0.0.1 and 9.0.0.2 configured on eth0. But that change made no difference.

Looking at the source, the packet appears to be handed to the kernel in src/libcharon/plugins/socket_default/socket_default_socket.c. I added some extra debugging there to see what was happening. This confirms that the source IP is set using IP_PKTINFO. It also shows that pktinfo->ipi_spec_dst is 9.0.0.2 just before the sendmsg() call, even when the packet that goes out has source 9.0.0.1. So this is probably some kernel issue.

Any ideas on how to proceed with this would be much appreciated.

#6 Updated by Andreas Steffen over 12 years ago

Assignee set to Martin Willi

#7 Updated by Ole Husgaard over 11 years ago

Sorry for not following up on #5 above sooner. The issue there was not caused by strongSwan.

(The problem was some special SNAT code we have that would sometimes SNAT packets from 9.0.0.2 so that they left the host with source 9.0.0.1, even though strongSwan sent them with source 9.0.0.2.)

#8 Updated by Martin Willi about 11 years ago

Tracker changed from Bug to Issue
Status changed from Feedback to Closed
Resolution set to No change required

I assume this issue has been resolved, closing the ticket. Feel free to reopen.
