
Issue #2943

No child SA negotiated when pool is temporarily full (now with swanctl.conf)

Added by Robert Dahlem over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Category:
configuration
Affected version:
5.5.1
Resolution:
No change required

Description

This is a revenant of #2931. Now I'm on swanctl.conf instead of ipsec.conf.

I have a server running Debian 9 and strongSwan 5.5.1. My Client is a Raspberry Pi running Raspbian 9 and strongSwan 5.5.1.

The server has an official IP address (94.x.y.z). The Raspi is on a private network where it uses 192.168.1.11. This private network is connected to the internet through a NAT dial-up router that gets disconnected for some minutes every night and then gets a new IP address. Don't ask, it is like it is. In the logs I have edited the dial-up router's IP address to 93.x.y.1 before the disconnection and to 93.x.y.2 after the reconnect.

The client gets a fixed private address 172.29.254.1 from the server, so the "pool size" is 1.

About once a week the client has problems reconnecting after the disconnection. In the server log I see:

parsed IKE_AUTH request 4 [ AUTH ]
authentication of 'raspi' with EAP successful
authentication of 'vpn.mysystems.tld' (myself) with EAP
deleting duplicate IKE_SA for peer 'raspi' due to uniqueness policy
IKE_SA vpn_mysystems_tld-ikev2-raspi[65] established between 94.x.y.z[vpn.mysystems.tld]...93.x.y.2[raspi]
scheduling rekeying in 12964s
maximum IKE_SA lifetime 14404s
peer requested virtual IP 172.29.254.1
pool 'pool-ikev2-raspi' is full, unable to assign address
no virtual IP found for 172.29.254.1 requested by 'raspi'
no virtual IP found, sending INTERNAL_ADDRESS_FAILURE
looking for a child config for 94.x.y.z/32 === 0.0.0.0/0
proposing traffic selectors for us:
 94.x.y.z/32
proposing traffic selectors for other:
 dynamic
  candidate "vpn_mysystems_tld-ikev2-raspi" with prio 5+5
found matching child config "vpn_mysystems_tld-ikev2-raspi" with prio 10
configuration payload negotiation failed, no CHILD_SA built
closing IKE_SA due CHILD_SA setup failure
generating IKE_AUTH response 4 [ AUTH N(MOBIKE_SUP) N(ADD_4_ADDR) N(ADD_4_ADDR) N(ADD_6_ADDR) N(INT_ADDR_FAIL) ]

and only after a while:

lease 172.29.254.1 by 'raspi' went offline

while the client logs:

received INTERNAL_ADDRESS_FAILURE notify, no CHILD_SA built
closing IKE_SA due CHILD_SA setup failure

So basically, the server has not freed the lease from the "pool" yet, refuses to assign another virtual IP because the "pool" is full, and no CHILD_SA is established.

There are two things standing out to me:

  • the server logs "deleting duplicate IKE_SA for peer 'raspi' due to uniqueness policy" but only frees the lease after the former SA gets deleted by DPD. Shouldn't this be done at once?
  • the client seems to do nothing after the first attempt failed (this was different with ipsec.conf). Shouldn't the client be a bit more persistent with keyingtries=0?
client.strongswan.conf (428 Bytes) Robert Dahlem, 27.02.2019 09:53
client.swanctl.conf (412 Bytes) Robert Dahlem, 27.02.2019 09:53
client.charon.log (46.5 KB) Robert Dahlem, 27.02.2019 09:53
server.charon.log (15.5 KB) Robert Dahlem, 27.02.2019 09:53
server.strongswan.conf (428 Bytes) Robert Dahlem, 27.02.2019 09:53
server.swanctl.conf (802 Bytes) Robert Dahlem, 27.02.2019 09:53
new.client.log (4.32 KB) Robert Dahlem, 28.02.2019 14:08
new.client.ip.log (2.54 KB) Robert Dahlem, 28.02.2019 14:08
new.client.swanctl.conf (319 Bytes) Robert Dahlem, 28.02.2019 14:08
new.server.log (2.37 KB) Robert Dahlem, 28.02.2019 14:08
new.server.swanctl.conf (332 Bytes) Robert Dahlem, 28.02.2019 14:08

History

#1 Updated by Tobias Brunner over 6 years ago

  • Status changed from New to Feedback
  • the server logs "deleting duplicate IKE_SA for peer 'raspi' due to uniqueness policy" but only frees the lease after the former SA gets deleted by DPD. Shouldn't this be done at once?

The daemon uses a regular delete to terminate the existing SA, unless the client sends an INITIAL_CONTACT notify. So if the peer does not respond, this results in regular retransmissions (and eventually a timeout killing the SA). And if there already is an active task (e.g. a DPD) the delete task is just queued.

To send an INITIAL_CONTACT notify, configure unique = replace on the client too. This differs from ipsec.conf, where uniqueids defaulted to yes; the behavior also changed with 5.6.1 (the notify is now sent even with the default of no), see the documentation of the unique setting in the swanctl.conf man page.
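In swanctl.conf terms this means, roughly (the connection name is a placeholder, only the relevant setting is shown):

    connections {
        vpn {
            # send INITIAL_CONTACT so the responder deletes the old
            # IKE_SA and releases its lease immediately
            unique = replace
            ...
        }
    }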

  • the client seems to do nothing after the first attempt failed (this was different with ipsec.conf). Shouldn't the client be a bit more persistent with keyingtries=0?

No, it won't retry after fatal errors. Use trap policies for automatic (re-)creation of the connection (start_action=trap without dpd_action and close_action).
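As a sketch in swanctl.conf (connection and child names are placeholders):

    connections {
        vpn {
            children {
                net {
                    # install trap policies: matching plaintext traffic
                    # (re-)creates the CHILD_SA automatically
                    start_action = trap
                    # leave dpd_action and close_action unset so a
                    # failed or closed SA does not fight the trap policy
                }
            }
        }
    }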

#2 Updated by Robert Dahlem over 6 years ago

Ok, I will try that and report on it in the next days. Thank you.

#3 Updated by Robert Dahlem over 6 years ago

It's different now but not ok. The server deletes the SAs because of DPD and waits. The client seems to be confused somehow and gives up. Pinging the remote_ts (94.x.y.z) from the client does not trigger the tunnel.

I'm not sure how the IPsec policies should look. They look different before and after manually initiating the connection (see new.client.ip.log).

#4 Updated by Tobias Brunner over 6 years ago

One likely problem is that trap policies combined with virtual IPs are only supported since 5.6.3 (i.e. this is not supported in 5.5.1 on the client, see #2162).

They look different before and after manually initiating the connection (see new.client.ip.log).

That's normal because before you receive a virtual IP from the server your physical IP is used in the trap policies. The actual IPsec policies will then use the virtual IP (the trap policies will stay installed, but via policy routing the virtual IP is forced as source address to match the regular policies - but that's exactly what didn't work in earlier versions).

If you don't want to update you can assign a static tunnel IP to the client (i.e. don't use vips and pools, just assign an address locally to an interface and then use it in the traffic selector(s)).
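Sketched with the values from this report (the interface name is an assumption):

    # assign the tunnel address statically on the client
    ip addr add 172.29.254.1/32 dev eth0

    # then reference it in the child config instead of using vips/pools
    connections {
        vpn {
            children {
                net {
                    local_ts = 172.29.254.1/32
                }
            }
        }
    }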

#5 Updated by Robert Dahlem over 6 years ago

Just an update: one does not simply add a secondary IP address on Raspbian ... it uses dhcpcd (a DHCP client) to assign static addresses, and dhcpcd does not seem to be capable of assigning secondary addresses.

So I had to fiddle with /etc/network/interfaces.d/eth0-0, but that starts up after charon-systemd, which leads to "unable to install source route for $MY_LOCAL_TS".

To remedy that I have added an ExecStartPre to the systemd service unit which waits for the IP addresses to appear.
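Such a wait loop could look like this (unit drop-in, interface and timeout are examples, not taken from the attached configs):

    # drop-in for the charon-systemd service unit
    [Service]
    ExecStartPre=/bin/sh -c 'n=0; until ip -4 addr show dev eth0 | grep -q 172.29.254.1; do n=$((n+1)); [ $n -ge 30 ] && exit 1; sleep 1; done'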

I will report about the stability of this later.

#6 Updated by Robert Dahlem over 6 years ago

On the initiator end I changed remote_addrs from a DNS name to an IP address because the connection tended to fail fatally at boot time as there was no DNS resolution yet. Also I changed start_action to trap and added local_ts/remote_ts.

On the responder end I deleted the pools and added local_ts/remote_ts.
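Put together, the initiator side described above would look roughly like this (names and the exact traffic selectors are illustrative):

    connections {
        vpn {
            remote_addrs = 94.x.y.z        # IP address instead of DNS name
            children {
                net {
                    start_action = trap
                    local_ts = 172.29.254.1/32
                    remote_ts = 94.x.y.z/32
                }
            }
        }
    }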

Since then I did not have any more dead connections in the morning.

Thank you! You can close this issue.

#7 Updated by Tobias Brunner over 6 years ago

  • Category set to configuration
  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Resolution set to No change required

On the initiator end I changed remote_addrs from a DNS name to an IP address because the connection tended to fail fatally at boot time as there was no DNS resolution yet.

The charon.retry_initiate_interval setting might have helped in this case.
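For example, in strongswan.conf on the client (the interval is an example value):

    charon {
        # keep retrying failed initiations (e.g. while DNS is
        # still unavailable at boot) every 30 seconds
        retry_initiate_interval = 30
    }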