Issue #3201

stale SA cleanup when peer has failed

Added by Maha Vasu about 1 month ago. Updated 28 days ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: -
Affected version: 5.7.2
Resolution:

Description

We’re attempting to tear down a known stale SA without notifying the peer, because the peer has failed. We observe the following behavior: the first attempt fails because it tries to send a notification, while the second attempt succeeds and behaves the way we wanted (no notification):

First try:
----------
swanctl --terminate --ike-id 104 --timeout -1
terminate failed: terminating SA failed

deleting IKE_SA host-host-v6104 : ……..
sending DELETE for IKE_SA host-host-v6104
10[ENC] generating INFORMATIONAL request 2 [ D ]
10[NET] sending packet: from IPV6addr1500 to IPV6addr2500 (65 bytes)
IKE] retransmit 1 of request with message ID 2

And so on, as the remote node is gone.

Second try (soon after):
------------------------
swanctl --terminate --ike-id 104 --timeout -1
terminate completed successfully

07[CFG] vici terminate IKE_SA #104
07[IKE] destroying IKE_SA in state DELETING without notification.

It looks like this should be supported as of 5.6.3, according to the release notes:

“New options for vici/swanctl allow forcing the local termination of an IKE_SA. This might be
useful in situations where it's known the other end is not reachable anymore, or that it already
removed the IKE_SA, so retransmitting a DELETE and waiting for a response would be pointless.
Waiting only a certain amount of time for a response (i.e. shorter than all retransmits would be)
before destroying the IKE_SA is also possible by additionally specifying a timeout in the forced
termination request.”

“--force” seems to do the trick; but is that the right option in this case. Thank you!
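
If the cleanup is scripted, the forced termination could be wrapped roughly as follows. This is only a sketch: the helper names `build_terminate_cmd` and `terminate_stale_sa` are made up for illustration, the SA ID 104 is the one from the log above, and it assumes `swanctl` is on the PATH and exits with status 0 on success.

```python
import subprocess


def build_terminate_cmd(ike_id, force=True, timeout=None):
    """Build the swanctl argument list for terminating one IKE_SA by its unique ID."""
    cmd = ["swanctl", "--terminate", "--ike-id", str(ike_id)]
    if force:
        # --force requests local termination without waiting for the peer's response
        cmd.append("--force")
    if timeout is not None:
        # Optional timeout in seconds, passed through to swanctl unchanged
        cmd += ["--timeout", str(timeout)]
    return cmd


def terminate_stale_sa(ike_id, timeout=None):
    """Run the forced termination; True if swanctl exits with status 0 (assumed to mean success)."""
    result = subprocess.run(
        build_terminate_cmd(ike_id, force=True, timeout=timeout),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling `terminate_stale_sa(104)` would then run `swanctl --terminate --ike-id 104 --force`.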

History

#1 Updated by Noel Kuntze about 1 month ago

  • Status changed from New to Feedback

but is that the right option in this case?

(fixed that for you)

Yes.

Is there a problem though with the peer failure not being detected?

#2 Updated by Maha Vasu about 1 month ago

Thanks. We suspect it could be a timing issue in the peer failure detection, given that the command executes fine on the immediate second attempt. Will the following approach help in our case?

connections.<conn>.dpd_delay (default: 0s)
Interval to check the liveness of a peer actively using IKEv2 INFORMATIONAL exchanges or IKEv1 R_U_THERE messages. Active DPD checking is only enforced if no IKE or ESP/AH packet has been received for the configured DPD delay.

And here’s the info on Dead Peer Detection:

https://www.strongswan.org/docs/readme4.htm#section_14.3

#3 Updated by Maha Vasu 29 days ago

We’ve looked further into the use of DPD with IKEv2. Apparently, the retransmission timeouts are used to declare a peer dead. In our case, the peers are connected via a reliable internal network (which might cross switch boundaries). Ideally, we’d declare a peer dead within 30 seconds or so. Do you have a recommendation for the retransmit_tries and retransmit_timeout values in a case like this? Thanks
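
To get a feel for candidate values, here is a small sketch that estimates how long charon keeps retransmitting before giving up. It assumes the retransmission formula as I understand it from the strongSwan documentation: the wait before retransmit n is retransmit_timeout * retransmit_base^(n-1), with one final wait after the last retransmit. It also assumes defaults of 4.0, 1.8 and 5, and ignores retransmit_jitter and retransmit_limit. The function names are made up for illustration, and the second parameter set is just an example that lands near 30 seconds, not a recommendation.

```python
def retransmit_schedule(timeout=4.0, base=1.8, tries=5):
    """Relative waits in seconds: one before each retransmit, plus the final wait before giving up."""
    return [timeout * base ** n for n in range(tries + 1)]


def time_until_give_up(timeout=4.0, base=1.8, tries=5):
    """Total seconds from the first transmission until the peer is considered dead."""
    return sum(retransmit_schedule(timeout, base, tries))


if __name__ == "__main__":
    candidates = [
        ("assumed defaults", 4.0, 1.8, 5),
        ("example near 30 s", 2.5, 1.8, 3),
    ]
    for label, timeout, base, tries in candidates:
        total = time_until_give_up(timeout, base, tries)
        print(f"{label}: timeout={timeout} base={base} tries={tries} -> {total:.1f} s")
```

With the assumed defaults this comes to roughly 165 seconds. A retransmit_timeout of 2.5 with retransmit_tries of 3 (base unchanged) comes to roughly 29.7 seconds. If DPD is what triggers the exchange, the configured dpd_delay would come on top of that.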

#4 Updated by Tobias Brunner 28 days ago
