Bug #853
Issue with rekeying and updown script
Description
Hello.
We are having a big issue regarding IPsec connection stability when it comes to rekey collisions and the updown script. We are using the firewall distribution IPFire, which uses Strongswan 5.2.1 for IPsec connectivity.
The problem is, when a rekey collision occurs, the updown script gets executed with the client-down
function, which removes iptable firewall rules. This in turn prevents any further data from going through the tunnel. IPsec connections will remain. Routes also remain fine. Only manually restarting the tunnel helps in this situation. Setting different rekey times on each side to prevent a collision did not help, because that gave us other problems.
These are the connection settings from ipsec.conf (both sides use equal settings):
version 2 conn %default keyingtries=%forever include /etc/ipsec.user.conf (which is empty) conn VpnEssenFrankfurt left=xxx.xxx.xxx.178 leftsubnet=192.168.241.0/24 leftfirewall=yes lefthostaccess=yes right=xxx.xxx.xxx.254 rightsubnet=10.2.34.0/24 leftid="xxx.xxx.xxx.178" rightid="xxx.xxx.xxx.254" ike=aes128-sha-modp2048,aes128-sha-modp1536,aes128-sha-modp1024 esp=aes128-sha1-modp2048,aes128-sha1-modp1536,aes128-sha1-modp1024 keyexchange=ikev2 ikelifetime=8h keylife=1h compress=yes dpdaction=restart dpddelay=120 dpdtimeout=30 authby=secret auto=start
rekeymargin, rekeyfuzz are not set, so they should be at default values.
I also created a test setup using VMs with the same software. To reproduce rekey collisions and investigate a little further I changed the following settings:
keylife=1m rekeymargin=50s rekeyfuzz=0%
Before rekey collision:
Chain IPSECFORWARD (1 references) target prot opt source destination RETURN all -- 10.1.34.0/24 10.2.34.0/24 policy match dir in pol ipsec reqid 1 proto 50 MARK all -- 10.2.34.0/24 10.1.34.0/24 policy match dir out pol ipsec reqid 1 proto 50 MARK set 0x32 Chain IPSECINPUT (1 references) target prot opt source destination ACCEPT esp -- 10.1.100.178 10.1.100.254 ACCEPT ah -- 10.1.100.178 10.1.100.254 ACCEPT 4 -- 10.1.100.178 10.1.100.254 ACCEPT 4 -- 10.1.100.178 10.1.100.254 policy match dir in pol ipsec reqid 1 proto 50 RETURN all -- 10.1.34.0/24 10.2.34.0/24 policy match dir in pol ipsec reqid 1 proto 50 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:500 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:4500 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:500 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:4500 Chain IPSECOUTPUT (1 references) target prot opt source destination MARK all -- 10.2.34.0/24 10.1.34.0/24 policy match dir out pol ipsec reqid 1 proto 50 MARK set 0x32
After rekey collision:
Chain IPSECFORWARD (1 references) target prot opt source destination Chain IPSECINPUT (1 references) target prot opt source destination ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:500 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:4500 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:500 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:4500 Chain IPSECOUTPUT (1 references) target prot opt source destination
Here are some log entries when it happens:
09:50:49 charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY 09:50:49 charon: 15[ENC] generating CREATE_CHILD_SA response 19 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 09:50:49 charon: 15[NET] sending packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (476 bytes) 09:50:49 charon: 11[NET] received packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (476 bytes) 09:50:49 charon: 11[ENC] parsed CREATE_CHILD_SA response 5 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 09:50:49 charon: 11[IKE] received IPCOMP_SUPPORTED notify 09:50:49 charon: 11[CHD] using AES_CBC for encryption 09:50:49 charon: 11[CHD] using HMAC_SHA1_96 for integrity 09:50:49 charon: 11[CHD] adding inbound ESP SA 09:50:49 charon: 11[CHD] SPI 0xce1d4d5e, src 10.1.100.178 dst 10.1.100.254 09:50:49 charon: 11[KNL] adding SAD entry with SPI 0000cb33 and reqid {1} (mark 0/0x00000000) 09:50:49 charon: 11[KNL] using compression algorithm IPCOMP_DEFLATE 09:50:49 charon: 11[KNL] adding SAD entry with SPI ce1d4d5e and reqid {1} (mark 0/0x00000000) 09:50:49 charon: 11[KNL] using encryption algorithm AES_CBC with key size 128 09:50:49 charon: 11[KNL] using integrity algorithm HMAC_SHA1_96 with key size 160 09:50:49 charon: 11[KNL] using replay window of 32 packets 09:50:49 charon: 11[CHD] adding outbound ESP SA 09:50:49 charon: 11[CHD] SPI 0xca7c0116, src 10.1.100.254 dst 10.1.100.178 09:50:49 charon: 11[KNL] adding SAD entry with SPI 000015f8 and reqid {1} (mark 0/0x00000000) 09:50:49 charon: 11[KNL] using compression algorithm IPCOMP_DEFLATE 09:50:49 charon: 11[KNL] adding SAD entry with SPI ca7c0116 and reqid {1} (mark 0/0x00000000) 09:50:49 charon: 11[KNL] using encryption algorithm AES_CBC with key size 128 09:50:49 charon: 11[KNL] using integrity algorithm HMAC_SHA1_96 with key size 160 09:50:49 charon: 11[KNL] using replay window of 32 packets 09:50:49 charon: 11[KNL] policy 10.2.34.0/24 === 10.1.34.0/24 out (mark 0/0x00000000) already exists, increasing refcount 09:50:49 charon: 11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 in (mark 0/0x00000000) already exists, increasing refcount 09:50:49 charon: 11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 fwd (mark 0/0x00000000) already exists, increasing refcount 09:50:49 charon: 11[KNL] policy 10.2.34.0/24 === 10.1.34.0/24 out (mark 0/0x00000000) already exists, increasing refcount 09:50:49 charon: 11[KNL] updating policy 10.2.34.0/24 === 10.1.34.0/24 out (mark 0/0x00000000) 09:50:49 charon: 11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 in (mark 0/0x00000000) already exists, increasing refcount 09:50:49 charon: 11[KNL] updating policy 10.1.34.0/24 === 10.2.34.0/24 in (mark 0/0x00000000) 09:50:49 charon: 11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 fwd (mark 0/0x00000000) already exists, increasing refcount 09:50:49 charon: 11[KNL] updating policy 10.1.34.0/24 === 10.2.34.0/24 fwd (mark 0/0x00000000) 09:50:49 charon: 11[KNL] getting a local address in traffic selector 10.2.34.0/24 09:50:49 charon: 11[KNL] using host 10.2.34.1 09:50:49 charon: 11[KNL] using 10.1.100.178 as nexthop to reach 10.1.100.178/32 09:50:49 charon: 11[KNL] 10.1.100.254 is on interface red0 09:50:49 charon: 11[IKE] CHILD_SA VpnFrankfurtEssenTest{1} established with SPIs ce1d4d5e_i ca7c0116_o and TS 10.2.34.0/24 === 10.1.34.0/24 09:50:49 charon: 11[IKE] CHILD_SA VpnFrankfurtEssenTest{1} established with SPIs ce1d4d5e_i ca7c0116_o and TS 10.2.34.0/24 === 10.1.34.0/24 09:50:49 charon: 11[IKE] CHILD_SA rekey collision won, deleting old child 09:50:49 charon: 11[IKE] reinitiating already active tasks 09:50:49 charon: 11[IKE] CHILD_REKEY task 09:50:49 charon: 11[KNL] querying SAD entry with SPI c78986b4 (mark 0/0x00000000) 09:50:49 charon: 11[KNL] querying SAD entry with SPI c5afd499 (mark 0/0x00000000) 09:50:49 charon: 11[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c78986b4_i (0 bytes) c5afd499_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24 09:50:49 charon: 11[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c78986b4_i (0 bytes) c5afd499_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24 09:50:49 charon: 11[IKE] sending DELETE for ESP CHILD_SA with SPI c78986b4 09:50:49 charon: 11[ENC] generating INFORMATIONAL request 6 [ D ] 09:50:49 charon: 11[NET] sending packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (76 bytes) 09:50:49 charon: 13[NET] received packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (76 bytes) 09:50:49 charon: 13[ENC] parsed INFORMATIONAL request 20 [ D ] 09:50:49 charon: 13[IKE] received DELETE for ESP CHILD_SA with SPI c4a8cd3a 09:50:49 charon: 13[KNL] querying SAD entry with SPI c950c19c (mark 0/0x00000000) 09:50:49 charon: 13[KNL] querying SAD entry with SPI c4a8cd3a (mark 0/0x00000000) 09:50:49 charon: 13[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c950c19c_i (0 bytes) c4a8cd3a_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24 09:50:49 charon: 13[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c950c19c_i (0 bytes) c4a8cd3a_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24 09:50:49 charon: 13[IKE] sending DELETE for ESP CHILD_SA with SPI c950c19c 09:50:49 charon: 13[IKE] CHILD_SA closed 09:50:49 charon: 13[CHD] running updown script: 2>&1 PLUTO_VERSION='1.1' PLUTO_VERB='down-client' PLUTO_CONNECTION='VpnFrankfurtEssenTest' PLUTO_INTERFACE='red0' PLUTO_REQID='1' PLUTO_PROTO='esp' PLUTO_UNIQUEID='1' PLUTO_ME='10.1.100.254' PLUTO_MY_ID='10.1.100.254' PLUTO_MY_CLIENT='10.2.34.0/24' PLUTO_MY_PORT='0' PLUTO_MY_PROTOCOL='0' PLUTO_PEER='10.1.100.178' PLUTO_PEER_ID='10.1.100.178' PLUTO_PEER_CLIENT='10.1.34.0/24' PLUTO_PEER_PORT='0' PLUTO_PEER_PROTOCOL='0' PLUTO_IPCOMP='1' /usr/libexec/ipsec/_updown
Other host:
09:50:47 charon: 10[IKE] detected CHILD_REKEY collision with CHILD_REKEY 09:50:47 charon: 10[ENC] generating CREATE_CHILD_SA response 5 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 09:50:47 charon: 10[NET] sending packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (476 bytes) 09:50:47 charon: 13[NET] received packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (476 bytes) 09:50:47 charon: 13[ENC] parsed CREATE_CHILD_SA response 19 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 09:50:48 charon: 13[IKE] CHILD_SA VpnEssenFrankfurtTest{5} established with SPIs c4a8cd3a_i c950c19c_o and TS 10.1.34.0/24 === 10.2.34.0/24 09:50:48 charon: 13[IKE] CHILD_SA VpnEssenFrankfurtTest{5} established with SPIs c4a8cd3a_i c950c19c_o and TS 10.1.34.0/24 === 10.2.34.0/24 09:50:48 charon: 13[IKE] CHILD_SA rekey collision lost, deleting rekeyed child 09:50:48 charon: 13[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c4a8cd3a_i (0 bytes) c950c19c_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24 09:50:48 charon: 13[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c4a8cd3a_i (0 bytes) c950c19c_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24 09:50:48 charon: 13[IKE] sending DELETE for ESP CHILD_SA with SPI c4a8cd3a 09:50:48 charon: 13[ENC] generating INFORMATIONAL request 20 [ D ] 09:50:48 charon: 13[NET] sending packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (76 bytes) 09:50:48 charon: 03[NET] received packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (76 bytes) 09:50:48 charon: 03[ENC] parsed INFORMATIONAL request 6 [ D ] 09:50:48 charon: 03[IKE] received DELETE for ESP CHILD_SA with SPI c78986b4 09:50:48 charon: 03[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c5afd499_i (0 bytes) c78986b4_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24 09:50:48 charon: 03[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c5afd499_i (0 bytes) c78986b4_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24 09:50:48 charon: 03[IKE] sending DELETE for ESP CHILD_SA with SPI c5afd499 09:50:48 charon: 03[IKE] CHILD_SA closed 09:50:48 charon: 03[IKE] detected CHILD_REKEY collision with CHILD_DELETE 09:50:48 charon: 03[ENC] generating INFORMATIONAL response 6 [ D ] 09:50:48 charon: 03[NET] sending packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (76 bytes) 09:50:48 charon: 05[NET] received packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (76 bytes) 09:50:48 charon: 05[ENC] parsed INFORMATIONAL response 20 [ D ] 09:50:48 charon: 05[IKE] received DELETE for ESP CHILD_SA with SPI c950c19c 09:50:48 charon: 05[IKE] CHILD_SA closed
Although I am not a programmer, I dug around in the source code and might have found a relevant place. Maybe it's not working right here.
File: child_delete.c
static status_t destroy_and_reestablish(private_child_delete_t *this) [...] /* signal child down event if we are not rekeying */ if (!this->rekeyed) { charon->bus->child_updown(charon->bus, child_sa, FALSE); } [...]
If you need any more information, I will be glad to provide. Thank you!
Related issues
Associated revisions
History
#1 Updated by Martin Willi over 5 years ago
- Has duplicate Issue #959: Invalid call to ipsec _updown after session rekey when there is a rekey collision added
#2 Updated by Tom Rymes about 5 years ago
Are there any updates on this issue? We were bitten by this yesterday and I am looking to get confirmation that this is/is-not a problem with Strongswan. This has been discussed a little on the IPFire forum as well: http://forum.ipfire.org/viewtopic.php?f=27&t=12628
#3 Updated by Tobias Brunner about 5 years ago
- Tracker changed from Issue to Bug
- Status changed from New to Feedback
Please try the patch in the 853-updown-rekey-collision branch. It tries to suppress the updown event during rekeying in case an installed redundant CHILD_SA is deleted.
#4 Updated by Tom Rymes about 5 years ago
With help from The IPFire development team, I have been able to test the patch and it does appear to have fixed the issue, as I have had a tunnel up for nearly 24-hours and hundreds of collisions that is still working. Thank you!
#5 Updated by Andreas Braun about 5 years ago
Tom Rymes wrote:
With help from The IPFire development team, I have been able to test the patch and it does appear to have fixed the issue, as I have had a tunnel up for nearly 24-hours and hundreds of collisions that is still working. Thank you!
Hey Tom, good to see some progress on this topic. Can you tell me how to test the patch with IPFire?
Thank you.
#6 Updated by Tom Rymes about 5 years ago
Andreas: I opened Bug 10998 on the IPFire Bugzilla here: https://bugzilla.ipfire.org/show_bug.cgi?id=10998
Michael posted a patched version of Strongswan there, along with instructions to install.
#7 Updated by Tobias Brunner about 5 years ago
- Category set to libcharon
- Target version set to 5.4.0
With help from The IPFire development team, I have been able to test the patch and it does appear to have fixed the issue, as I have had a tunnel up for nearly 24-hours and hundreds of collisions that is still working. Thank you!
Great, thanks for testing. I'll line this up for the next release.
#8 Updated by Tobias Brunner almost 5 years ago
- Status changed from Feedback to Closed
- Assignee set to Tobias Brunner
- Resolution set to Fixed
child-rekey: Suppress updown event when deleting redundant CHILD_SAs
When handling a rekey collision we might have to delete an already
installed redundant CHILD_SA (or expect the other peer to do so).
We don't want to trigger updown events for these as neither do we do
so for successfully rekeyed CHILD_SAs.
Fixes #853.