Issue #1274

multi connections between two IPs with auto=route

Added by Sudheer Anumolu almost 5 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Affected version: 5.3.0
Resolution: No feedback

Description

Hi

With auto=route, I see multiple connections between two IP addresses, as shown below.
With auto=add, I never see this behaviour.

When I start the connection with "ipsec up t30", only one connection is seen, but after a few hours multiple connections are seen.
Can someone tell me why the old connection is not getting deleted?

[root@moon left]# ipsec statusall | grep ESTABLISHED | grep t30
         t30[1242]: ESTABLISHED 64 seconds ago, 192.168.39.1[moon.strongswan.org]...192.168.39.2[sun.strongswan.org]
         t30[1211]: ESTABLISHED 37 minutes ago, 192.168.39.1[moon.strongswan.org]...192.168.39.2[sun.strongswan.org]
[root@moon left]# 
[root@moon left]# ipsec statusall | grep rekeying| grep t30
         t30{3989}:  AES_CBC_128/HMAC_SHA1_96, 0 bytes_i, 0 bytes_o, rekeying in 23 minutes
         t30{3956}:  AES_CBC_128/HMAC_SHA1_96, 0 bytes_i, 0 bytes_o, rekeying in 14 minutes
[root@moon left]# ipsec statusall | grep reauthentication| grep t30
         t30[1242]: IKEv2 SPIs: 3719f176491faf7e_i 3f7d16d50c923ed7_r*, public key reauthentication in 67 minutes
         t30[1211]: IKEv2 SPIs: d47395dbcd9486cc_i* 2c7ead60d4ed8a81_r, public key reauthentication in 31 minutes
[root@moon left]# 
[root@moon ]# setkey -D | grep 192.168.39.1
192.168.39.1 192.168.39.2 
192.168.39.2 192.168.39.1 
192.168.39.1 192.168.39.2 
192.168.39.2 192.168.39.1 
[root@moon ]# 
[root@moon]# setkey -DP | grep 192.168.39.1
    esp/tunnel/192.168.39.2-192.168.39.1/unique:30
    esp/tunnel/192.168.39.2-192.168.39.1/unique:30
    esp/tunnel/192.168.39.1-192.168.39.2/unique:30
[root@moon ]# 

ipsec.conf

config setup
    uniqueids=never

conn %default
        ikelifetime=75m
        keylife=30m
        rekeymargin=1m
        keyingtries=1
        keyexchange=ikev2
        mobike=no
        leftcert=moonCert.pem
        leftid=@moon.strongswan.org
        rightid=@sun.strongswan.org

conn t30
        left=192.168.39.1
        leftsubnet=10.1.39.1/24
        right=192.168.39.2
        rightsubnet=20.1.39.1/24
        type=tunnel
        auto=route

strongswan.conf

charon {
  load = curl aes des sha1 sha2 md5 pem pkcs1 gmp random nonce x509 revocation hmac xcbc stroke kernel-pfkey kernel-netlink socket-default updown
  make-before-break=yes
}

Thanks
Sudheer

History

#1 Updated by Tobias Brunner over 4 years ago

  • Description updated (diff)
  • Status changed from New to Feedback

Not really related, but why would you use ipsec up with auto=route? See IntroductionTostrongSwan.

It's possible that either both ends create their tunnel concurrently or (if it only happens after a few hours) due to reauthentication. With reauth=yes (the default) and without make-before-break reauthentication (charon.make_before_break=no, the default) there is a short time where no SAs/policies are installed (particularly on the remote end, where the original IKE_SA and all CHILD_SAs are deleted and the new ones created). If traffic matches the trap policies, installed due to auto=route, just then an additional SA could get triggered. Please check the log files to see when and why SAs are actually created. You could also try disabling reauthentication (reauth=no) to use inline rekeying of the IKE_SAs to see if that changes anything.

#2 Updated by Christian Liebscher over 4 years ago

Hi,

I think I'm observing the same, or at least a similar, issue.

The responder (bob) is configured to wait for an incoming connection (auto=add). Reauth and rekeying are left at their defaults on the responder.
The responder shouldn't trigger any reauth or rekeying, since ikelifetime/lifetime is set to 20/10 min at the initiator.

For the config of the initiator (stuart) see the config below. charon.make_before_break is not set.
@Sudheer: You have set make-before-break instead of make_before_break. The latter seems to be correct.

After I start strongswan on both sites, I start pinging from 10.201.0.10 to 10.202.0.10.
StrongSwan brings up the tunnel and the traffic between my clients is working. During IKE reauth something seems to go wrong.
No pings are dropped but I suddenly have two active CHILD_SAs. If I stop the pings all CHILD_SAs will be deleted after their inactivity timeout of 60s.
This behavior was reproducible every time so far.

Initiators (stuart) config:

conn bob_0
  auto = route
  dpdaction = restart
  inactivity = 60s
  esp = aes128ccm16-sha256-modp2048!
  ike = aes128ccm16-sha256-modp2048!
  ikelifetime = 20m
  keyexchange = ikev2
  keyingtries = %forever
  lifetime = 10m
  margintime = 3m
  reqid = 1
  left = %any
  leftauth = pubkey
  leftca = "C=DE L=Karlsruhe O=Minion Inc. OU=Banana CN=ca.minion.inc emailAddress=ca@minion.inc" 
  leftcert = "stuart_Cert.pem" 
  leftsubnet = 10.201.0.0/24
  right = 172.40.2.10
  rightauth = pubkey
  rightca = "C=DE L=Karlsruhe O=Minion Inc. OU=Banana CN=ca.minion.inc emailAddress=ca@minion.inc" 
  rightid = %any
  rightsubnet = 10.202.0.0/24
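
As a rough cross-check of the timers in this config, the expiry formula from the ipsec.conf man page (rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz)), with rekeyfuzz defaulting to 100%) can be sketched in Python. The helper name reauth_window is made up for illustration, and the sketch assumes the same formula governs ikelifetime-based reauthentication. Under that assumption, the "scheduling reauthentication in 897s" value in the reauth log further down falls inside the computed window.

```python
# Sketch of the expiry formula from the ipsec.conf man page:
#   rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz))
# reauth_window is a hypothetical helper, not part of strongSwan.

def reauth_window(lifetime_s, margin_s, fuzz_pct=100):
    """Return (earliest, latest) second at which rekeying/reauth may start."""
    max_jitter = margin_s * fuzz_pct // 100   # upper bound of the random part
    latest = lifetime_s - margin_s            # random part is 0
    earliest = latest - max_jitter            # random part is at its maximum
    return earliest, latest

# ikelifetime = 20m, margintime = 3m, rekeyfuzz left at its default of 100%
earliest, latest = reauth_window(lifetime_s=20 * 60, margin_s=3 * 60)
print(earliest, latest)  # prints: 840 1020
```

The same helper applied to lifetime = 10m gives the window in which each CHILD_SA is rekeyed.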

Before starting the Traffic:

Status of IKE charon daemon (strongSwan 5.3.5, Linux 3.14.22, armv7l):
  uptime: 5 minutes, since Jan 26 14:22:13 2016
  malloc: sbrk 520192, mmap 0, used 159648, free 360544
  worker threads: 9 of 16 idle, 6/0/1/0 working, job queue: 0/0/0/0, scheduled: 0
  loaded plugins: charon aes des rc2 sha1 sha2 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem fips-prf gmp xcbc cmac hmac ctr ccm gcm attr kerc
Listening IP addresses:
  192.168.10.201
  172.40.1.10
  10.201.0.1
  10.201.1.1
  10.201.2.1
Connections:
       bob_0:  %any...172.40.2.10  IKEv2, dpddelay=30s
       bob_0:   local:  [C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc] uses public key authentication
       bob_0:    cert:  "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
       bob_0:   remote: uses public key authentication
       bob_0:   child:  10.201.0.0/24 === 10.202.0.0/24 TUNNEL, dpdaction=restart
Routed Connections:
       bob_0{1}:  ROUTED, TUNNEL, reqid 1
       bob_0{1}:   10.201.0.0/24 === 10.202.0.0/24
Security Associations (0 up, 0 connecting):
  none

Then I start pinging:

    16[KNL] creating acquire job for policy 10.201.0.10/32[1/8] === 10.202.0.10/32[1/8] with reqid {1}
    11[IKE] <bob_0|1> initiating IKE_SA bob_0[1] to 172.40.2.10

Until IKE reauth everything seems fine:

Before IKE reauth:

Status of IKE charon daemon (strongSwan 5.3.5, Linux 3.14.22, armv7l):
  uptime: 19 minutes, since Jan 26 14:22:11 2016
  malloc: sbrk 520192, mmap 0, used 178024, free 342168
  worker threads: 9 of 16 idle, 6/0/1/0 working, job queue: 0/0/0/0, scheduled: 5
  loaded plugins: charon aes des rc2 sha1 sha2 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem fips-prf gmp xcbc cmac hmac ctr ccm gcm attr kerc
Listening IP addresses:
  192.168.10.201
  172.40.1.10
  10.201.0.1
  10.201.1.1
  10.201.2.1
Connections:
       bob_0:  %any...172.40.2.10  IKEv2, dpddelay=30s
       bob_0:   local:  [C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc] uses public key authentication
       bob_0:    cert:  "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
       bob_0:   remote: uses public key authentication
       bob_0:   child:  10.201.0.0/24 === 10.202.0.0/24 TUNNEL, dpdaction=restart
Routed Connections:
       bob_0{1}:  ROUTED, TUNNEL, reqid 1
       bob_0{1}:   10.201.0.0/24 === 10.202.0.0/24
Security Associations (1 up, 0 connecting):
       bob_0[1]: ESTABLISHED 13 minutes ago, 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc]...172.40.2.10[C=DE, L=Karlsruhe, O=Minion Inc., O]
       bob_0[1]: IKEv2 SPIs: 461fcc2f69bdff9c_i* 42c4ba9616ab3f69_r, public key reauthentication in 41 seconds
       bob_0[1]: IKE proposal: AES_CCM_16_128/PRF_HMAC_SHA2_256/MODP_2048
       bob_0{4}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c890aa1e_i cfba4f65_o
       bob_0{4}:  AES_CCM_16_128, 20832 bytes_i (248 pkts, 0s ago), 20832 bytes_o (248 pkts, 0s ago), rekeying in 38 seconds
       bob_0{4}:   10.201.0.0/24 === 10.202.0.0/24

After IKE Reauth:

Status of IKE charon daemon (strongSwan 5.3.5, Linux 3.14.22, armv7l):
  uptime: 20 minutes, since Jan 26 14:22:12 2016
  malloc: sbrk 520192, mmap 0, used 185688, free 334504
  worker threads: 10 of 16 idle, 6/0/0/0 working, job queue: 0/0/0/0, scheduled: 9
  loaded plugins: charon aes des rc2 sha1 sha2 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem fips-prf gmp xcbc cmac hmac ctr ccm gcm attr kerc
Listening IP addresses:
  192.168.10.201
  172.40.1.10
  10.201.0.1
  10.201.1.1
  10.201.2.1
Connections:
       bob_0:   local:  [C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc] uses public key authentication
       bob_0:    cert:  "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
       bob_0:   remote: uses public key authentication
       bob_0:   child:  10.201.0.0/24 === 10.202.0.0/24 TUNNEL, dpdaction=restart
Routed Connections:
       bob_0{1}:  ROUTED, TUNNEL, reqid 1
       bob_0{1}:   10.201.0.0/24 === 10.202.0.0/24
Security Associations (1 up, 0 connecting):
       bob_0[2]: ESTABLISHED 16 seconds ago, 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc]...172.40.2.10[C=DE, L=Karlsruhe, O=Minion Inc., O]
       bob_0[2]: IKEv2 SPIs: 5267ba5bfbb5efd3_i* 389df180cfd19595_r, public key reauthentication in 14 minutes
       bob_0[2]: IKE proposal: AES_CCM_16_128/PRF_HMAC_SHA2_256/MODP_2048
       bob_0{6}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: ccc4a2c8_i cf2d570b_o
       bob_0{6}:  AES_CCM_16_128, 84 bytes_i, 84 bytes_o, rekeying in 4 minutes
       bob_0{6}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{7}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c54c3ef4_i c1f168e4_o
       bob_0{7}:  AES_CCM_16_128, 1176 bytes_i (14 pkts, 1s ago), 1176 bytes_o (14 pkts, 1s ago), rekeying in 5 minutes
       bob_0{7}:   10.201.0.0/24 === 10.202.0.0/24
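
To spot this situation without reading the status output by eye, a small Python sketch can count the INSTALLED CHILD_SAs per connection and reqid. The function names and the regular expression are assumptions based only on the "ipsec statusall" lines shown in this thread; other strongSwan versions may format these lines differently.

```python
import re
from collections import Counter

# Matches CHILD_SA lines such as:
#   "bob_0{6}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: ..."
INSTALLED_RE = re.compile(
    r"^\s*(?P<conn>[^\s{]+)\{(?P<id>\d+)\}:\s+INSTALLED,.*\breqid (?P<reqid>\d+)"
)

def count_installed(status_text):
    """Count INSTALLED CHILD_SAs per (connection name, reqid)."""
    counts = Counter()
    for line in status_text.splitlines():
        m = INSTALLED_RE.match(line)
        if m:
            counts[(m.group("conn"), int(m.group("reqid")))] += 1
    return counts

def duplicates(status_text):
    """Return only the (connection, reqid) pairs with more than one SA."""
    return {key: n for key, n in count_installed(status_text).items() if n > 1}

# Lines copied from the "After IKE Reauth" status above.
SAMPLE = """\
       bob_0{1}:  ROUTED, TUNNEL, reqid 1
       bob_0{6}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: ccc4a2c8_i cf2d570b_o
       bob_0{6}:  AES_CCM_16_128, 84 bytes_i, 84 bytes_o, rekeying in 4 minutes
       bob_0{7}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c54c3ef4_i c1f168e4_o
"""
print(duplicates(SAMPLE))  # prints: {('bob_0', 1): 2}
```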

This is the log from the reauth:

12[KNL] creating rekey job for CHILD_SA ESP/0xc890aa1e/172.40.1.10
12[IKE] <bob_0|1> establishing CHILD_SA bob_0{1}
16[KNL] creating rekey job for CHILD_SA ESP/0xcfba4f65/172.40.2.10
01[IKE] <bob_0|1> CHILD_SA bob_0{5} established with SPIs c5f1c968_i cd535307_o and TS 10.201.0.0/24 === 10.202.0.0/24
01[IKE] <bob_0|1> closing CHILD_SA bob_0{4} with SPIs c890aa1e_i (24108 bytes) cfba4f65_o (24192 bytes) and TS 10.201.0.0/24 === 10.202.0.4
01[IKE] <bob_0|1> sending DELETE for ESP CHILD_SA with SPI c890aa1e
16[IKE] <bob_0|1> received DELETE for ESP CHILD_SA with SPI cfba4f65
16[IKE] <bob_0|1> CHILD_SA closed
14[IKE] <bob_0|1> reauthenticating IKE_SA bob_0[1]
14[IKE] <bob_0|1> deleting IKE_SA bob_0[1] between 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart]
14[IKE] <bob_0|1> sending DELETE for IKE_SA bob_0[1]
10[IKE] <bob_0|1> IKE_SA deleted
10[IKE] <bob_0|1> restarting CHILD_SA bob_0
10[IKE] <bob_0|1> initiating IKE_SA bob_0[2] to 172.40.2.10
16[KNL] creating acquire job for policy 10.201.0.10/32[1/8] === 10.202.0.10/32[1/8] with reqid {1}
12[IKE] <bob_0|2> local host is behind NAT, sending keep alives
12[IKE] <bob_0|2> received cert request for "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=ca.minion.inc, E=ca@minion.inc" 
12[IKE] <bob_0|2> sending cert request for "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=ca.minion.inc, E=ca@minion.inc" 
12[IKE] <bob_0|2> authentication of 'C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc' (myself) withl
12[IKE] <bob_0|2> sending end entity cert "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
12[IKE] <bob_0|2> establishing CHILD_SA bob_0{1}
01[IKE] <bob_0|2> received end entity cert "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=bob.minion.inc, E=bob@minion.inc" 
01[CFG] <bob_0|2>   using certificate "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=bob.minion.inc, E=bob@minion.inc" 
01[CFG] <bob_0|2>   using trusted ca certificate "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=ca.minion.inc, E=ca@minion.inc" 
01[CFG] <bob_0|2> checking certificate status of "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=bob.minion.inc, E=bob@minion.inc" 
01[CFG] <bob_0|2> certificate status is not available
01[CFG] <bob_0|2>   reached self-signed root ca with a path length of 0
01[IKE] <bob_0|2> authentication of 'C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=bob.minion.inc, E=bob@minion.inc' with RSA_EMSA_PKCS1l
01[IKE] <bob_0|2> IKE_SA bob_0[2] established between 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stu]
01[IKE] <bob_0|2> scheduling reauthentication in 897s
01[IKE] <bob_0|2> maximum IKE_SA lifetime 1077s
01[IKE] <bob_0|2> CHILD_SA bob_0{6} established with SPIs ccc4a2c8_i cf2d570b_o and TS 10.201.0.0/24 === 10.202.0.0/24
01[IKE] <bob_0|2> received AUTH_LIFETIME of 10550s, reauthentication already scheduled in 897s
01[IKE] <bob_0|2> peer supports MOBIKE
01[IKE] <bob_0|2> establishing CHILD_SA bob_0{1}
15[IKE] <bob_0|2> CHILD_SA bob_0{7} established with SPIs c54c3ef4_i c1f168e4_o and TS 10.201.0.0/24 === 10.202.0.0/24
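
The interesting line in this log is the acquire job that is created after the old IKE_SA has been deleted but before the new one is established. A minimal Python sketch for finding such lines in a charon log could look like this. The function name and the string markers are assumptions derived from the log lines above, and the peer identities in the sample are abbreviated as [...].

```python
import re

def acquires_during_reauth(log_lines):
    """Return acquire-job lines logged between 'reauthenticating IKE_SA'
    and the next 'IKE_SA <name> established' line."""
    in_gap = False
    hits = []
    for line in log_lines:
        if "reauthenticating IKE_SA" in line:
            in_gap = True                      # old IKE_SA is being torn down
        elif re.search(r"IKE_SA \S+ established", line):
            in_gap = False                     # replacement IKE_SA is up
        elif in_gap and "creating acquire job" in line:
            hits.append(line.strip())          # trap policy fired in the gap
    return hits

LOG = """\
14[IKE] <bob_0|1> reauthenticating IKE_SA bob_0[1]
14[IKE] <bob_0|1> sending DELETE for IKE_SA bob_0[1]
10[IKE] <bob_0|1> IKE_SA deleted
10[IKE] <bob_0|1> restarting CHILD_SA bob_0
10[IKE] <bob_0|1> initiating IKE_SA bob_0[2] to 172.40.2.10
16[KNL] creating acquire job for policy 10.201.0.10/32[1/8] === 10.202.0.10/32[1/8] with reqid {1}
01[IKE] <bob_0|2> IKE_SA bob_0[2] established between 172.40.1.10[...]...172.40.2.10[...]
01[IKE] <bob_0|2> CHILD_SA bob_0{6} established with SPIs ccc4a2c8_i cf2d570b_o and TS 10.201.0.0/24 === 10.202.0.0/24
15[IKE] <bob_0|2> CHILD_SA bob_0{7} established with SPIs c54c3ef4_i c1f168e4_o and TS 10.201.0.0/24 === 10.202.0.0/24
"""

hits = acquires_during_reauth(LOG.splitlines())
print(len(hits))  # prints: 1
```

Here the single hit lines up with the two CHILD_SAs ({6} and {7}) that are established on the new IKE_SA.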

And just for fun, after the next reauth there is one more CHILD_SA, adding up to three CHILD_SAs:

Status of IKE charon daemon (strongSwan 5.3.5, Linux 3.14.22, armv7l):
  uptime: 37 minutes, since Jan 26 14:22:11 2016
  malloc: sbrk 520192, mmap 0, used 189376, free 330816
  worker threads: 10 of 16 idle, 6/0/0/0 working, job queue: 0/0/0/0, scheduled: 8
  loaded plugins: charon aes des rc2 sha1 sha2 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem fips-prf gmp xcbc cmac hmac ctr ccm gcm attr kerc
Listening IP addresses:
  192.168.10.201
  172.40.1.10
  10.201.0.1
  10.201.1.1
  10.201.2.1
Connections:
       bob_0:  %any...172.40.2.10  IKEv2, dpddelay=30s
       bob_0:   local:  [C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc] uses public key authentication
       bob_0:    cert:  "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
       bob_0:   remote: uses public key authentication
       bob_0:   child:  10.201.0.0/24 === 10.202.0.0/24 TUNNEL, dpdaction=restart
Routed Connections:
       bob_0{1}:  ROUTED, TUNNEL, reqid 1
       bob_0{1}:   10.201.0.0/24 === 10.202.0.0/24
Security Associations (1 up, 0 connecting):
       bob_0[3]: ESTABLISHED 2 minutes ago, 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc]...172.40.2.10[C=DE, L=Karlsruhe, O=Minion Inc., OU]
       bob_0[3]: IKEv2 SPIs: 4da25fda87f3da64_i* c64f8e5b3745c04c_r, public key reauthentication in 12 minutes
       bob_0[3]: IKE proposal: AES_CCM_16_128/PRF_HMAC_SHA2_256/MODP_2048
       bob_0{13}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c6a2f8c9_i ca5b253b_o
       bob_0{13}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 17s ago), 84 bytes_o (1 pkt, 3s ago), rekeying in 116 seconds
       bob_0{13}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{14}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c81afe3f_i c6a7d984_o
       bob_0{14}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 17s ago), 168 bytes_o (2 pkts, 3s ago), rekeying in 3 minutes
       bob_0{14}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{15}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c32cf4fd_i c47f190d_o
       bob_0{15}:  AES_CCM_16_128, 11172 bytes_i (133 pkts, 0s ago), 11172 bytes_o (133 pkts, 0s ago), rekeying in 2 minutes
       bob_0{15}:   10.201.0.0/24 === 10.202.0.0/24

I tried setting make_before_break = yes, and it had a visible effect: there were no duplicate CHILD_SAs after reauth.

Let me know if I can provide more information to help you locate the problem (if there is one). The behavior doesn't seem correct in my humble opinion. Even if I don't set make_before_break, I would expect the duplicate CHILD_SAs to get dropped after a while. But strongSwan seems to rekey every single CHILD_SA, even if they are duplicates. I have seen them add up to 4 CHILD_SAs so far; I haven't tested longer than that.

BTW: I would consider setting make_before_break, but I'm afraid it could break compatibility with other vendors, especially because this is a global option. Any thoughts on this? What are the limitations? From the documentation I understand that this is an IKEv2-only feature, and the peer needs to support it. So is it only active for strongSwan <--> strongSwan connections?

#3 Updated by Tobias Brunner over 4 years ago

> I think I'm observing the same, or at least a similar, issue.

What you see is exactly what I described. During break-before-make reauthentication there is no SA/policy for a short time (on both peers) and traffic that matches the trap policy will trigger an additional SA.

> If I stop the pings all CHILD_SAs will be deleted after their inactivity timeout of 60s.

Depending on which SAs are used by each peer (usually the latest will be used), one of the SAs might get deleted due to the inactivity timeout even when traffic is constantly flowing. Is that the case? Or do the counters of both SAs change after the reauthentication?

> And just for fun, after the next reauth there is one more CHILD_SA, adding up to three CHILD_SAs:
> [...]

Yes, all SAs are destroyed and recreated. And if the same happens again, another SA is created. There is currently no check for duplicate SAs, as it could be that they are created intentionally. Not sure if checking for already queued SAs for the same reqid/config when adding a create-child-sa task based on a trap policy might be an option (not sure if we can properly detect this in all situations and whether it could have side-effects).

> BTW: I would consider setting make_before_break, but I'm afraid it could break compatibility with other vendors, especially because this is a global option. Any thoughts on this? What are the limitations? From the documentation I understand that this is an IKEv2-only feature, and the peer needs to support it. So is it only active for strongSwan <--> strongSwan connections?

It's possible that other vendors have problems with this (reauthentication is not standardized anyway). But this is basically how it was done with IKEv1, so it's possible that some vendors also support it for IKEv2.

Disabling reauthentication (reauth=no) and just using regular inline IKE_SA rekeying is your best bet to fix this. Do you actually require reauthentication?

#4 Updated by Christian Liebscher over 4 years ago

Tobias Brunner wrote:

> > I think I'm observing the same, or at least a similar, issue.

> What you see is exactly what I described. During break-before-make reauthentication there is no SA/policy for a short time (on both peers) and traffic that matches the trap policy will trigger an additional SA.

> > If I stop the pings all CHILD_SAs will be deleted after their inactivity timeout of 60s.

> Depending on which SAs are used by each peer (usually the latest will be used), one of the SAs might get deleted due to the inactivity timeout even when traffic is constantly flowing. Is that the case? Or do the counters of both SAs change after the reauthentication?

There is only one host behind each IPsec gateway (10.20x.0.10). Pings are sent constantly, one per second (plain Linux "ping 10.202.0.10"), from the host behind "stuart" to the host behind "bob". Nothing else. I've been watching the ESP-in-UDP packets between both gateways, and I only see the expected amount.
I just noticed that the packet counters of all CHILD_SAs are incrementing, so evidently not only the "latest" CHILD_SA is used. Looks like round robin to me. This might be why none of the CHILD_SAs hits its inactivity timeout. Who decides which SA to use?

> > And just for fun, after the next reauth there is one more CHILD_SA, adding up to three CHILD_SAs:
> > [...]

> Yes, all SAs are destroyed and recreated. And if the same happens again, another SA is created. There is currently no check for duplicate SAs, as it could be that they are created intentionally. Not sure if checking for already queued SAs for the same reqid/config when adding a create-child-sa task based on a trap policy might be an option (not sure if we can properly detect this in all situations and whether it could have side-effects).

Well, in this case they are created unintentionally. As I understand it, in the short moment of the reauth, strongSwan gets a request from the kernel. If that is the case, how about ignoring or deferring the request during reauth?

> > BTW: I would consider setting make_before_break, but I'm afraid it could break compatibility with other vendors, especially because this is a global option. Any thoughts on this? What are the limitations? From the documentation I understand that this is an IKEv2-only feature, and the peer needs to support it. So is it only active for strongSwan <--> strongSwan connections?

> It's possible that other vendors have problems with this (reauthentication is not standardized anyway). But this is basically how it was done with IKEv1, so it's possible that some vendors also support it for IKEv2.

> Disabling reauthentication (reauth=no) and just using regular inline IKE_SA rekeying is your best bet to fix this. Do you actually require reauthentication?

Good question. So far I didn't get the difference. I was just sticking to the default, which is yes. I'm building a custom configuration backend for strongSwan, with a reasonable number of options visible to the customer. To be honest, I have no good idea what they do with it. Most customers buy and never tell us anything about what they do, so I can't give you an exact answer.

As far as I know, most customers still use IKEv1 because they want to connect to devices from different vendors, like Cisco. We strongly recommend that they use IKEv2 if possible and use IKEv1 only for backwards compatibility. So the normal use case is probably a mix of IKEv1 and IKEv2.
Do you think IKEv1 is also affected by this issue? It looks like I can't deactivate reauth for IKEv1.

reauth = yes | no

whether rekeying of an IKE_SA should also reauthenticate the peer. In IKEv1, reauthentication is always done.
In IKEv2, a value of no rekeys without uninstalling the IPsec SAs, a value of yes (the default)
creates a new IKE_SA from scratch and tries to recreate all IPsec SAs.

#5 Updated by Christian Liebscher over 4 years ago

Some more information:

I tested the same setup using IKEv1. It seems the problem does not exist for IKEv1. After IKE reauth you see an additional "REKEYED" CHILD_SA / Quick Mode, which times out after a while:

Connections:
       bob_0:  %any...172.40.2.10  IKEv1, dpddelay=30s
       bob_0:   local:  [C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc] uses public key authentication
       bob_0:    ca:    "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=ca.minion.inc, E=ca@minion.inc" 
       bob_0:    cert:  "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
       bob_0:   remote: uses public key authentication
       bob_0:   child:  10.201.0.0/24 === 10.202.0.0/24 TUNNEL, dpdaction=restart
Routed Connections:
       bob_0{2893}:  ROUTED, TUNNEL, reqid 1
       bob_0{2893}:   10.201.0.0/24 === 10.202.0.0/24
Security Associations (1 up, 0 connecting):
       bob_0[71]: ESTABLISHED 58 seconds ago, 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc]...172.40.2.10[C=DE, L=Karlsruhe, O=Minion Inc., ]
       bob_0[71]: IKEv1 SPIs: 4933c2dd29dbb65e_i* a620060fd245126d_r, public key reauthentication in 15 minutes
       bob_0[71]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
       bob_0{2901}:  REKEYED, TUNNEL, reqid 1, expires in 3 minutes
       bob_0{2901}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2902}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: ce2111d3_i c3302174_o
       bob_0{2902}:  AES_CCM_16_128, 12012 bytes_i (143 pkts, 1s ago), 12012 bytes_o (143 pkts, 1s ago), rekeying in 115 seconds
       bob_0{2902}:   10.201.0.0/24 === 10.202.0.0/24

FYI: I left the devices running over night with IKEv2 and this is what it looks like. Please note the Packet counters:

Connections:
       bob_0:  %any...172.40.2.10  IKEv2, dpddelay=30s
       bob_0:   local:  [C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc] uses public key authentication
       bob_0:    cert:  "C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc" 
       bob_0:   remote: uses public key authentication
       bob_0:   child:  10.201.0.0/24 === 10.202.0.0/24 TUNNEL, dpdaction=restart
Routed Connections:
       bob_0{1}:  ROUTED, TUNNEL, reqid 1
       bob_0{1}:   10.201.0.0/24 === 10.202.0.0/24
Security Associations (1 up, 0 connecting):
       bob_0[67]: ESTABLISHED 3 minutes ago, 172.40.1.10[C=DE, L=Karlsruhe, O=Minion Inc., OU=Banana, CN=stuart.minion.inc, E=stuart@minion.inc]...172.40.2.10[C=DE, L=Karlsruhe, O=Minion Inc., O]
       bob_0[67]: IKEv2 SPIs: ddec4d4c1c0259af_i* 44351e55bff00d21_r, public key reauthentication in 10 minutes
       bob_0[67]: IKE proposal: AES_CCM_16_128/PRF_HMAC_SHA2_256/MODP_2048
       bob_0{2880}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c73a4b14_i c3b1a779_o
       bob_0{2880}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 168 bytes_o (2 pkts, 14s ago), rekeying in 18 seconds
       bob_0{2880}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2881}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c0c4d2ba_i ce201fbd_o
       bob_0{2881}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 84 bytes_o (1 pkt, 14s ago), rekeying in 31 seconds
       bob_0{2881}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2882}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c93e817f_i c291c3b6_o
       bob_0{2882}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 168 bytes_o (2 pkts, 14s ago), rekeying in 111 seconds
       bob_0{2882}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2883}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: cfa5dcce_i c9826db1_o
       bob_0{2883}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 84 bytes_o (1 pkt, 14s ago), rekeying in 98 seconds
       bob_0{2883}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2884}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c940a6d0_i c290c19b_o
       bob_0{2884}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 168 bytes_o (2 pkts, 14s ago), rekeying in 85 seconds
       bob_0{2884}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2885}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c13e0c5e_i ce3be375_o
       bob_0{2885}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 84 bytes_o (1 pkt, 14s ago), rekeying in 71 seconds
       bob_0{2885}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2886}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: ca691b74_i c05cf91a_o
       bob_0{2886}:  AES_CCM_16_128, 420 bytes_i (5 pkts, 23s ago), 504 bytes_o (6 pkts, 14s ago), rekeying in 45 seconds
       bob_0{2886}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2887}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c2aac690_i cdb14076_o
       bob_0{2887}:  AES_CCM_16_128, 84 bytes_i (1 pkt, 23s ago), 84 bytes_o (1 pkt, 14s ago), rekeying in 95 seconds
       bob_0{2887}:   10.201.0.0/24 === 10.202.0.0/24
       bob_0{2888}:  INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c0b5c1c8_i c7e0112c_o
       bob_0{2888}:  AES_CCM_16_128, 18060 bytes_i (215 pkts, 0s ago), 18060 bytes_o (215 pkts, 0s ago), rekeying in 25 seconds
       bob_0{2888}:   10.201.0.0/24 === 10.202.0.0/24

So the duplication stops at some point. I don't think that reauth is broken in a serious way, and the behavior seems logical to me. What seems wrong to me is that "unused" CHILD_SAs don't time out correctly. Why is there usage on CHILD_SAs 2880 to 2887 if 2888 is the newest CHILD_SA? Any ideas?

Thanks in advance, let me know if I can supply more information.

#6 Updated by Tobias Brunner over 4 years ago

> I just noticed that the packet counters of all CHILD_SAs are incrementing, so evidently not only the "latest" CHILD_SA is used. Looks like round robin to me. This might be why none of the CHILD_SAs hits its inactivity timeout. Who decides which SA to use?

That's an interesting issue that is quite easily explained but not really fixable. While the traffic counters (packets/bytes) are queried from the SAs, the last use time is queried from the policies (the use time of an SA refers to the time it was first used). Since the policy is shared by all SAs, they all share the same use time (the packet counters should not increase, though, after a new SA has been established).
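
The effect of the shared use time can be illustrated with a toy model in Python. This is only a sketch of the explanation above, not strongSwan or kernel code; the class and method names are invented, and the only thing it models is that the inactivity check reads a timestamp stored on the policy that all SAs share.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    # Updated whenever any SA under this policy carries traffic.
    last_used: float = 0.0

@dataclass
class ChildSA:
    name: str
    policy: Policy        # shared by all SAs with the same traffic selectors
    packets: int = 0      # per-SA counter, queried from the SA itself

    def send(self, now):
        self.packets += 1
        self.policy.last_used = now

    def is_inactive(self, now, timeout):
        # The use time comes from the shared policy, not from this SA.
        return now - self.policy.last_used >= timeout

shared = Policy()
old_sa = ChildSA("bob_0{6}", shared)
new_sa = ChildSA("bob_0{7}", shared)

for t in range(120):
    new_sa.send(float(t))   # only the newest SA carries traffic

# The old SA carried no packets, yet it never looks inactive.
print(old_sa.packets, old_sa.is_inactive(now=120.0, timeout=60.0))  # prints: 0 False
```

In this model the duplicate SA would only time out once traffic on the shared policy stops altogether, which matches the observation that all CHILD_SAs disappear 60s after the pings are stopped.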

> Well, in this case they are created unintentionally. As I understand it, in the short moment of the reauth, strongSwan gets a request from the kernel. If that is the case, how about ignoring or deferring the request during reauth?

Yes, that's what I meant with the following (deferring would not help, however):

> Not sure if checking for already queued SAs for the same reqid/config when adding a create-child-sa task based on a trap policy might be an option (not sure if we can properly detect this in all situations and whether it could have side-effects).

> > Disabling reauthentication (reauth=no) and just using regular inline IKE_SA rekeying is your best bet to fix this. Do you actually require reauthentication?

> Good question. So far I didn't get the difference.

See RFC 7296, section 2.8.3 (which actually describes make-before-break reauthentication).

> Do you think IKEv1 is also affected by this issue?

No, IKEv1 always uses make-before-break reauthentication to rekey IKE_SAs (but does not recreate CHILD_SAs).

#7 Updated by Noel Kuntze over 3 years ago

  • Status changed from Feedback to Closed
  • Resolution set to No feedback
