Project

General

Profile

Issue #1103

Stuck with rekeying active

Added by Ivan Sidorov over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Category:
freebsd
Affected version:
5.3.2
Resolution:
No change required

Description

I'm still fighting with Strongswan client on FreeBSD 10.2. I hope, this one is not firewall issue.

Client configs are from #1100, they work, but after some hours (for rekey = 1h it was 8 hour the last time), tunnel dies. According status it stuck with rekeying active message. I checked time settings and they look good, with UTC timezone. (actually, clocks have 1 sec difference, but that should be ok).

I changed rekey interval to 11m, so according default strongswan policy rekey is started each minute. I recreated issue pretty fast, on the second rekey. With ike=2 and cfg=2 logging options I see no messages about attempt (or failed attempt) to rekey and tunnel goes down after 10 min.

Sep  9 20:52:35 abinet charon: 11[IKE] establishing CHILD_SA abinet{1}
Sep  9 20:52:35 abinet charon: 11[IKE] received ESP_TFC_PADDING_NOT_SUPPORTED, not using ESPv3 TFC padding
Sep  9 20:52:35 abinet charon: 11[IKE] CHILD_SA abinet{7} established with SPIs ce3ff9e5_i caac9378_o and TS 192.168.0.0/24 === 10.0.0.0/16
Sep  9 20:52:35 abinet charon: 11[IKE] closing CHILD_SA abinet{6} with SPIs c9802432_i (8500 bytes) cecd69d0_o (13080 bytes) and TS 192.168.0.0/24 === 10.0.0.0/16
...skipping...
Sep 10 07:53:44 abinet charon: 12[IKE] queueing IKE_DPD task
Sep 10 07:53:44 abinet charon: 12[IKE] activating new tasks
Sep 10 07:53:44 abinet charon: 12[IKE]   activating IKE_DPD task
Sep 10 07:53:44 abinet charon: 12[IKE] activating new tasks
Sep 10 07:53:44 abinet charon: 12[IKE] nothing to initiate
... and similar messages ....

I think, after SA expiration (24h), it restarts eventually, but that is cold comfort, obviously. The 2 ciscos connected to the same FreeBSD server rekeys normally. How can I investigate issue more thoroughly ?

client.txt (11.7 KB) client.txt client logs Ivan Sidorov, 10.09.2015 12:24
logs.txt (3.25 KB) logs.txt Ivan Sidorov, 10.09.2015 13:17
client2.txt (8.67 KB) client2.txt Ivan Sidorov, 10.09.2015 13:17

Related issues

Related to Issue #951: p2 SA are not deleted on FreeBSDClosed11.05.2015

History

#1 Updated by Ivan Sidorov over 4 years ago

Actually, configs are not the same, I changed both to auto=route, and rekey=no was added to the server to prevent possible race on simultaneously rekey.

Client:

conn abinet
        keyexchange=ikev2
        authby=pubkey
        left=9x.x.x.x
        leftid="xx" 
        leftcert=ipsec-client-cert.pem
        leftsubnet=192.168.0.0/24
        ike = aes256-sha1-prfsha1-modp2048
        esp = aes256-sha1
        right=8x.x.x.x
        rightid="xx" 
        rightsubnet=10.0.0.0/16
        ikelifetime = 24h
        lifetime = 11m
        rekeyfuzz = 0%
        dpdaction=clear
        dpdtimeout=120s
        auto=route

#2 Updated by Tobias Brunner over 4 years ago

  • Category changed from kernel to libcharon
  • Status changed from New to Feedback

tunnel dies

What does that mean? Are the SAs still installed in the kernel? (Check with setkey)

According status it stuck with rekeying active message

As I wrote in #1085 the stroke plugin simply compares the configured rekey time with the install time of the SA and if the time is exceeded it assumes the SA is currently being rekeyed. But it does not mean a rekeying is actually active. However, that the SA is still there could indicate a problem during the rekeying. Please check the log for such problems (the excerpt you posted above shows a successful rekeying).

I changed rekey interval to 11m, so according default strongswan policy rekey is started each minute. I recreated issue pretty fast, on the second rekey.

What exactly happens at that point? Could you post the complete log?

With ike=2 and cfg=2 logging options I see no messages about attempt (or failed attempt) to rekey and tunnel goes down after 10 min.

What does "goes down" mean? What is logged at that time?

How can I investigate issue more thoroughly ?

Look at the logs and the kernel state (setkey) to see what actually happens during the rekeyings and when you think the tunnel "goes down".

#3 Updated by Ivan Sidorov over 4 years ago

Tobias Brunner wrote:

What does that mean? Are the SAs still installed in the kernel? (Check with setkey)

No traffic come through it.

What exactly happens at that point? Could you post the complete log?

According https://wiki.strongswan.org/projects/strongswan/wiki/ExpiryRekey , with disabled rekeyfuzz if we set lifetime to 10m, strongswan starts to initiate rekey in 60 sec.
So, we are up, tunnel is working

abinet{1}:  AES_CBC_256/HMAC_SHA1_96, 0 bytes_i, 0 bytes_o, rekeying in 57 seconds

Time is passed and SA_CHILD rekeys
abinet{2}:  AES_CBC_256/HMAC_SHA1_96, 0 bytes_i, 0 bytes_o, rekeying in 42 seconds

Another minute passed
abinet{2}:  AES_CBC_256/HMAC_SHA1_96, 208 bytes_i (2 pkts, 87s ago), 304 bytes_o (2 pkts, 11s ago), rekeying active

stroke plugin assumes, the rekey time arrived, but there are no rekey in process. The tunnel works.
9xxxxxxxx 8xxxxxxxx
        esp mode=tunnel spi=3391912137(0xca2c78c9) reqid=1(0x00000001)
        E: rijndael-cbc  9d04e496 026daabd efe14d5c c877cb8a 9d31e6b9 ecd625bc 85d90d4a 0b1c1ea1
        A: hmac-sha1  a1e212d7 b5756bab 35840217 dc591320 6ad8736c
        seq=0x000000c5 replay=4 flags=0x00000000 state=mature
        created: Sep 10 10:05:00 2015   current: Sep 10 10:14:46 2015
        diff: 586(s)    hard: 600(s)    soft: 0(s)
        last: Sep 10 10:14:46 2015      hard: 0(s)      soft: 0(s)
        current: 29944(bytes)   hard: 0(bytes)  soft: 0(bytes)
        allocated: 197  hard: 0 soft: 0
        sadb_seq=1 pid=48415 refcnt=2
8xxxxxxxx 9xxxxxxxxxxxx
        esp mode=tunnel spi=3402157810(0xcac8cef2) reqid=1(0x00000001)
        E: rijndael-cbc  74c1fa14 755ef43a 633a650d d52f234a 02ebfc5f a09fd7d2 4101e211 00a20f1a
        A: hmac-sha1  b0111360 aeed9b46 1d11c3b9 bcad8f2e 19f34265
        seq=0x000000c5 replay=4 flags=0x00000000 state=dying
        created: Sep 10 10:05:00 2015   current: Sep 10 10:14:46 2015
        diff: 586(s)    hard: 600(s)    soft: 60(s)
        last: Sep 10 10:14:46 2015      hard: 0(s)      soft: 0(s)
        current: 20488(bytes)   hard: 0(bytes)  soft: 0(bytes)
        allocated: 197  hard: 0 soft: 0
        sadb_seq=0 pid=48415 refcnt=1

After 10 min SA_CHILD is dead and removed.
9xxxxxxxxxxx 8xxxxxxxxxxxx
        esp mode=tunnel spi=3391912137(0xca2c78c9) reqid=1(0x00000001)
        E: rijndael-cbc  9d04e496 026daabd efe14d5c c877cb8a 9d31e6b9 ecd625bc 85d90d4a 0b1c1ea1
        A: hmac-sha1  a1e212d7 b5756bab 35840217 dc591320 6ad8736c
        seq=0x000000d3 replay=4 flags=0x00000000 state=mature
        created: Sep 10 10:05:00 2015   current: Sep 10 10:15:35 2015
        diff: 635(s)    hard: 600(s)    soft: 0(s)
        last: Sep 10 10:15:00 2015      hard: 0(s)      soft: 0(s)
        current: 32072(bytes)   hard: 0(bytes)  soft: 0(bytes)
        allocated: 211  hard: 0 soft: 0
        sadb_seq=0 pid=48499 refcnt=2

So, according logs and setkey no attempt to rekey after the first rekey succeeded

#4 Updated by Tobias Brunner over 4 years ago

Please set the default log level to 1. And try to collect setkey output of all SAs (i.e. the initial one and after each rekeying).

#5 Updated by Ivan Sidorov over 4 years ago

Sure.
Observation: no packets was issued through the tunnel during the last 60 sec interval b/w rekeys. So, it was like the tunnel wants to die due to inactivity, but after stroke suggestion to rekey, I started pings from both sides, but that not help - SA was expired.

Maybe that's why I hit the issue - after #1100 I disabled cron pings and here is result - with no traffic till stroke rekey suggestion my tunnels dies.

#6 Updated by Tobias Brunner over 4 years ago

Why does your log only show messages from the ike subsystem?

But I remember now what the problem is. FreeBSD 10 only sends an expire message to the daemon if there was traffic on an SA. I actually wrote a patch for this myself a few months ago (see FreeBSD bug 200283). The patch will be included in FreeBSD 11 and it seems it was backported to 10 too (but you probably need to update your kernel sources). As a workaround creating some traffic should work too.

#7 Updated by Ivan Sidorov over 4 years ago

Tobias Brunner wrote:

Why does your log only show messages from the ike subsystem?

Err, I though the relevant information for this issue will be contained in ike subsystem.

But I remember now what the problem is. FreeBSD 10 only sends an expire message to the daemon if there was traffic on an SA. I actually wrote a patch for this myself a few months ago (see FreeBSD bug 200283). The patch will be included in FreeBSD 11 and it seems it was backported to 10 too (but you probably need to update your kernel sources). As a workaround creating some traffic should work too.

Looks like the patch was submitted after code freeze for 10.2-RELENG, so we have to wait for 10.3. The good thing the patch looks trivial enough not to wait, but try it at home :-) Would you like to receive the feedback here?
BTW, ok, even if FreeBSD is not informing daemon about soft timer expiration, we have an another one. Why strongswan not cleared SA after hard timer expiration?

#8 Updated by Tobias Brunner over 4 years ago

  • Related to Issue #951: p2 SA are not deleted on FreeBSD added

#9 Updated by Tobias Brunner over 4 years ago

Why does your log only show messages from the ike subsystem?

Err, I though the relevant information for this issue will be contained in ike subsystem.

Others would have helped, e.g. knl to see if and when the daemon received an expire event.

But I remember now what the problem is. FreeBSD 10 only sends an expire message to the daemon if there was traffic on an SA. I actually wrote a patch for this myself a few months ago (see FreeBSD bug 200283). The patch will be included in FreeBSD 11 and it seems it was backported to 10 too (but you probably need to update your kernel sources). As a workaround creating some traffic should work too.

Looks like the patch was submitted after code freeze for 10.2-RELENG, so we have to wait for 10.3. The good thing the patch looks trivial enough not to wait, but try it at home :-) Would you like to receive the feedback here?

Sure, if it fixes the problem we can close this.

Why strongswan not cleared SA after hard timer expiration?

Because FreeBSD didn't send an expire event for that. I sent a patch for that too: 200282. I also referenced the issue that spawned the two patches.

#10 Updated by Ivan Sidorov over 4 years ago

Tobias Brunner wrote:

Sure, if it fixes the problem we can close this.

I can confirm soft and hard timers work with updated ipsec stack.

#11 Updated by Tobias Brunner over 4 years ago

  • Category changed from libcharon to freebsd
  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Resolution set to No change required

OK, thanks for the feedback.

Also available in: Atom PDF