Bug #2902

IKEv1: QM rekey collision results in uneven updown events

Added by Daniel Gollub 5 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Category:
ikev1
Target version:
5.8.0
Affected version:
5.7.2
Resolution:
Fixed

Description

First of all, the IKEv2 implementation works great, but there are still some deployments that cannot migrate away from IKEv1 yet, primarily because some service providers offer IKEv1-only peers.

We spotted the following situation in the wild with IKEv1:

0. locally charon.delete_rekeyed = no; the remote peer (a third-party implementation) deletes redundant Child SAs (similar to charon.delete_rekeyed = yes), see the strongswan.conf fragment after this list
1. initial MM and QM setup results in no redundant QMs: {1}
2. later, QM rekey results in redundant QMs ({2} and {3}) due to QM rekey collision
3. rekey-time for the "older" redundant QM {2} is up and gets the state CHILD_REKEYED to not trigger an updown down event, no deletion yet of the Child SA
4. remote peer sends DELETE for QM {3}, because the peer decided to delete the "other" redundant Child SA (e.g. bad luck with the jitter)
5. deletion of QM {3} results in updown down event and eventually brings down the entire tunnel
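
For reference, the local setting mentioned in step 0 is the charon.delete_rekeyed option in strongswan.conf. A minimal fragment (purely illustrative) could look like this:

# strongswan.conf (excerpt)
charon {
    # keep rekeyed/redundant IKEv1 CHILD_SAs until they expire instead of
    # deleting them right after rekeying (this is the default)
    delete_rekeyed = no
}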

2. QM rekey results in redundant QMs

Peer                 Local

{X2}                 {2} CHILD_INSTALLED
{X3}                 {3} CHILD_INSTALLED

3. local rekey attempt of {2} spots redundant Child SA, no rekey gets queued:

Peer                 Local

{X2}                 {2} CHILD_REKEYED (charon.delete_rekeyed=no, no deletion)
{X3}                 {3} CHILD_INSTALLED

4. remote peer decided to delete the other redundant Child SA:

Peer                 Local

{X2}                 {2} CHILD_REKEYED
{X3}  --QM DELETE--> {3} CHILD_INSTALLED --child_updown down--> ...

Associated revisions

Revision 451c2e7d
Added by Tobias Brunner 3 months ago

Merge branch 'ikev1-redundant-updown'

Avoids calling updown script for redundant CHILD_SAs after IKEv1 rekey
collisions.

Fixes #2902.

History

#2 Updated by Tobias Brunner 5 months ago

  • Category set to ikev1
  • Status changed from New to Feedback

0. locally charon.delete_rekeyed = no; the remote peer (a third-party implementation) deletes redundant Child SAs (similar to charon.delete_rekeyed = yes)

Is there a difference if you change that setting to yes?

1. initial MM and QM setup results in no redundant QMs: {1}

Why should it? Or are both peers creating SAs concurrently?

2. later, QM rekey results in redundant QMs ({2} and {3}) due to QM rekey collision

Well, that's IKEv1 for you. There is no collision handling in the protocol. Maybe change the rekey settings so it's always the same host that initiates the rekeyings. But you say there is a rekey collision, but why are there only two CHILD_SAs? And both installed? That seems strange.

3. rekey-time for the "older" redundant QM {2} is up and gets the state CHILD_REKEYED to not trigger an updown down event, no deletion yet of the Child SA

If there was a rekey collision above, that should have happened then (and there should be more CHILD_SAs). Could you provide the complete logs of this session?

4. remote peer sends DELETE for QM {3}, because the peer decided to delete the "other" redundant Child SA (e.g. bad luck with the jitter)

On what is that decision based?

#3 Updated by Daniel Gollub 5 months ago

Tobias Brunner wrote:

0. locally charon.delete_rekeyed = no; the remote peer (a third-party implementation) deletes redundant Child SAs (similar to charon.delete_rekeyed = yes)

Is there a difference if you change that setting to yes?

I need to give this a try. But in this particular case the remote peer is not under my control. The remote peer is either running an old strongswan version which deletes redundant Child SAs by default, prior to the charon.delete_rekeyed switch being introduced, or charon.delete_rekeyed=yes was set on purpose.

Regardless of the fact that the charon.delete_rekeyed settings are not identical in our case, my guess is that with charon.delete_rekeyed=yes on both sides a similar problem can be triggered:

With charon.delete_rekeyed=yes, the IKEv1 queue rekey function triggers a deletion job for the redundant Child SA; the peer then runs the quick_delete code for that Child SA, which it has not yet marked as rekeyed. The responder's quick_delete code does not check for redundant Child SAs and triggers the updown hook, since the to-be-deleted Child SA has not yet been identified as a rekeyed or redundant Child SA.

This situation only happens if:

  1. at least one peer is set to charon.delete_rekeyed = yes
  2. QM rekey results in redundant Child SAs
  3. rekey_time (incl. jitter) expires early enough that the resulting QM delete request is received before the remote peer's rekey_time (incl. jitter) expires (redundant Child SA not yet identified)

... or in following situation:

  1. only one peer has charon.delete_rekeyed = yes set (or an older strongswan version, or different implementation with similar behavior)
  2. QM rekey results in redundant Child SAs
  3. rekey_time (incl. jitter) expires for Child SA {2} on the peer with charon.delete_rekeyed = no; it is marked as CHILD_REKEYED, no QM deletion is sent
  4. rekey_time (incl. jitter) expires for Child SA {X3} on the peer with charon.delete_rekeyed = yes; it is marked as CHILD_REKEYED and a QM deletion request is sent. The remote peer triggers a down event. (That's how the peer ends up deleting the "other" redundant Child SA.)

1. initial MM and QM setup results in no redundant QMs: {1}

Why should it? Or are both peers creating SAs concurrently?

It shouldn't. I just stated that to clarify that initially no duplicate/redundant Child SAs were in place.

2. later, QM rekey results in redundant QMs ({2} and {3}) due to QM rekey collision

Well, that's IKEv1 for you. There is no collision handling in the protocol.

Yeah, I'm aware of the absence of collision handling in IKEv1. It hurts every time I need to handle such cases ... unfortunately the remote end is often run by someone else and they can't enable/support IKEv2 for some reason.

Maybe change the rekey settings so it's always the same host that initiates the rekeyings.

Yes, that is usually the workaround we ask or try to apply in such cases.
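
For example (purely illustrative; legacy ipsec.conf syntax, and the exact option names depend on the configuration backend in use), rekey initiation can be disabled on one peer so that only the other peer ever starts a QM rekey:

# on the peer that should never initiate rekeying
conn ikev1-tunnel
    keyexchange=ikev1
    rekey=no

# on the initiating peer, the rekey jitter can optionally be removed by
# setting rekeyfuzz=0%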

But you say there is a rekey collision, but why are there only two CHILD_SAs? And both installed? That seems strange.

I thought that two CHILD_SAs, both installed, is the undesired result of the absence of QM rekey collision handling in IKEv1. And that was the reason to introduce the is_redundant() check in task_manager_v1.c, to sort of avoid unnecessary updown triggers if a redundant CHILD_SA gets deleted.

3. rekey-time for the "older" redundant QM {2} is up and gets the state CHILD_REKEYED to not trigger an updown down event, no deletion yet of the Child SA

If there was a rekey collision above, that should have happened then (and there should be more CHILD_SAs). Could you provide the complete logs of this session?

See the logs below. In total those logs capture four CHILD_SAs. Initially there is one INSTALLED and one REKEYED which is about to expire. A few minutes later a QM rekey happens simultaneously on both ends (bad luck with the jitter; that is very rare), which results in two new INSTALLED CHILD_SAs, and the one which was meant to be rekeyed gets set to REKEYED as expected.

4. remote peer sends DELETE for QM {3}, because the peer decided to delete the "other" redundant Child SA (e.g. bad luck with the jitter)

On what is that decision based?

My guess is that this is based on the Child SA rekey_time (incl. jitter) and the is_redundant() function, which marks the Child SA as REKEYED and, if configured (charon.delete_rekeyed), deletes it right away if there is a fresher instance.

commit 1e24fa4614d810d1b8763513335c54dd25aa03c6
Author: Martin Willi <martin@revosec.ch>
Date:   Tue Jun 5 15:03:10 2012 +0200

    Instead of rekeying, delete a quick mode if we have a fresher instance

    If both peers initiate quick mode rekeying simultaneously, we end up
    with duplicate SAs for a configuration. This can't be avoided, nor do
    the standards provide an appropriate solution. Instead of closing one
    SA immediately, we keep both. But once rekeying triggers, we don't
    refresh the SA with the shorter soft lifetime, but delete it.

IIUC, in IKEv1 the lifetime is negotiated and the individual jitter/margins get applied by each peer. So the assumption about which is the older and which is the fresher instance only holds from a local perspective. For that reason I think even if charon.delete_rekeyed is set on both sides this might result in additional, undesired updown events.
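
To illustrate that last point, a rough standalone sketch (not strongSwan's actual rekeying code; the helper name and values are made up): each peer derives the soft rekey time of a CHILD_SA locally from the base lifetime, a margin, and a random jitter, so the two peers can disagree about which of the two redundant SAs is due to be rekeyed (and is therefore considered "older") first.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Rough sketch, not strongSwan's actual code: each peer schedules the soft
 * rekey of a CHILD_SA at lifetime - margin - jitter, where the jitter is a
 * locally chosen random value.  Because the jitter is purely local, peer A
 * may schedule the rekey of {2} before {3} while peer B does the opposite,
 * so "older"/"fresher" is only meaningful from a local perspective. */
static int soft_rekey_time(int lifetime, int margin, int max_jitter)
{
    return lifetime - margin - (max_jitter ? rand() % (max_jitter + 1) : 0);
}

int main(void)
{
    srand((unsigned)time(NULL));
    /* hypothetical values: 3600s lifetime, 360s margin, up to 360s jitter */
    printf("peer A: rekey {2} at %ds, {3} at %ds\n",
           soft_rekey_time(3600, 360, 360), soft_rekey_time(3600, 360, 360));
    printf("peer B: rekey {X2} at %ds, {X3} at %ds\n",
           soft_rekey_time(3600, 360, 360), soft_rekey_time(3600, 360, 360));
    return 0;
}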

Log:

# Initially following Child SAs exists for this IKE SA: {104495}, {104603}

# {104495} expired, deleted by the peer
Jan 29 22:08:23 dut charon[43129]: 07[NET] <peer-192.0.2.1|288860> received packet: from 192.0.2.1[500] to 192.0.2.2[500] (76 bytes)
Jan 29 22:08:23 dut charon[43129]: 07[ENC] <peer-192.0.2.1|288860> parsed INFORMATIONAL_V1 request 2596049900 [ HASH D ]
Jan 29 22:08:23 dut charon[43129]: 07[IKE] <peer-192.0.2.1|288860> received DELETE for ESP CHILD_SA with SPI c64c6786
Jan 29 22:08:23 dut charon[43129]: 07[IKE] <peer-192.0.2.1|288860> closing CHILD_SA peer-192.0.2.1{104495} with SPIs cde9671b_i (0 bytes) c64c6785_o (0 bytes) and TS 0.0.0.0/0 === 0.0.0.0/0

# Simultaneous QM rekey starting here:

# rekeying of {104603}
Jan 29 22:35:12 dut charon[43129]: 12[KNL] creating rekey job for CHILD_SA ESP/0xcd1723fa/192.0.2.1
Jan 29 22:35:12 dut charon[43129]: 12[ENC] <peer-192.0.2.1|288860> generating QUICK_MODE request 2308959688 [ HASH SA No KE ID ID ] 
Jan 29 22:35:12 dut charon[43129]: 12[NET] <peer-192.0.2.1|288860> sending packet: from 192.0.2.2[500] to 192.0.2.1[500] (380 bytes)

# rekeying of {104603}, again!
Jan 29 22:35:12 dut charon[43129]: 05[NET] <peer-192.0.2.1|288860> received packet: from 192.0.2.1[500] to 192.0.2.2[500] (380 bytes)
Jan 29 22:35:12 dut charon[43129]: 05[ENC] <peer-192.0.2.1|288860> parsed QUICK_MODE request 915742311 [ HASH SA No KE ID ID ]
Jan 29 22:35:12 dut charon[43129]: 05[IKE] <peer-192.0.2.1|288860> detected rekeying of CHILD_SA peer-192.0.2.1{104603}
Jan 29 22:35:12 dut charon[43129]: 05[ENC] <peer-192.0.2.1|288860> generating QUICK_MODE response 915742311 [ HASH SA No KE ID ID ]
Jan 29 22:35:12 dut charon[43129]: 05[NET] <peer-192.0.2.1|288860> sending packet: from 192.0.2.2[500] to 192.0.2.1[500] (380 bytes)

# rekeying of {104603}, results in {104711}, no UP event! Since this is a rekey.
Jan 29 22:35:12 dut charon[43129]: 13[NET] <peer-192.0.2.1|288860> received packet: from 192.0.2.1[500] to 192.0.2.2[500] (380 bytes)
Jan 29 22:35:12 dut charon[43129]: 13[ENC] <peer-192.0.2.1|288860> parsed QUICK_MODE response 2308959688 [ HASH SA No KE ID ID ] 
Jan 29 22:35:12 dut charon[43129]: 13[IKE] <peer-192.0.2.1|288860> CHILD_SA peer-192.0.2.1{104711} established with SPIs cf42a2b3_i c796f846_o and TS 0.0.0.0/0 === 0.0.0.0/0
Jan 29 22:35:12 dut charon[43129]: 13[ENC] <peer-192.0.2.1|288860> generating QUICK_MODE request 2308959688 [ HASH ]
Jan 29 22:35:12 dut charon[43129]: 13[NET] <peer-192.0.2.1|288860> sending packet: from 192.0.2.2[500] to 192.0.2.1[500] (60 bytes)

# rekeying of {104603}, again! results in {104712}, no UP event! Since this is a rekey.
Jan 29 22:35:12 dut charon[43129]: 12[NET] <peer-192.0.2.1|288860> received packet: from 192.0.2.1[500] to 192.0.2.2[500] (60 bytes)
Jan 29 22:35:12 dut charon[43129]: 12[ENC] <peer-192.0.2.1|288860> parsed QUICK_MODE request 915742311 [ HASH ]
Jan 29 22:35:12 dut charon[43129]: 12[IKE] <peer-192.0.2.1|288860> CHILD_SA peer-192.0.2.1{104712} established with SPIs ce48a336_i cfc7c9c4_o and TS 0.0.0.0/0 === 0.0.0.0/0

# End of simultaneous QM rekey

# peer deletes old/expired {104603}
Jan 29 22:51:50 dut charon[43129]: 04[NET] <peer-192.0.2.1|288860> received packet: from 192.0.2.1[500] to 192.0.2.2[500] (76 bytes)
Jan 29 22:51:50 dut charon[43129]: 04[ENC] <peer-192.0.2.1|288860> parsed INFORMATIONAL_V1 request 2828014184 [ HASH D ]
Jan 29 22:51:50 dut charon[43129]: 04[IKE] <peer-192.0.2.1|288860> received DELETE for ESP CHILD_SA with SPI cd1723fa
Jan 29 22:51:50 dut charon[43129]: 04[IKE] <peer-192.0.2.1|288860> closing CHILD_SA peer-192.0.2.1{104603} with SPIs c46e8cc2_i (0 bytes) cd1723fa_o (0 bytes) and TS 0.0.0.0/0 === 0.0.0.0/0

# peer deletes redundant {104712} (peer: charon.delete_rekeyed=yes), but {104711} still exists. Expected behavior: {104711} is still used. Actual behavior: updown down event is triggered.
Jan 29 23:17:41 dut charon[43129]: 07[NET] <peer-192.0.2.1|288860> received packet: from 192.0.2.1[500] to 192.0.2.2[500] (76 bytes)
Jan 29 23:17:41 dut charon[43129]: 07[ENC] <peer-192.0.2.1|288860> parsed INFORMATIONAL_V1 request 1476600040 [ HASH D ]
Jan 29 23:17:41 dut charon[43129]: 07[IKE] <peer-192.0.2.1|288860> received DELETE for ESP CHILD_SA with SPI cfc7c9c4
Jan 29 23:17:41 dut charon[43129]: 07[IKE] <peer-192.0.2.1|288860> closing CHILD_SA peer-192.0.2.1{104712} with SPIs ce48a336_i (0 bytes) cfc7c9c4_o (0 bytes) and TS 0.0.0.0/0 === 0.0.0.0/0
Jan 29 23:17:41 dut updown-hook[41951]: DOWN 192.0.2.1 0.0.0.0/0 == 192.0.2.1 -- 192.0.2.2 == 0.0.0.0/0

IMHO this can be fixed if the quick_delete.c implementation also gets prepared to deal with redundant Child SAs, as the queue_child_rekey function is, by not making assumptions about which is the fresher/older Child SA instance in a remote context, and by preparing for the case that the peer might delete redundant Child SAs even if the local configuration would not.
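
To sketch the idea (hypothetical code only, not the actual quick_delete.c nor the proposed patch; the types and helper names are made up): before triggering the close action and the updown hook for a CHILD_SA that the peer deletes, check whether another installed CHILD_SA of the same IKE_SA covers the same traffic selectors and, if so, treat the deleted one as redundant.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch only, not actual strongSwan code: a CHILD_SA deleted by
 * the peer is considered redundant if another installed CHILD_SA of the same
 * IKE_SA covers the same traffic selectors. */
typedef struct {
    int unique_id;
    bool installed;         /* in state CHILD_INSTALLED */
    const char *local_ts;   /* simplified: one traffic selector per side */
    const char *remote_ts;
} child_entry_t;

static bool delete_is_redundant(const child_entry_t *sas, size_t count,
                                const child_entry_t *deleted)
{
    for (size_t i = 0; i < count; i++)
    {
        if (sas[i].unique_id != deleted->unique_id && sas[i].installed &&
            strcmp(sas[i].local_ts, deleted->local_ts) == 0 &&
            strcmp(sas[i].remote_ts, deleted->remote_ts) == 0)
        {
            /* another SA still handles this traffic: skip the updown
             * "down" event and the close action */
            return true;
        }
    }
    return false;
}

int main(void)
{
    child_entry_t sas[] = {
        { 104711, true, "0.0.0.0/0", "0.0.0.0/0" },
        { 104712, true, "0.0.0.0/0", "0.0.0.0/0" },
    };
    printf("deleting {104712} is %sredundant\n",
           delete_is_redundant(sas, 2, &sas[1]) ? "" : "not ");
    return 0;
}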

#4 Updated by Tobias Brunner 5 months ago

  • Tracker changed from Issue to Bug
  • Target version set to 5.8.0

The remote peer is either running an old strongswan version which deletes redundant Child SAs by default, prior to the charon.delete_rekeyed switch being introduced, or charon.delete_rekeyed=yes was set on purpose.

This option is disabled by default, deleting the rekeyed CHILD_SAs was something that was added later (however, very old versions, i.e. < 5.0.0, might have deleted rekeyed SAs, no idea).

... or in following situation:

  1. only one peer has charon.delete_rekeyed = yes set (or an older strongswan version, or different implementation with similar behavior)
  2. QM rekey results in redundant Child SAs
  3. rekey_time (incl. jitter) expires for Child SA {2} on the peer with charon.delete_rekeyed = no; it is marked as CHILD_REKEYED, no QM deletion is sent
  4. rekey_time (incl. jitter) expires for Child SA {X3} on the peer with charon.delete_rekeyed = yes; it is marked as CHILD_REKEYED and a QM deletion request is sent. The remote peer triggers a down event. (That's how the peer ends up deleting the "other" redundant Child SA.)

What are {2} and {X3}? How are they related?

Maybe change the rekey settings so it's always the same host that initiates the rekeyings.

Yes, that is usually the workaround we ask or try to apply in such cases.

And why not in this case? And why do you use jitter if it causes problems?

But you say there is a rekey collision, but why are there only two CHILD_SAs? And both installed? That seems strange.

I thought that two CHILD_SAs, both installed, is the undesired result of the absence of QM rekey collision handling in IKEv1. And that was the reason to introduce the is_redundant() check in task_manager_v1.c, to sort of avoid unnecessary updown triggers if a redundant CHILD_SA gets deleted.

The is_redundant() check is to avoid rekeying redundant SAs (it's called in queue_child_rekey() after all), that has nothing to do with the updown events. The state change to CHILD_REKEYED afterwards, however, is to avoid triggering the updown event (#1421).

4. remote peer sends DELETE for QM {3}, because the peer decided to delete the "other" redundant Child SA (e.g. bad luck with the jitter)

On what is that decision based?

My guess is that this is based on the Child SA rekey_time (incl. jitter) and the is_redundant() function, which marks the Child SA as REKEYED and, if configured (charon.delete_rekeyed), deletes it right away if there is a fresher instance.

charon.delete_rekeyed causes rekeyed SAs to get deleted, not redundant ones (except when the SA is rekeyed next). So I don't think this is directly related. And according to the log it deleted the CHILD_SA that was created later (at least on this end). "fresher" is not an absolute property of these SAs, so deleting redundant SAs proactively based on this, as the peer here might do, is tricky (if both did this, they might end up without any SAs).

IIUC in IKEv1 lifetime is negotiated

Charon ignores anything the peer sends, it only uses its locally configured lifetimes.

IMHO this can be fixed if the quick_delete.c implementation also gets prepared to deal with redundant Child SAs, as the queue_child_rekey function is, by not making assumptions about which is the fresher/older Child SA instance in a remote context, and by preparing for the case that the peer might delete redundant Child SAs even if the local configuration would not.

Yes, checking for redundant CHILD_SAs in the quick-delete task probably makes sense. But there are some issues with the proposed patch: have_equal_ts doesn't really work as a generic utility function because it only compares the first TS, which is generally fine for IKEv1, but not for IKEv2; even with delete_rekeyed disabled the state should be changed to REKEYED (otherwise, the updown event is triggered when the hard lifetime expires); the check in the quick delete task doesn't ignore the searched CHILD_SA (the other check did so implicitly by hard-comparing the rekey times); and not only should the updown event be disabled, so should the close action.
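
To illustrate the first point (a hypothetical sketch, not code from the branch): a generic traffic selector comparison has to check that both CHILD_SAs carry the same number of traffic selectors and that every entry matches, not just the first one.

#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: compare complete TS lists instead of only the first
 * entry (IKEv2 CHILD_SAs may carry several traffic selectors per side).
 * Assumes both lists are in the same order. */
typedef struct {
    const char **ts;    /* simplified: traffic selectors as strings */
    size_t count;
} ts_list_t;

static bool ts_lists_equal(const ts_list_t *a, const ts_list_t *b)
{
    if (a->count != b->count)
    {
        return false;
    }
    for (size_t i = 0; i < a->count; i++)
    {
        if (strcmp(a->ts[i], b->ts[i]) != 0)
        {
            return false;
        }
    }
    return true;
}

int main(void)
{
    const char *one[] = { "10.0.0.0/24", "10.0.1.0/24" };
    const char *two[] = { "10.0.0.0/24" };
    ts_list_t a = { one, 2 }, b = { two, 1 };
    return ts_lists_equal(&a, &b) ? 0 : 1;   /* exits 1: the lists differ */
}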

I pushed alternative patches to the 2902-ikev1-redundant-delete branch.

#5 Updated by Daniel Gollub 5 months ago

Tobias Brunner wrote:

The remote peer is either running an old strongswan version which deletes redundant Child SAs by default, prior to the charon.delete_rekeyed switch being introduced, or charon.delete_rekeyed=yes was set on purpose.

This option is disabled by default, deleting the rekeyed CHILD_SAs was something that was added later (however, very old versions, i.e. < 5.0.0, might have deleted rekeyed SAs, no idea).

5.0.0 ... 5.5.3 contain the code which deletes redundant IKEv1 CHILD_SAs unconditionally.
5.6.0 introduced the charon.delete_rekeyed option, which made that deletion conditional; it is disabled by default.

There could still be other IKEv1 implementations doing a similar thing, so IMHO it would be good if the latest/future strongswan versions could deal with such implementations.

... or in following situation:

  1. only one peer has charon.delete_rekeyed = yes set (or an older strongswan version, or different implementation with similar behavior)
  2. QM rekey results in redundant Child SAs
  3. rekey_time (incl. jitter) expires for Child SA {2} on the peer with charon.delete_rekeyed = no; it is marked as CHILD_REKEYED, no QM deletion is sent
  4. rekey_time (incl. jitter) expires for Child SA {X3} on the peer with charon.delete_rekeyed = yes; it is marked as CHILD_REKEYED and a QM deletion request is sent. The remote peer triggers a down event. (That's how the peer ends up deleting the "other" redundant Child SA.)

What are {2} and {X3}? How are they related?

The idea was that {2}/{X2} and {3}/{X3} represent logically the same Child SAs on the local and remote peer. I prefixed the remote Child SAs with an X to avoid implying that the reqid or unique ID are in sync between the local and remote peer.

Maybe change the rekey settings so it's always the same host that initiates the rekeyings.

Yes, that is usually the workaround we ask or try to apply in such cases.

And why not in this case? And why do you use jitter if it causes problems?

Those are just the observations which triggered the bug in the first place. There is no particular reason to have such a configuration (both peers initiating rekeys, jitter applied).

But you say there is a rekey collision, but why are there only two CHILD_SAs? And both installed? That seems strange.

I thought that two CHILD_SAs, both installed, is the undesired result of the absence of QM rekey collision handling in IKEv1. And that was the reason to introduce the is_redundant() check in task_manager_v1.c, to sort of avoid unnecessary updown triggers if a redundant CHILD_SA gets deleted.

The is_redundant() check is to avoid rekeying redundant SAs (it's called in queue_child_rekey() after all), that has nothing to do with the updown events. The state change to CHILD_REKEYED afterwards, however, is to avoid triggering the updown event (#1421).

I see, thanks for clarification.

4. remote peer sends DELETE for QM {3}, because the peer decided to delete the "other" redundant Child SA (e.g. bad luck with the jitter)

On what is that decision based?

My guess is that this is based on the Child SA rekey_time (incl. jitter) and the is_redundant() function, which marks the Child SA as REKEYED and, if configured (charon.delete_rekeyed), deletes it right away if there is a fresher instance.

charon.delete_rekeyed causes rekeyed SAs to get deleted, not redundant ones (except when the SA is rekeyed next). So I don't think this is directly related. And according to the log it deleted the CHILD_SA that was created later (at least on this end). "fresher" is not an absolute property of these SAs, so deleting redundant SAs proactively based on this, as the peer here might do, is tricky (if both did this, they might end up without any SAs).

I agree.

IIUC in IKEv1 lifetime is negotiated

Charon ignores anything the peer sends, it only uses its locally configured lifetimes.

IMHO this can be fixed if the quick_delete.c implementation also gets prepared to deal with redundant Child SAs, as the queue_child_rekey function is, by not making assumptions about which is the fresher/older Child SA instance in a remote context, and by preparing for the case that the peer might delete redundant Child SAs even if the local configuration would not.

Yes, checking for redundant CHILD_SAs in the quick-delete task probably makes sense. But there are some issues with the proposed patch: have_equal_ts doesn't really work as a generic utility function because it only compares the first TS, which is generally fine for IKEv1, but not for IKEv2; even with delete_rekeyed disabled the state should be changed to REKEYED (otherwise, the updown event is triggered when the hard lifetime expires); the check in the quick delete task doesn't ignore the searched CHILD_SA (the other check did so implicitly by hard-comparing the rekey times); and not only should the updown event be disabled, so should the close action.

I pushed alternative patches to the 2902-ikev1-redundant-delete branch.

Thanks for pointing out those flaws.

I still have one concern with the proposed patches in 2902-ikev1-redundant-delete: they do not handle the following scenario:

1. QM rekey collision happens, results in redundant CHILD_SAs on both sides. Locally this results in {2} and {3}
2. queue_child_rekey() is called for {2}, turns out to be older, gets marked as rekeyed
3. peer sends QM delete for {3}
4. quick-delete runs ikev1_child_sa_is_redundant() which returns FALSE, since there is no other Child SA in CHILD_INSTALLED state available
5. close action and updown hook get triggered, IMHO not desired

Shouldn't at least Child SA {2} be used until an actual rekey happens?

That was my intention in making the update of the redundant Child SA to state CHILD_REKEYED, in queue_child_rekey(), conditional on the charon.delete_rekeyed setting.

#6 Updated by Tobias Brunner 5 months ago

The remote peer is either running an old strongswan version which deletes redundant Child SAs by default, prior to the charon.delete_rekeyed switch being introduced, or charon.delete_rekeyed=yes was set on purpose.

This option is disabled by default, deleting the rekeyed CHILD_SAs was something that was added later (however, very old versions, i.e. < 5.0.0, might have deleted rekeyed SAs, no idea).

5.0.0 ... 5.5.3 contain the code which deletes redundant IKEv1 CHILD_SAs unconditionally.
5.6.0 introduced the charon.delete_rekeyed option, which made that deletion conditional; it is disabled by default.

Nope, the option is mainly for rekeyed SAs and was added with 5.4.0 (see 2f3c08d268). The change that was added with 5.6.0 (083208e805) uses the same option for consistency with how actively rekeyed SAs are handled.

There could still be other IKEv1 implementations doing a similar thing, so IMHO it would be good if the latest/future strongswan versions could deal with such implementations.

Nobody should have to deal with IKEv1 anymore, really.

I still have one concern with the proposed patches in 2902-ikev1-redundant-delete: they do not handle the following scenario:

1. QM rekey collision happens, results in redundant CHILD_SAs on both sides. Locally this results in {2} and {3}
2. queue_child_rekey() is called for {2}, turns out to be older, gets marked as rekeyed

The code doesn't check when the SA was installed (i.e. whether it's "older"), but when the SA is scheduled to be rekeyed (to see which one expires earlier, as the jobs that trigger calls to queue_child_rekey() might not be in order). But if {3} won't be rekeyed for a while, {2} will be marked as rekeyed and either left alone until it expires, or deleted, depending on delete_rekeyed.

3. peer sends QM delete for {3}
4. quick-delete runs ikev1_child_sa_is_redundant() which returns FALSE, since there is no other Child SA in CHILD_INSTALLED state available
5. close action and updown hook get triggered, IMHO not desired

IMHO that's correct, because if this happens you end up without a CHILD_SA. While {2} might still be there, it will be deleted once it expires and not get recreated.

Shouldn't at least Child SA {2} be used until an actual rekey happens?

The call to queue_child_rekey() was the rekeying. There will not be another call for {2}. And it was deemed redundant because it looked like the rekeying will later be triggered by {3}. But if the other peer then goes ahead and deletes {3}, that's that. As I said, proactively deleting SAs might not be a good idea.

That was my intention in making the update of the redundant Child SA to state CHILD_REKEYED, in queue_child_rekey(), conditional on the charon.delete_rekeyed setting.

You don't really gain anything. The SA is just in a different state. As I said above, there won't be a rekeying for it. Either way, it will just be deleted once it expires.

#7 Updated by Daniel Gollub 5 months ago

Tobias Brunner wrote:

The remote peer is either running an old strongswan version which deletes redundant Child SAs by default, prior to the charon.delete_rekeyed switch being introduced, or charon.delete_rekeyed=yes was set on purpose.

This option is disabled by default, deleting the rekeyed CHILD_SAs was something that was added later (however, very old versions, i.e. < 5.0.0, might have deleted rekeyed SAs, no idea).

5.0.0 ... 5.5.3 contain the code which deletes redundant IKEv1 CHILD_SAs unconditionally.
5.6.0 introduced the charon.delete_rekeyed option, which made that deletion conditional; it is disabled by default.

Nope, the option is mainly for rekeyed SAs and was added with 5.4.0 (see 2f3c08d268). The change that was added with 5.6.0 (083208e805) uses the same option for consistency with how actively rekeyed SAs are handled.

I was thinking about this particular commit, 1e24fa46, when I referred to 5.0.0 as the first release doing the deletion of redundant IKEv1 Child SAs.

There could still be other IKEv1 implementations doing a similar thing, so IMHO it would be good if the latest/future strongswan versions could deal with such implementations.

Nobody should have to deal with IKEv1 anymore, really.

Personally, I totally agree. I'm still surprised that so many services provide IKEv1-only interfaces. If that weren't the case, I guess we wouldn't have to deal with IKEv1 madness like this anymore.

I still have one concern with the proposed patches in 2902-ikev1-redundant-delete, that it is not handling following scenario:

1. QM rekey collision happens, results in redundant CHILD_SAs on both sides. Locally this results in {2} and {3}
2. queue_child_rekey() is called for {2}, turns out to be older, gets marked as rekeyed

The code doesn't check when the SA was installed (i.e. whether it's "older"), but when the SA is scheduled to be rekeyed (to see which one expires earlier, as the jobs that trigger calls to queue_child_rekey() might not be in order). But if {3} won't be rekeyed for a while, {2} will be marked as rekeyed and either left alone until it expires, or deleted, depending on delete_rekeyed.

Understood.

3. peer sends QM delete for {3}
4. quick-delete runs ikev1_child_sa_is_redundant() which returns FALSE, since there is no other Child SA in CHILD_INSTALLED state available
5. close action and updown hook get triggered, IMHO not desired

IMHO that's correct, because if this happens you end up without a CHILD_SA. While {2} might still be there, it will be deleted once it expires and not get recreated.

Yep, that makes sense. Thanks for clarifying.

Shouldn't at least Child SA {2} be used until an actual rekey happens?

The call to queue_child_rekey() was the rekeying. There will not be another call for {2}. And it was deemed redundant because it looked like the rekeying will later be triggered by {3}. But if the other peer then goes ahead and deletes {3}, that's that. As I said, proactively deleting SAs might not be a good idea.

Ok, perfect. I will then focus in the future on remote peers exhibiting that problematic behavior and try to get them to change it. I agree now that there is nothing more that can be done from the strongswan/charon side to improve the IKEv1 behavior in that case.

That was my intention in making the update of the redundant Child SA to state CHILD_REKEYED, in queue_child_rekey(), conditional on the charon.delete_rekeyed setting.

You don't really gain anything. The SA is just in a different state. As I said above, there won't be a rekeying for it. Either way, it will just be deleted once it expires.

Understood.

Gave your patches a quick test; they seem to do the job and prevent the updown hook calls.

Thanks a lot!

#8 Updated by Daniel Gollub 5 months ago

Logs with patch applied:

Feb 01 15:12:56 dut charon[5410]: 16[NET] <6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (87 bytes)
Feb 01 15:12:56 dut charon[5410]: 16[ENC] <6> parsed ID_PROT request 0 [ SA ]
Feb 01 15:12:56 dut charon[5410]: 16[IKE] <6> 192.168.122.4 is initiating a Main Mode IKE_SA
Feb 01 15:12:56 dut charon[5410]: 16[ENC] <6> generating ID_PROT response 0 [ SA V V ]
Feb 01 15:12:56 dut charon[5410]: 16[NET] <6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (120 bytes)
Feb 01 15:12:56 dut charon[5410]: 08[NET] <6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (236 bytes)
Feb 01 15:12:56 dut charon[5410]: 08[ENC] <6> parsed ID_PROT request 0 [ KE No ]
Feb 01 15:12:56 dut charon[5410]: 08[ENC] <6> generating ID_PROT response 0 [ KE No ]
Feb 01 15:12:56 dut charon[5410]: 08[NET] <6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (260 bytes)
Feb 01 15:12:56 dut charon[5410]: 13[NET] <6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (76 bytes)
Feb 01 15:12:56 dut charon[5410]: 13[ENC] <6> parsed ID_PROT request 0 [ ID HASH ]
Feb 01 15:12:56 dut charon[5410]: 13[CFG] <6> looking for pre-shared key peer configs matching 192.168.122.153...192.168.122.4[192.168.122.4]
Feb 01 15:12:56 dut charon[5410]: 13[CFG] <6> selected peer config "peer-192.168.122.4" 
Feb 01 15:12:56 dut charon[5410]: 13[IKE] <peer-192.168.122.4|6> IKE_SA peer-192.168.122.4[6] established between 192.168.122.153[192.168.122.153]...192.168.122.4[192.168.122.4]
Feb 01 15:12:56 dut charon[5410]: 13[IKE] <peer-192.168.122.4|6> scheduling reauthentication in 28784s
Feb 01 15:12:56 dut charon[5410]: 13[IKE] <peer-192.168.122.4|6> maximum IKE_SA lifetime 28798s
Feb 01 15:12:56 dut charon[5410]: 13[ENC] <peer-192.168.122.4|6> generating ID_PROT response 0 [ ID HASH ]
Feb 01 15:12:56 dut charon[5410]: 13[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (76 bytes)
Feb 01 15:12:56 dut charon[5410]: 13[NET] <peer-192.168.122.4|6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (364 bytes)
Feb 01 15:12:56 dut charon[5410]: 13[ENC] <peer-192.168.122.4|6> parsed QUICK_MODE request 112272743 [ HASH SA No KE ID ID ]
Feb 01 15:12:56 dut charon[5410]: 13[IKE] <peer-192.168.122.4|6> received 3600s lifetime, configured 30s
Feb 01 15:12:56 dut charon[5410]: 13[ENC] <peer-192.168.122.4|6> generating QUICK_MODE response 112272743 [ HASH SA No KE ID ID ]
Feb 01 15:12:56 dut charon[5410]: 13[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (380 bytes) 
Feb 01 15:12:56 dut charon[5410]: 12[NET] <peer-192.168.122.4|6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (60 bytes)
Feb 01 15:12:56 dut charon[5410]: 12[ENC] <peer-192.168.122.4|6> parsed QUICK_MODE request 112272743 [ HASH ]
Feb 01 15:12:56 dut charon[5410]: 12[IKE] <peer-192.168.122.4|6> CHILD_SA peer-192.168.122.4{5} established with SPIs c4e5a51a_i c4ebdb78_o and TS 0.0.0.0/0 === 0.0.0.0/0
Feb 01 15:12:56 dut charon[5410]: 16[KNL] interface vti0 activated

Feb 01 15:13:00 dut charon[5410]: 10[KNL] creating rekey job for CHILD_SA ESP/0xc4ebdb78/192.168.122.4
Feb 01 15:13:00 dut charon[5410]: 10[ENC] <peer-192.168.122.4|6> generating QUICK_MODE request 3902078014 [ HASH SA No KE ID ID ]
Feb 01 15:13:00 dut charon[5410]: 10[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (380 bytes) 

Feb 01 15:13:00 dut charon[5410]: 16[NET] <peer-192.168.122.4|6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (364 bytes)
Feb 01 15:13:00 dut charon[5410]: 16[ENC] <peer-192.168.122.4|6> parsed QUICK_MODE request 2197714524 [ HASH SA No KE ID ID ]
Feb 01 15:13:00 dut charon[5410]: 16[IKE] <peer-192.168.122.4|6> received 3600s lifetime, configured 30s
Feb 01 15:13:00 dut charon[5410]: 16[IKE] <peer-192.168.122.4|6> detected rekeying of CHILD_SA peer-192.168.122.4{5}
Feb 01 15:13:00 dut charon[5410]: 16[ENC] <peer-192.168.122.4|6> generating QUICK_MODE response 2197714524 [ HASH SA No KE ID ID ]
Feb 01 15:13:00 dut charon[5410]: 16[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (380 bytes)

Feb 01 15:13:00 dut charon[5410]: 13[NET] <peer-192.168.122.4|6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (364 bytes)
Feb 01 15:13:00 dut charon[5410]: 13[ENC] <peer-192.168.122.4|6> parsed QUICK_MODE response 3902078014 [ HASH SA No KE ID ID ]
Feb 01 15:13:00 dut charon[5410]: 13[IKE] <peer-192.168.122.4|6> received 3600s lifetime, configured 30s
Feb 01 15:13:00 dut charon[5410]: 13[IKE] <peer-192.168.122.4|6> CHILD_SA peer-192.168.122.4{6} established with SPIs c1c196b0_i dc2e4585_o and TS 0.0.0.0/0 === 0.0.0.0/0
Feb 01 15:13:00 dut charon[5410]: 13[ENC] <peer-192.168.122.4|6> generating QUICK_MODE request 3902078014 [ HASH ]
Feb 01 15:13:00 dut charon[5410]: 13[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (60 bytes)

Feb 01 15:13:00 dut charon[5410]: 10[NET] <peer-192.168.122.4|6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (60 bytes)
Feb 01 15:13:00 dut charon[5410]: 10[ENC] <peer-192.168.122.4|6> parsed QUICK_MODE request 2197714524 [ HASH ]
Feb 01 15:13:00 dut charon[5410]: 10[IKE] <peer-192.168.122.4|6> CHILD_SA peer-192.168.122.4{7} established with SPIs c49f5074_i 831546e4_o and TS 0.0.0.0/0 === 0.0.0.0/0

Feb 01 15:13:01 dut charon[5410]: 06[KNL] creating rekey job for CHILD_SA ESP/0xc4e5a51a/192.168.122.153

Feb 01 15:13:01 dut charon[5410]: 12[NET] <peer-192.168.122.4|6> received packet: from 192.168.122.4[500] to 192.168.122.153[500] (76 bytes)
Feb 01 15:13:01 dut charon[5410]: 12[ENC] <peer-192.168.122.4|6> parsed INFORMATIONAL_V1 request 445811721 [ HASH D ] 
Feb 01 15:13:01 dut charon[5410]: 12[IKE] <peer-192.168.122.4|6> received DELETE for ESP CHILD_SA with SPI 831546e4
Feb 01 15:13:01 dut charon[5410]: 12[IKE] <peer-192.168.122.4|6> detected redundant CHILD_SA peer-192.168.122.4{7}
Feb 01 15:13:01 dut charon[5410]: 12[IKE] <peer-192.168.122.4|6> closing CHILD_SA peer-192.168.122.4{7} with SPIs c49f5074_i (0 bytes) 831546e4_o (0 bytes) and TS 0.0.0.0/0 === 0.0.0.0/0

Feb 01 15:13:03 dut charon[5410]: 15[KNL] creating rekey job for CHILD_SA ESP/0xc1c196b0/192.168.122.153
Feb 01 15:13:03 dut charon[5410]: 06[KNL] creating rekey job for CHILD_SA ESP/0xdc2e4585/192.168.122.4
Feb 01 15:13:03 dut charon[5410]: 15[ENC] <peer-192.168.122.4|6> generating QUICK_MODE request 3659108788 [ HASH SA No KE ID ID ] 
Feb 01 15:13:03 dut charon[5410]: 15[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (380 bytes)
Feb 01 15:13:07 dut charon[5410]: 11[IKE] <peer-192.168.122.4|6> sending retransmit 1 of request message ID 3659108788, seq 3
Feb 01 15:13:07 dut charon[5410]: 11[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (380 bytes)
Feb 01 15:13:14 dut charon[5410]: 08[IKE] <peer-192.168.122.4|6> sending retransmit 2 of request message ID 3659108788, seq 3
Feb 01 15:13:14 dut charon[5410]: 08[NET] <peer-192.168.122.4|6> sending packet: from 192.168.122.153[500] to 192.168.122.4[500] (380 bytes)
Feb 01 15:13:26 dut charon[5410]: 09[KNL] creating delete job for CHILD_SA ESP/0xc4e5a51a/192.168.122.153
Feb 01 15:13:26 dut charon[5410]: 16[KNL] creating delete job for CHILD_SA ESP/0xc4ebdb78/192.168.122.4

#9 Updated by Tobias Brunner 5 months ago

Thanks for testing. Will be included in the next release.

#10 Updated by Tobias Brunner 3 months ago

  • Status changed from Feedback to Closed
  • Assignee set to Tobias Brunner
  • Resolution set to Fixed
