Issue #3663

Multiple ways to end up with duplicate / redundant child SA entries

Added by Jim Pingle 9 months ago. Updated 9 months ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
configuration
Affected version:
5.9.0
Resolution:

Description

There are multiple ways to end up with duplicate / redundant child SA entries, but most if not all of them can be avoided with specific changes in settings.

I'm not sure if there is any possible solution in strongSwan code to address this, but it may be at least worth calling out more in the docs since it seems to be a common occurrence. The docs do mention the possibility of collisions but not the consequences of that happening. If there is a programmatic solution that would be even better, but I understand that may not be feasible.

The main problem I've seen is that if both peers attempt to rekey at the same time, they can end up with duplicate child SA entries. Over time the number accumulates as the situation happens again and again. Depending on circumstances they can reach into the hundreds or more. I'm not sure what the upper bound is (if there is one), since eventually it overruns the vici message size limit and you can't tell.

The easiest way to provoke this is to set the child life_time to the same value on both peers:

This example is from a routed (VTI on FreeBSD) IPsec connection using IKEv2 and set to rekey (reauth_time = 0).

        children {
            con4000 {
                close_action = start
                dpd_action = restart
                policies = no
                life_time = 3600
                start_action = start
                remote_ts = 0.0.0.0/0
                local_ts = 0.0.0.0/0
                reqid = 4000
                esp_proposals = aes128gcm128-modp2048,aes128-sha256-modp2048
            }
        }

I know that isn't an ideal configuration, but it's the easiest way to provoke the problem. The reason it's so easy is that rekey_time defaults to 1h, so if you set life_time to 1h, rand_time ends up as 0 by default and both peers will always choose the exact same time to rekey (1h in this case). Of course you could also provoke it by manually setting life_time and rekey_time to the same (lower) value and/or explicitly setting rand_time to 0. I chose setting life_time = 3600 here as that is a commonly recommended IPsec child SA lifetime value and I could see users setting it this way out of habit, not knowing that using rekey_time is better.

Once life_time is reached both sides attempt to rekey simultaneously, ignoring what the peer is doing, and the result is two Child SA entries for the exact same set of traffic selectors, one of which is redundant. In some cases, the peers use one for each direction at first, too. These are never removed, and at the next rekey event, they ALL get rekeyed. In the worst case I've seen, each rekey event doubles the number of Child SA entries (1, 2, 4, 8, 16, 32, 64, 128...), and it's only using one (or two at most) of those. Logs are not very helpful as they only show each side creating the child SA without errors.

Fortunately there are ways to work around the problem, but there can be some (minor) drawbacks.

Workaround 1: The easiest fix is to set the peers to different total child lifetimes, far enough apart that their possible rekey times (accounting for rand_time) cannot overlap; that way only the peer with the lower time will trigger the child rekey event. For example, local could have rekey_time = 3240 and remote could have rekey_time = 4860.
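As a sketch (values assumed; with these settings the derived defaults give the local peer a rekey window of roughly 2916-3240s and the remote peer 4374-4860s, so the windows never overlap and local always rekeys first):

        # local peer, children.<child> section
        rekey_time = 3240    # life_time defaults to 110% (3564), rand_time to 324

        # remote peer, children.<child> section
        rekey_time = 4860    # life_time defaults to 110% (5346), rand_time to 486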

Workaround 2: Both peers can be set to the same child rekey_time with a sufficiently large rand_time, but there is still a chance of duplicates if both peers randomly select the same rekey time. The larger the rand_time, the less likely this is to occur, but with one chance per interval (e.g. once per hour) it can still happen over time.
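For instance (a sketch with assumed values), both peers could share:

        # both peers, children.<child> section
        rekey_time = 3600
        rand_time = 600      # each peer rekeys at a random time in [3000, 3600]

With a 600-second window, a rekey event only coincides if both peers happen to pick nearly the same instant, which is unlikely but not impossible.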

Setting close_action = none and dpd_action = clear on one peer can help reduce the number of duplicates in other cases as well, along with setting start_action = none to ensure automatic initiation is done by one peer only to avoid duplicates at startup and after timeouts.

I know I focused on Child SA entries here but similar problems can happen with IKE SA entries as well when collisions happen. The easiest way to provoke that is to have both set to initiate (start_action = start) and start strongSwan at the same time on both.

Suggestion 1: Ideally I'd like to see a new configurable behavior where duplicate SA entries can be discarded/rejected in some way but I realize that can be tricky when this is a race condition. Even limiting duplicates to only 1-2 extras might be feasible for edge cases which may need the overlap, or a check for duplicates could reap the old/redundant ones at rekey time rather than continually rekeying redundant copies.

Suggestion 2: It may also be worth checking that the random value picked for rekey based on rand_time is sufficiently randomized. The less random that number, the more likely a collision is to occur.

Suggestion 3: Choosing a lower default rekey_time could also help, or at least base the default value off life_time when life_time is set and rekey_time is NOT set. For example, the current behavior is that if you set rekey_time, then life_time is automatically calculated as rekey_time + 10%. That could still happen, but if you set life_time without also setting rekey_time, the default value of rekey_time could be calculated as 90% of life_time. The end result is that the actual 1h default of rekey_time would only kick in if both rekey_time and life_time are unset.
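To make Suggestion 3 concrete, here is a sketch of the proposed derivation (hypothetical, not strongSwan's actual code; times in seconds, with the currently documented defaults kept for the other cases):

```python
def child_rekey_defaults(rekey_time=None, life_time=None, rand_time=None):
    """Proposed default derivation for child SA lifetimes (sketch only)."""
    if rekey_time is None and life_time is not None:
        # proposed change: derive the soft limit from an explicit hard limit
        rekey_time = int(life_time * 0.9)
    if rekey_time is None:
        rekey_time = 3600                    # documented default: 1h
    if life_time is None:
        life_time = int(rekey_time * 1.1)    # documented default: 110% of rekey_time
    if rand_time is None:
        rand_time = life_time - rekey_time   # documented default
    return rekey_time, life_time, rand_time
```

With this rule, setting life_time = 3600 alone would yield rekey_time 3240 and rand_time 360 instead of silently disabling rekeying; the 1h default for rekey_time would only apply when neither value is set.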

If nothing else, perhaps this issue will end up high in Google search results and let people know how to work around it :-)

See also: Note 32 on https://redmine.pfsense.org/issues/10176#note-32

Thanks!

History

#1 Updated by Tobias Brunner 9 months ago

  • Category set to configuration
  • Status changed from New to Feedback

The main problem I've seen is that if both peers attempt to rekey at the same time, they can end up with duplicate child SA entries.

No, rekeying will not cause duplicates even if there are collisions (at least not with IKEv2).

I know that isn't an ideal configuration, but it's the easiest way to provoke the problem. The reason it's so easy is that rekey_time defaults to 1h, so if you set life_time to 1h, rand_time ends up as 0 by default and both peers will always choose the exact same time to rekey (1h in this case).

No, what you actually end up with is an expiring CHILD_SA after one hour without any time to do any rekeying. That is, you disabled rekeying. The two peers will instead create new and completely independent replacement CHILD_SAs (i.e. you end up with two CHILD_SAs, for which the same will happen again and the number of SAs will double every hour).

Of course you could also provoke it by manually setting life_time and rekey_time to the same (lower) value and/or explicitly setting rand_time to 0.

Setting rand_time to 0 is not the same as configuring the same value for soft and hard lifetime. The former will cause a rekeying at the soft lifetime (only if it's lower than the hard lifetime), the latter disables rekeying.
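The difference can be illustrated with two hypothetical children.<child> fragments:

        # rekeying still happens: rand_time = 0, but soft < hard,
        # so a rekey is attempted at exactly rekey_time
        rekey_time = 3600
        life_time = 3960
        rand_time = 0

        # rekeying disabled: soft and hard limit coincide, the SA expires
        # and each peer independently creates a replacement from scratch
        rekey_time = 3600
        life_time = 3600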

I chose setting life_time = 3600 here as that is a commonly recommended IPsec child SA lifetime value and I could see users setting it this way out of habit, not knowing that using rekey_time is better.

Where did you see setting life_time to an hour recommended? And life_time is definitely not the main setting when using swanctl.conf (as both swanctl.conf and ExpiryRekey describe, the default values are derived from rekey_time, which is listed first and makes more sense for configuring rekeying).

Once life_time is reached both sides attempt to rekey simultaneously

Again, if that time is reached, there is no rekeying happening, but new replacement SAs are created from scratch (the kernel already removed the SAs at this point).

If you have any proof of duplicates getting created if there actually is a rekeying (collision) please post the logs.

I know I focused on Child SA entries here but similar problems can happen with IKE SA entries as well when collisions happen. The easiest way to provoke that is to have both set to initiate (start_action = start) and start strongSwan at the same time on both.

Yes, that could result in duplicates. But only one set, not more and more over time.

Suggestion 3: Choosing a lower default rekey_time could also help, or at least base the default value off life_time when life_time is set and rekey_time is NOT set.

As documented, the default value for rekey_time is 1 hour, i.e. it's always set and is the main setting to change the rekey time (if life_time and rand_time are not set they are derived from rekey_time, as documented).

#2 Updated by Jim Pingle 9 months ago

Tobias Brunner wrote:

The main problem I've seen is that if both peers attempt to rekey at the same time, they can end up with duplicate child SA entries.

No, rekeying will not cause duplicates even if there are collisions (at least not with IKEv2).
No, what you actually end up with is an expiring CHILD_SA after one hour without any time to do any rekeying. That is, you disabled rekeying. The two peers will instead create new and completely independent replacement CHILD_SAs (i.e. you end up with two CHILD_SAs, for which the same will happen again and the number of SAs will double every hour).

You are correct, poor choice of wording on my part, it is the creation collision/simultaneous creation of new entries which leads to the duplication.

I chose setting life_time = 3600 here as that is a commonly recommended IPsec child SA lifetime value and I could see users setting it this way out of habit, not knowing that using rekey_time is better.

Where did you see setting life_time to an hour recommended?

Not in the strongSwan docs specifically, but for IPsec in general from multiple vendors and third parties over the decades I've been using IPsec on various equipment. Just about everyone I've encountered suggested 3600 as the Child SA/P2 lifetime value.

And life_time is definitely not the main setting when using swanctl.conf (as both swanctl.conf and ExpiryRekey describe, the default values are derived from rekey_time, which is listed first and makes more sense for configuring rekeying).

For swanctl, yes, but for those transitioning from ipsec.conf to swanctl format they had lifetime before, and the docs at https://wiki.strongswan.org/projects/strongswan/wiki/Fromipsecconf suggest migrating from ipsec.conf lifetime to swanctl life_time in the child:

ipsec.conf:   lifetime=1h (default)
swanctl.conf: connections.<conn>.children.<child>.life_time=1h (default: 110% * rekey_time)
              (see ExpiryRekey for details)

So those following that conversion doc, who only had lifetime in ipsec.conf and did not have margintime set, may not notice that they now have much different behavior. Yes, they could read the linked doc, but if they are working from their old config they may skip over the conversion doc's margintime section and not realize it's significant. Since there is no direct equivalent for margintime, it would have made more sense to default rekey_time lower in cases where only life_time is set, to preserve behavior similar to the past.

Suggestion 3: Choosing a lower default rekey_time could also help, or at least base the default value off life_time when life_time is set and rekey_time is NOT set.

As documented, the default value for rekey_time is 1 hour, i.e. it's always set and is the main setting to change the rekey time (if life_time and rand_time are not set they are derived from rekey_time, as documented).

Right, and following the conversion doc someone could end up in this less-than-desirable condition because of those defaults, hence my suggestion to change the behavior and/or the docs.

In the case of the IKE SA, the combination of values made conversion more obvious. There is now no IKE SA "life_time" but the combination of rekey_time/reauth_time, over_time, and rand_time and their default behavior make more logical sense to me when reading the conversion, and the example for replacing IKE SA "ikelifetime" is more clear about how to preserve existing behavior since it is all laid out there. If the child SA lifetime section had followed the same pattern it would be more clear to users and readers of the docs. Probably too late to change the names and behavior of the config values to match between IKE SA and Child SA, but the docs could be adjusted. I can see the merits of both styles but would have preferred they be consistent (either both use life_time or both use over_time) to determine the hard upper limit.

At least the conversion documentation should be updated to suggest other replacement values where it's relevant, such as the one mentioned above, and add more caution in other areas of the docs recommending against having identical values on both sides and so on. For example on ExpiryRekey it mentions collisions but not the consequences of collisions. By the number of old SA duplication issues on here and threads that pop up in search results, it's not uncommon for it to happen.

Would there not be any way to avoid the ongoing duplication as well? To determine that the SAs are identical, unnecessary, and not being used across multiple generations? I can see how that would be tricky, but allowing the collision to repeatedly double the number of open SAs over time seems like highly undesirable behavior, even if it is caused by a configuration mistake.

#3 Updated by Tobias Brunner 9 months ago

And life_time is definitely not the main setting when using swanctl.conf (as both swanctl.conf and ExpiryRekey describe, the default values are derived from rekey_time, which is listed first and makes more sense for configuring rekeying).

For swanctl, yes, but for those transitioning from ipsec.conf to swanctl format they had lifetime before, and the docs at https://wiki.strongswan.org/projects/strongswan/wiki/Fromipsecconf suggest migrating from ipsec.conf lifetime to swanctl life_time in the child:

That migration document probably was a bit too literal. I've tried to clarify that.

and add more caution in other areas of the docs recommending against having identical values on both sides and so on.

Using identical values on both sides shouldn't matter because rekey collisions are properly handled.

For example on ExpiryRekey it mentions collisions but not the consequences of collisions.

Because there are none.

By the number of old SA duplication issues on here and threads that pop up in search results, it's not uncommon for it to happen.

Again, the problem is not the collision (there isn't even any here), it's that these CHILD_SAs are completely independent and no peer has a reason to close one or the other.

Would there not be any way to avoid the ongoing duplication as well? To determine that the SAs are identical, unnecessary, and not being used across multiple generations? I can see how that would be tricky, but allowing the collision to repeatedly double the number of open SAs over time seems like highly undesirable behavior, even if it is caused by a configuration mistake.

Closing redundant SAs is tricky. Which peer should close which SA? During an actual rekey collision there is a standardized procedure (based on the nonces, only one SA remains afterwards and each peer knows whether to close the old or redundant SA). But outside of collision handling, different implementations might consider different SAs redundant.

What might be avoidable is queuing/initiating a new SA that could result in a duplicate (which could avoid the exponential explosion). The problem is that due to traffic selector narrowing there might not actually be a duplicate (e.g. if the duplicates are created by kernel acquires and the traffic selectors derived from the matching traffic are sent to the server), and there are legitimate reasons for creating duplicate CHILD_SAs (different marks/interface IDs are generally used in such scenarios, though, which might allow detecting this situation). If you also consider duplicate IKE_SAs it could get even more complicated (there are legitimate use cases for duplicates here too, e.g. fail-over/load-balancing).

So for now it's best to avoid configs that could result in unwanted duplicates (e.g. trap policies without rekeying), or use a vici script where you can define your own policy regarding which SAs to consider redundant/close (Noel Kuntze wrote something like that a while ago).
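A vici reaping script of the kind mentioned could be built around a pure selection function like this sketch (the record fields `local-ts`, `remote-ts`, `age`, and `uniqueid` are assumptions that only loosely mirror vici's `list-sas` output; a real script would query a `vici.Session()`, apply a policy like the one below, and terminate the returned IDs):

```python
def redundant_child_sas(child_sas):
    """Return unique IDs of CHILD_SAs that duplicate a younger SA with
    identical traffic selectors (policy sketch: keep the newest per TS pair)."""
    newest = {}      # (local_ts, remote_ts) -> record of youngest SA seen so far
    redundant = []   # unique IDs of SAs superseded by a younger duplicate
    for sa in child_sas:
        key = (tuple(sa["local-ts"]), tuple(sa["remote-ts"]))
        if key not in newest or sa["age"] < newest[key]["age"]:
            if key in newest:
                redundant.append(newest[key]["uniqueid"])
            newest[key] = sa
        else:
            redundant.append(sa["uniqueid"])
    return redundant
```

Whether "newest wins" is the right policy is exactly the open question above; a site-specific script can at least encode whatever policy fits the deployment.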

#4 Updated by Jim Pingle 9 months ago

Tobias Brunner wrote:

For swanctl, yes, but for those transitioning from ipsec.conf to swanctl format they had lifetime before, and the docs at https://wiki.strongswan.org/projects/strongswan/wiki/Fromipsecconf suggest migrating from ipsec.conf lifetime to swanctl life_time in the child:

That migration document probably was a bit too literal. I've tried to clarify that.

That's better, thanks!

For example on ExpiryRekey it mentions collisions but not the consequences of collisions.

Because there are none.

I think we're getting confused by terminology here. Rekey collisions, perhaps not, but Child SA creation "collisions" are most definitely happening, and they most definitely have consequences.

To put it another way: If both sides have the same child life_time and child SA rekey is disabled, they will almost certainly simultaneously create new child SA replacements after the old ones expire. A warning against doing that should be in the docs, at least. Perhaps something along these lines: "When rekey is disabled for children (rekey_time equal to life_time and rand_time 0), peers should not use the same life_time value, to avoid a potential race condition when creating new child SA entries. This race condition can lead to duplicate child SA entries which can build up over time."

Whether you call it a collision, race condition, overlap, simultaneous creation, or another name, the net result is the same.

By the number of old SA duplication issues on here and threads that pop up in search results, it's not uncommon for it to happen.

Again, the problem is not the collision (there isn't even any here), it's that these CHILD_SAs are completely independent and no peer has a reason to close one or the other.

Right, but for 100% identical TSs created within the same few seconds, you'd think that rather than honoring the creation it would see that it had just made (or accepted) an identical SA and ignore the duplicate somehow. But as you've said, it's not that easy. Later in an SA's lifetime, such as close to a potential rekey or delete/recreate, I can see maybe wanting an overlapping SA in certain cases, but not so close to the initial creation.

Would there not be any way to avoid the ongoing duplication as well? To determine that the SAs are identical, unnecessary, and not being used across multiple generations? I can see how that would be tricky, but allowing the collision to repeatedly double the number of open SAs over time seems like highly undesirable behavior, even if it is caused by a configuration mistake.

Closing redundant SAs is tricky. Which peer should close which SA? During an actual rekey collision there is a standardized procedure (based on the nonces, only one SA remains afterwards and each peer knows whether to close the old or redundant SA). But outside of collision handling, different implementations might consider different SAs redundant.

I can see how that would be tough to figure out programmatically, even if we humans can look at them and see they are redundant.

What might be avoidable is queuing/initiating a new SA that could result in a duplicate (which could avoid the exponential explosion). The problem is that due to traffic selector narrowing there might not actually be a duplicate (e.g. if the duplicates are created by kernel acquires and the traffic selectors derived from the matching traffic are sent to the server), and there are legitimate reasons for creating duplicate CHILD_SAs (different marks/interface IDs are generally used in such scenarios, though, which might allow detecting this situation).

That's kind of what I was getting at above: we could define a window within which a 100% TS match for a child SA could be ignored. For example, if both sides try to initiate a child SA for the same TS pair (100% identical, not narrowed/etc.) within 10 seconds, the later one could be ignored. This could be controlled via a config parameter in case someone wants to disable the behavior. Alternately, if a peer receives a child SA create message it could pause its own child SA creation queue and dequeue entries if a duplicate has already been negotiated within that window. If someone has a legitimate use for the duplication, they could disable that window value.
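The window idea could be sketched like this (purely hypothetical, not an existing strongSwan feature; the class and parameter names are invented for illustration):

```python
import time

class DuplicateWindow:
    """Remember recently created TS pairs and flag an identical child SA
    creation arriving within `window` seconds as an ignorable duplicate."""

    def __init__(self, window=10.0):
        self.window = window
        self.seen = {}   # (local_ts, remote_ts) -> time of last creation

    def should_ignore(self, local_ts, remote_ts, now=None):
        now = time.monotonic() if now is None else now
        key = (tuple(local_ts), tuple(remote_ts))
        last = self.seen.get(key)
        self.seen[key] = now
        # only a 100% identical TS pair within the window counts as a duplicate
        return last is not None and (now - last) <= self.window
```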

If you also consider duplicate IKE_SAs it could get even more complicated (there are legitimate use cases for duplicates here too e.g. fail-over/load-balancing).

Right, and since IKE SA entries don't have nearly the same problems duplicating over time there isn't much of a need for additional measures there.

So for now it's best to avoid configs that could result in unwanted duplicates (e.g. trap policies without rekeying), or use a vici script where you can define your own policy regarding which SAs to consider redundant/close (Noel Kuntze wrote something like that a while ago).

Using a script to reap duplicates seems like a kludge but may work if someone absolutely must deal with the situation. Personally I'm fine dealing with it by fixing the configurations and the docs, but I'm still hopeful we can find a better solution for users in general since there may be cases that can't necessarily be handled that way.

#5 Updated by Tobias Brunner 9 months ago

To put it another way: If both sides have the same child life_time and child SA rekey is disabled, they will almost certainly simultaneously create new child SA replacements after the old ones expire.

I don't think disabling rekeying like that (with an actual hard limit) is common, and it's certainly nothing we'd advise. There will be traffic interruptions and, without further firewalling, plaintext leaks. And since there is no de-duplication, there will definitely be duplicates if both peers are configured like that (the randomization doesn't matter), so it really doesn't make sense to set a hard limit without configuring a soft limit to rekey/replace the SAs properly.

Disabling rekeying in general (by setting rekey_time to 0, which automatically sets life_time to 0 and the SAs won't expire) is also not that common, at least not on both sides (there are situations, where disabling it on one end can help, because the other implementation doesn't properly respond to rekeyings). But it would not cause this problem anyway.

So other than this particular misconfiguration (which disables rekeying unintentionally) this is rather hypothetical.

A warning against doing that should be in the docs, at least. Perhaps something along these lines: "When rekey is disabled for children (rekey_time equal to life_time and rand_time 0), peers should not use the same life_time value, to avoid a potential race condition when creating new child SA entries. This race condition can lead to duplicate child SA entries which can build up over time."

Well, that particular config simply forces a new SA creation collision: setting a hard limit without rekeying and randomization means the creation of replacement SAs starts at basically the same time (as mentioned above, the randomization wouldn't actually make a difference because the peers would still both create their replacements independently; creating a replacement after a hard limit should really just be considered an act of last resort). So I'm not sure such a warning with a bad config example is such a good idea. Maybe adding a warning against setting a hard limit without configuring a (different enough) soft limit might be something to consider (although, if you follow the docs and primarily set the rekey* settings, this won't be a problem).

For example if both sides try to initiate a child SA for the same TS pair (100% identical, not narrowed/etc) within 10 seconds it would be free to ignore the later one.

Packets don't have to arrive in the order in which they were sent, or at all (IKE is UDP-based). So one of the peers might not yet know that the other is creating an SA when it already receives a failure for the SA it tried to create. It might handle that as a fatal error and close the IKE_SA before the other peer's request for the duplicate CHILD_SA arrives. That's what makes handling rekey collisions not that trivial (but it is a standardized process that results in exactly one SA, no matter whether both peers or only one peer notices the collision, or at which point; you can have a look at the exchange unit tests, source:src/libcharon/tests/suites, where different exchange collisions are tested).

Alternately, if a peer receives a child SA create message it could pause its own child SA creation queue and dequeue entries if a duplicate has already been negotiated within that window.

To some degree perhaps (i.e. for actually queued and not already active CHILD_SA creations). But if it already sent the request that won't work because IKE exchanges can't be aborted (i.e. retransmits are sent until the peer responds). And both peers would basically be in the same situation (sent a request, may have received another from the peer), so the issue is again which SA to abort/close (as mentioned before, this is handled properly during a rekey collision via nonces). In theory, a TEMPORARY_FAILURE could be returned, which should trigger a randomized retry, but that might again cause interoperability issues.

#6 Updated by Jim Pingle 9 months ago

Tobias Brunner wrote:

I don't think disabling rekeying like that (with an actual hard limit) is common, and it's certainly nothing we'd advise. There will be traffic interruptions and, without further firewalling, plaintext leaks. And since there is no de-duplication, there will definitely be duplicates if both peers are configured like that (the randomization doesn't matter), so it really doesn't make sense to set a hard limit without configuring a soft limit to rekey/replace the SAs properly.

[...]

So other than this particular misconfiguration (which disables rekeying unintentionally) this is rather hypothetical.

Right, I don't think it's common either in practice. I did it this way at first by accident when converting from ipsec.conf to swanctl.conf and then deliberately to provoke the behavior when researching what was happening.

A warning against doing that should be in the docs, at least. Perhaps something along these lines: "When rekey is disabled for children (rekey_time equal to life_time and rand_time 0), peers should not use the same life_time value, to avoid a potential race condition when creating new child SA entries. This race condition can lead to duplicate child SA entries which can build up over time."

Well, that particular config simply forces a new SA creation collision: setting a hard limit without rekeying and randomization means the creation of replacement SAs starts at basically the same time (as mentioned above, the randomization wouldn't actually make a difference because the peers would still both create their replacements independently; creating a replacement after a hard limit should really just be considered an act of last resort). So I'm not sure such a warning with a bad config example is such a good idea. Maybe adding a warning against setting a hard limit without configuring a (different enough) soft limit might be something to consider (although, if you follow the docs and primarily set the rekey* settings, this won't be a problem).

Sure, the text was just a suggestion: a "don't do that" warning showing what not to do as an example. Since the result is not immediately apparent but gets worse over time, a user may not realize right away that what they have done is a misconfiguration, so it may be more helpful to at least describe what they shouldn't do. Someone might have followed the docs initially but then made a change for other reasons without following the docs linearly (vendor wanted to alter something, diagnosing some other issue, etc.), so putting a warning of sorts near the relevant options would be better than trusting that they followed things the right way.

Packets don't have to arrive in the order in which they were sent, or at all (IKE is UDP-based). So one of the peers might not yet know that the other is creating an SA when it already receives a failure for the SA it tried to create. It might handle that as a fatal error and close the IKE_SA before the other peer's request for the duplicate CHILD_SA arrives. That's what makes handling rekey collisions not that trivial (but it is a standardized process that results in exactly one SA, no matter whether both peers or only one peer notices the collision, or at which point; you can have a look at the exchange unit tests, source:src/libcharon/tests/suites, where different exchange collisions are tested).

True, and though it looked linear to me in the logs (Peer A creates a new SA, then that finishes and Peer B creates its own copy), it could have been happening so quickly that it was actually already in progress before the logs made it apparent.

To some degree perhaps (i.e. for actually queued and not already active CHILD_SA creations). But if it already sent the request that won't work because IKE exchanges can't be aborted (i.e. retransmits are sent until the peer responds). And both peers would basically be in the same situation (sent a request, may have received another from the peer), so the issue is again which SA to abort/close (as mentioned before, this is handled properly during a rekey collision via nonces). In theory, a TEMPORARY_FAILURE could be returned, which should trigger a randomized retry, but that might again cause interoperability issues.

The more I think about it, that may be too tricky as well, since if both are initiating at the exact same time, both would try to issue a temporary failure to the other. That wouldn't be ideal if it's a hard timeout without rekey, since even with the randomized retry, connectivity would be lost until one side creates a new SA.

So perhaps there really isn't anything viable in code to address it at the time of initial negotiation, and warnings in the docs may have to suffice. It still seems like there could be a way to identify the older, unused duplicate SA entries to stop them from running away like that, but that is also likely not worth the significant effort it would take to nail down.
