Issue #2077

Grace period before reassigning offline IP lease

Added by Danny Kulchinsky about 4 years ago. Updated almost 3 years ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: -
Affected version: 5.5.0
Resolution:

Description

I was wondering whether it is possible to configure a grace period during which an IP lease from the internally managed pool that has gone offline will not be reassigned?

History

#1 Updated by Danny Kulchinsky about 4 years ago

Danny Kulchinsky wrote:

I was wondering whether it is possible to configure a grace period during which an IP lease from the internally managed pool that has gone offline will not be reassigned?

Trying to answer my own question :) hope I'm right.

So I guess there's no such option out of the box. It seems that the current mechanism takes the most recently "offlined" address when no "unused" IPs are available. Would it be difficult to modify the mechanism to use the oldest "offlined" IP instead (i.e. FIFO style)?

Our problem is that when a user is assigned an IP that was recently used by another user, registration with the service (behind IPsec) fails because of a cache invalidation issue there. We are checking whether it can be resolved on that side, but that may be impossible, difficult or time-consuming since it's a proprietary solution.

I came across Issue #841, which seems to request exactly the same thing, but it's not clear to me how it was implemented. I would appreciate more guidance.

#2 Updated by Danny Kulchinsky about 4 years ago

Sorry for nagging, but I'm really stuck with this issue.

Any help/guidance will be greatly appreciated.

#3 Updated by Tobias Brunner about 4 years ago

  • Status changed from New to Feedback

If you are referring to the in-memory pools implemented by the mem_pool_t class, the default behavior is to reassign offline leases to the same client (based on its identity) but not to other clients unless the pool is full (i.e. there are no unassigned IPs; all are either online or offline). Once the pool is full, a usable offline lease is found by enumerating the cached identities (offline leases are stored by identity so they can be reassigned) and returning the first offline lease. There is currently no global list of offline leases that would allow implementing a global LRU reassignment. And since a hash table is enumerated and the hashes are randomized, the identities are enumerated in pretty much random order.
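To make the described default behavior concrete, here is a minimal sketch in Python. It is a toy model written for this thread, not strongSwan code: the class name `MemPoolModel` and its methods are hypothetical, and a plain dict stands in for the hash table of identities (Python dicts enumerate in insertion order, whereas the real enumeration order is effectively random).

```python
import ipaddress


class MemPoolModel:
    """Toy model of the behavior described above; not strongSwan code."""

    def __init__(self, base, size):
        first = ipaddress.ip_address(base)
        self.unused = [first + i for i in range(size)]  # never assigned so far
        self.online = {}   # identity -> set of addresses currently in use
        self.offline = {}  # identity -> list of released addresses

    def acquire(self, identity):
        mine = self.offline.get(identity)
        if mine:
            # 1. Prefer an offline lease previously held by the same identity.
            addr = mine.pop()
        elif self.unused:
            # 2. Otherwise hand out an address that was never assigned.
            addr = self.unused.pop(0)
        else:
            # 3. Pool is full: enumerate the cached identities and take the
            #    first offline lease found (order is arbitrary in practice).
            addr = None
            for leases in self.offline.values():
                if leases:
                    addr = leases.pop()
                    break
            if addr is None:
                return None  # every address is online
        self.online.setdefault(identity, set()).add(addr)
        return addr

    def release(self, identity, addr):
        # The lease goes offline but stays cached under its identity.
        self.online[identity].discard(addr)
        self.offline.setdefault(identity, []).append(addr)
```

In this model, a client that reconnects gets its old address back, while a new client only receives someone else's offline address once no unused address is left.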

A possible solution is to replace the identity-specific lists of offline leases with a global one (as discussed in #841). That would allow reassigning the least recently used offline lease. It would, however, prevent reassigning previous leases to the same clients (or make it expensive if there are lots of offline leases, as the list would have to be enumerated until one previously assigned to the client is found).

A solution that would still allow this would be to store the offline leases in an additional global list and cross-reference them. But this increases the memory requirement, as three additional pointers would be needed per offline lease (two to implement the global doubly linked list and one for a back reference to the identity entry to properly clean up when reassigning from the global list).

Also requiring more memory (but only a time_t per lease) would be something like you proposed, i.e. store the offline leases as we currently do but also store the time of their release and don't reassign them for a while. If the pool is full we could search for the first offline lease that has now - timestamp >= grace period, all the while keeping track of the oldest lease for which that's not true and if none that's older than the grace period is found reassign that. I guess this comes closest to what you want.
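The timestamp-based search in the last option can be sketched as follows. This is only an illustration in Python under assumed data shapes: the function name `pick_offline_lease` and the `(address, released_at)` tuples are hypothetical and do not correspond to anything in mem_pool.

```python
def pick_offline_lease(offline, now, grace_period):
    """Pick an offline lease to reassign when the pool is full.

    offline: iterable of (address, released_at) tuples in enumeration order.
    Returns the first lease that has been offline for at least grace_period.
    If there is none, falls back to the oldest offline lease.
    Returns None only if there are no offline leases at all.
    """
    oldest = None
    for addr, released_at in offline:
        if now - released_at >= grace_period:
            return addr  # grace period has passed, safe to reassign
        # Still inside the grace period: remember it if it is the oldest so far.
        if oldest is None or released_at < oldest[1]:
            oldest = (addr, released_at)
    return oldest[0] if oldest else None
```

With this fallback the pool never refuses a client while offline leases exist; the grace period is a preference, not a guarantee.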

#4 Updated by Danny Kulchinsky about 4 years ago

Tobias Brunner wrote:

Also requiring more memory (but only a time_t per lease) would be something like you proposed, i.e. store the offline leases as we currently do but also store the time of their release and don't reassign them for a while. If the pool is full we could search for the first offline lease that has now - timestamp >= grace period, all the while keeping track of the oldest lease for which that's not true and if none that's older than the grace period is found reassign that. I guess this comes closest to what you want.

Sounds like we have a winner :)

Any chance of seeing this implemented?

#5 Updated by Heikki Hannikainen almost 3 years ago

Here's a pull request to implement an address reassignment grace period in mem_pool:

https://github.com/strongswan/strongswan/pull/82

It's configured in ipsec.conf with left/rightreassignafter=60 (value in seconds), defaults to 0/off. During the given period an address may only be reassigned to the same client identity. No swanctl configuration or manual page update yet, I'll submit those too if this is otherwise an acceptable approach. I'm not quite happy with the configuration parameter name, but couldn't come up with a better one yet.

#6 Updated by Tobias Brunner almost 3 years ago

Here's a pull request to implement an address reassignment grace period in mem_pool:

https://github.com/strongswan/strongswan/pull/82

Thanks for the patch. I had a quick look. Seems to be about what I proposed above, except that the fallback is missing for the case where no offline lease older than the configured period is found. Also, please read Contributions.

I don't really see how the internal numeric offset helps end users in the log messages (if anything, printing the actual IP address would make more sense, but that's printed shortly afterwards anyway). And since we don't use log prefixes I wouldn't apply the other log change either.

It's configured in ipsec.conf with left/rightreassignafter=60 (value in seconds), defaults to 0/off.

We don't add new features to that legacy config backend anymore. So please make this (only) configurable via swanctl.conf/vici.

I'm not quite happy with the configuration parameter name, but couldn't come up with a better one yet.

For swanctl.conf pool sections reassign_after might not be that bad.

#7 Updated by Heikki Hannikainen almost 3 years ago

Tobias Brunner wrote:

https://github.com/strongswan/strongswan/pull/82

Thanks for the patch. I had a quick look. Seems to be about what I proposed above, except that the fallback is missing for the case where no offline lease older than the configured period is found. Also, please read Contributions.

Thank you for the feedback!

The missing fallback is intentional. Our problem is that addresses are often getting reassigned within < 1 second, and Bad Things happen as a consequence (no Internet for the next client, for example). We don't want Bad Things to happen, so I implemented and configured the grace period. The fallback path is: user does not get an IP address because the pool is full with currently-assigned or grace-period leases, gets disconnected, automatically reconnects to another gateway given by the DNS load balancer.
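For comparison, a "hard" grace period without the fallback could look roughly like the sketch below. It is a Python illustration only, not the code from the pull request: the function name `acquire_with_hard_grace` and the `(address, owner, released_at)` tuples are assumptions made for this example.

```python
def acquire_with_hard_grace(unused, offline, identity, now, grace_period):
    """Return an address for identity, or None if the client must be refused.

    unused:  list of addresses that were never assigned.
    offline: list of (address, owner, released_at) tuples.
    """
    # The previous owner may always get its own address back.
    for i, (addr, owner, _) in enumerate(offline):
        if owner == identity:
            del offline[i]
            return addr
    # Otherwise use an address nobody has held yet.
    if unused:
        return unused.pop(0)
    # Pool is full: only leases past the grace period are eligible.
    for i, (addr, _, released_at) in enumerate(offline):
        if now - released_at >= grace_period:
            del offline[i]
            return addr
    # No fallback: the client gets no address and has to go elsewhere.
    return None
```

Returning None here corresponds to the case where the client is disconnected and reconnects to another gateway.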

If a "hard" grace period like this is not needed in some environment, maybe LRU assignment would be good to have too as the relaxed alternative.

I don't really see how the internal numeric offset helps end users in the log messages (if anything, printing the actual IP address would make more sense, but that's printed shortly afterwards anyway). And since we don't use log prefixes I wouldn't apply the other log change either.

It helped a lot to debug the issue, but I don't mind if it's not merged. I'd prefer printing the IP address too, and having some common word in there to grep, since the logs are awfully big with debug level logging enabled for CFG.

If it's printed shortly after, and there is no identifying information (client/session/thread identification on each log line) it's a bit hard to use, since with the multithreaded daemon there might be multiple clients and leases being released and assigned at around the same time.

It's configured in ipsec.conf with left/rightreassignafter=60 (value in seconds), defaults to 0/off.

We don't add new features to that legacy config backend anymore. So please make this (only) configurable via swanctl.conf/vici.

Oh, ok. This was news to me.

#8 Updated by Tobias Brunner almost 3 years ago

The missing fallback is intentional. Our problem is that addresses are often getting reassigned within < 1 second, and Bad Things happen as a consequence (no Internet for the next client, for example). We don't want Bad Things to happen, so I implemented and configured the grace period.

Ah, I see. So I guess you set that period to a relatively low value (e.g. just a few seconds). I somehow imagined a different use case where you'd configure longer periods to keep users from constantly being assigned different virtual IPs (in which case the pool might be unusable for a while without the fallback). It would probably be a good idea to mention this in the documentation of the new option.

If a "hard" grace period like this is not needed in some environment, maybe LRU assignment would be good to have too as the relaxed alternative.

As I said previously, LRU assignment (while still maintaining the identity link) imposes a significant memory overhead. But I guess it is an option if we removed the offline array on entry_t and just maintained a global array of offline leases (with or without grace period). This could even be made configurable (see the 2077-mem-pool-lru branch for an incomplete prototype implementation).
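The trade-off of a single global collection of offline leases can be illustrated with a small Python sketch. This is not what the 2077-mem-pool-lru branch contains; the class `GlobalOfflineLeases` and its methods are hypothetical, and an `OrderedDict` stands in for the global array.

```python
from collections import OrderedDict


class GlobalOfflineLeases:
    """Single global collection of offline leases, oldest release first."""

    def __init__(self):
        self._leases = OrderedDict()  # address -> identity of last holder

    def release(self, addr, identity):
        # Re-inserting moves the address to the "most recently released" end.
        self._leases.pop(addr, None)
        self._leases[addr] = identity

    def reclaim_for(self, identity):
        # Without per-identity lists this is a linear scan, which is the
        # cost mentioned above when there are many offline leases.
        for addr, owner in self._leases.items():
            if owner == identity:
                del self._leases[addr]
                return addr
        return None

    def reclaim_lru(self):
        # Least recently released lease, i.e. the LRU/FIFO choice.
        if not self._leases:
            return None
        addr, _ = self._leases.popitem(last=False)
        return addr
```

LRU selection becomes trivial here, while finding a client's previous lease degrades to a scan over all offline leases.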

If it's printed shortly after, and there is no identifying information (client/session/thread identification on each log line) it's a bit hard to use, since with the multithreaded daemon there might be multiple clients and leases being released and assigned at around the same time.

The thread ID is always logged and if you enable the ike_name option for your logger you have the connection name and unique identifier as prefix too.
