"unable to install policy"
We have a "road warrior" VPN setup. The clients are typically Windows, but we also have a few Linux, Android, and iOS clients. Twice now we have seen an issue where a client's connection fails. Both times it was a Windows client, but that may be a coincidence, since Windows is the common case.
This time, the log messages were like this (with IPV6_SUBNET being substituted here for the actual prefix):
ipsec: 06[CFG] unable to install policy IPV6_SUBNET::1f/128 === ::/0 in for reqid 1366, the same policy for reqid 1364 exists
ipsec: 06[CFG] unable to install policy IPV6_SUBNET::1f/128 === ::/0 fwd for reqid 1366, the same policy for reqid 1364 exists
ipsec: 06[CFG] unable to install policy ::/0 === IPV6_SUBNET::1f/128 out for reqid 1366, the same policy for reqid 1364 exists
From what I recall, last time it was IPv4 where the routes were stuck.
We were previously running on Ubuntu 18.04 with strongswan 5.6.2. After the first time, we upgraded to Ubuntu 20.04 with strongswan 5.8.2, but it has just recurred.
Last time, I tried removing the "ip xfrm rules". That didn't fix anything. I restarted the strongswan service and that fixed it. Likewise, restarting the strongswan-starter service fixed it this time too. So it seems that the problem is in-memory state in strongswan (broadly defined; possibly literally the charon daemon).
Is there anything we should look for now? More importantly, is there anything we should look for next time before restarting the service?
#2 Updated by Tobias Brunner 2 months ago
- Category changed from charon to kernel-interface
- Status changed from New to Feedback
If there really is an active CHILD_SA with a duplicate policy, the same reqid should get assigned. There might be a weird race condition (previous CHILD_SA gone but policies not yet fully uninstalled - although that should not actually happen as the reqid is released after removing the policies). You'd have to check the log before and around the time when this happens to see what's going on (preferably with log levels for chd and knl set to 2).
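To make it easier to check whether there really is an active CHILD_SA holding the duplicate policy, something like the following could be run before restarting the service. This is only a sketch: the command list is an assumption about what is useful to capture (the kernel's policies and SAs via ip xfrm, and the daemon's view via ipsec statusall, since strongswan-starter is in use), and snapshot and run_command are hypothetical helper names.

```python
import subprocess
from datetime import datetime, timezone

# Assumed set of commands: kernel policies, kernel SAs, and the daemon's view.
COMMANDS = [
    ["ip", "-s", "xfrm", "policy"],
    ["ip", "-s", "xfrm", "state"],
    ["ipsec", "statusall"],
]


def run_command(cmd):
    """Run one command and return its combined output, or a note if it could not run."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run: {exc}\n"
    return done.stdout + done.stderr


def snapshot(runner=run_command):
    """Return one text report containing the output of every command in COMMANDS."""
    stamp = datetime.now(timezone.utc).isoformat()
    parts = [f"# snapshot taken {stamp}\n"]
    for cmd in COMMANDS:
        parts.append(f"## {' '.join(cmd)}\n")
        parts.append(runner(cmd))
    return "".join(parts)
```

Writing the result of snapshot() to a file (as root) before the restart would let the reqid from the log message be searched for in both the kernel output and the daemon output afterwards.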
#5 Updated by Richard Laager 2 months ago
Correct, reconnecting the client is not sufficient. As you expected, the client is reassigned the same IP, so it just keeps hitting the same issue each time. On the most recent failure, the user tried connecting 4 times, got the same address 4 times, and failed 4 times.
The desired reqid increases each time and the reqid that exists stays the same.
In grepping the logs, it looks like we have had other instances of this that I didn't hear about. It looks like this happened on the 29th (twice), 2nd, 5th, 6th, and 13th. Some were IPv4 and some were IPv6.
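For anyone wanting to do the same grepping, here is a rough sketch that groups the failures by policy and by the reqid that already exists, listing the reqids that were requested and rejected. The regular expression is based only on the three log lines quoted above, so it is an assumption that other versions use the same wording; summarize is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches the message format seen in the log lines quoted in this report.
PATTERN = re.compile(
    r"unable to install policy (?P<src>\S+) === (?P<dst>\S+) (?P<dir>in|fwd|out) "
    r"for reqid (?P<wanted>\d+), the same policy for reqid (?P<existing>\d+) exists"
)


def summarize(lines):
    """Map (src, dst, direction, existing reqid) to the list of rejected reqids."""
    stuck = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["src"], match["dst"], match["dir"], int(match["existing"]))
            stuck[key].append(int(match["wanted"]))
    return dict(stuck)


# Small demonstration using the (anonymized) lines from this report.
SAMPLE = [
    "ipsec: 06[CFG] unable to install policy IPV6_SUBNET::1f/128 === ::/0 in "
    "for reqid 1366, the same policy for reqid 1364 exists",
    "ipsec: 06[CFG] unable to install policy ::/0 === IPV6_SUBNET::1f/128 out "
    "for reqid 1366, the same policy for reqid 1364 exists",
]
for key, wanted in summarize(SAMPLE).items():
    print(key, wanted)
```

Passing an open log file to summarize() works as well, since it only iterates over lines. A key whose list of rejected reqids keeps growing while the existing reqid stays the same matches the pattern described above.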