Bug #2199
charon hangs when `parallel_route` is set to yes
Description
With charon.plugins.kernel-netlink.parallel_route set to yes, charon hangs at startup.
I'm not sure whether this bug is on the kernel side, and I don't know what the point of performing "concurrent Netlink ROUTE queries on a single socket" is. But anyway, simply setting it to yes makes charon hang at startup, and that sounds like a bug to me.
# ipsec start --nofork --debug-all
Starting strongSwan 5.5.1 IPsec [starter]...
Loading config setup
[...]
found netkey IPsec stack
Attempting to start charon...
00[DMN] Starting IKE charon daemon (strongSwan 5.5.1, Linux 4.8.13-1-ARCH, x86_64)
00[KNL] sending XFRM_MSG_GETSPDINFO 201: => 20 bytes @ 0x7fff44dec940
00[KNL] 0: 14 00 00 00 25 00 01 00 C9 00 00 00 F7 0A 00 00 ....%...........
00[KNL] 16: 00 00 00 00 ....
00[KNL] received XFRM_MSG_NEWSPDINFO 201: => 76 bytes @ 0x170ea70
00[KNL] 0: 4C 00 00 00 24 00 00 00 C9 00 00 00 F7 0A 00 00 L...$...........
00[KNL] 16: 00 00 00 00 1C 00 01 00 00 00 00 00 00 00 00 00 ................
00[KNL] 32: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00[KNL] 48: 0C 00 02 00 07 00 00 00 00 00 10 00 06 00 03 00 ................
00[KNL] 64: 20 20 00 00 06 00 04 00 80 80 00 00 ..........
00[KNL] sending XFRM_MSG_GETSPDINFO 202: => 20 bytes @ 0x7fff44dec940
00[KNL] 0: 14 00 00 00 25 00 01 00 CA 00 00 00 F7 0A 00 00 ....%...........
00[KNL] 16: 00 00 00 00 ....
00[KNL] received XFRM_MSG_NEWSPDINFO 202: => 76 bytes @ 0x170ea70
00[KNL] 0: 4C 00 00 00 24 00 00 00 CA 00 00 00 F7 0A 00 00 L...$...........
00[KNL] 16: 00 00 00 00 1C 00 01 00 00 00 00 00 00 00 00 00 ................
00[KNL] 32: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00[KNL] 48: 0C 00 02 00 07 00 00 00 00 00 10 00 06 00 03 00 ................
00[KNL] 64: 20 20 00 00 06 00 04 00 80 80 00 00 ..........
00[KNL] known interfaces and IP addresses:
00[KNL] sending RTM_GETLINK 201: => 17 bytes @ 0x7fff44deca10
00[KNL] 0: 11 00 00 00 12 00 01 03 C9 00 00 00 F7 0A 00 00 ................
00[KNL] 16: 00 .
charon too long to start... - kill kill
child 2807 (charon) has been killed by sig 9
charon has died -- restart scheduled (5sec)
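For anyone reading the hex dumps above, the first 16 bytes of each message are the standard Netlink header (struct nlmsghdr: length, type, flags, sequence number, port ID). Here is a minimal Python sketch that decodes them, assuming little-endian byte order as on the x86_64 host in the log. The names `NlMsgHdr` and `parse_nlmsghdr` are made up for this illustration and are not part of strongSwan.

```python
import struct
from typing import NamedTuple

NLM_F_REQUEST = 0x001  # from linux/netlink.h
NLM_F_DUMP = 0x300     # NLM_F_ROOT | NLM_F_MATCH


class NlMsgHdr(NamedTuple):
    """Fields of struct nlmsghdr, in wire order (hypothetical helper type)."""
    length: int
    msg_type: int
    flags: int
    seq: int
    pid: int


def parse_nlmsghdr(hex_bytes: str) -> NlMsgHdr:
    """Decode the 16-byte Netlink header from a space-separated hex string."""
    raw = bytes.fromhex(hex_bytes)
    # u32 len, u16 type, u16 flags, u32 seq, u32 pid; little-endian assumed
    return NlMsgHdr(*struct.unpack("<IHHII", raw[:16]))


# Header bytes copied from the log above
xfrm = parse_nlmsghdr("14 00 00 00 25 00 01 00 C9 00 00 00 F7 0A 00 00")
link = parse_nlmsghdr("11 00 00 00 12 00 01 03 C9 00 00 00 F7 0A 00 00")

for name, hdr in (("XFRM_MSG_GETSPDINFO", xfrm), ("RTM_GETLINK", link)):
    is_dump = (hdr.flags & NLM_F_DUMP) == NLM_F_DUMP
    print(f"{name}: len={hdr.length} type={hdr.msg_type:#x} "
          f"flags={hdr.flags:#06x} seq={hdr.seq} pid={hdr.pid} dump={is_dump}")
```

Decoded this way, the last message before the hang is an RTM_GETLINK request with flags 0x0301, i.e. NLM_F_REQUEST plus NLM_F_DUMP, sent with sequence number 201 from port ID 2807, which matches the PID of the charon child that gets killed. The earlier XFRM queries carry only NLM_F_REQUEST.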
History
#1 Updated by Alan Yang over 8 years ago
Oh, I found this in the comment section of #1491:
No, these were added for a third-party implementation of the Netlink interface. Using them on vanilla Linux kernels does not improve performance (could even deteriorate it).
Why not document that alongside the option in the configuration file, plus a "breaks charon on mainline kernel" warning?
#2 Updated by Noel Kuntze over 8 years ago
- Tracker changed from Issue to Bug
- Start date set to 28.12.2016
Reproduced this with 5.5.2 on 4.9.20-lts.
It hangs right after "known interfaces and IP addresses".
#3 Updated by Tobias Brunner over 8 years ago
Reproduced this with 5.5.2 on 4.9.20-lts.
It hangs right after "known interfaces and IP addresses".
Well, it's not meant to work, so not sure what you expected :)
#4 Updated by Noel Kuntze over 8 years ago
Well, the man page for strongswan.conf says this:
On vanilla Linux, DUMP queries fail with EBUSY and must be retried, further decreasing performance.
I understood this to mean that the kernel-netlink plugin then retries the query instead of waiting. It seems to me that it doesn't do that.
I'd expect only performance to be degraded.
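To make the expectation concrete, this is roughly the retry behaviour I had in mind, as a minimal Python sketch. It is not the kernel-netlink plugin's code; `retry_on_ebusy`, its parameters, and the `query` callable are hypothetical names, and the sketch assumes the failed DUMP query surfaces as an OSError carrying errno EBUSY.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_ebusy(query: Callable[[], T],
                   max_attempts: int = 5,
                   delay: float = 0.01) -> T:
    """Run query(), retrying a bounded number of times while it fails with EBUSY.

    Hypothetical helper: `query` stands in for whatever sends the DUMP
    request and collects the reply.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except OSError as exc:
            # Give up on any other error, or once the attempts are exhausted
            if exc.errno != errno.EBUSY or attempt == max_attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

With something like this, a busy socket costs extra round trips (the performance penalty the man page mentions) but the caller either gets a result or an error after a bounded number of attempts, rather than blocking forever.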
#5 Updated by Tobias Brunner over 8 years ago
- Category set to kernel-interface
- Status changed from New to Feedback
- Target version set to 5.5.3
OK, I had a closer look at this and it is actually a bug that causes that lockup. I pushed a fix to the 2199-kernel-netlink-parallel branch.
#6 Updated by Tobias Brunner over 8 years ago
- Status changed from Feedback to Closed
- Assignee set to Tobias Brunner
- Resolution set to Fixed