Bug #2199: charon hangs when `parallel_route` is set to yes - strongSwan

Added by Alan Yang over 8 years ago. Updated over 8 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Tobias Brunner

Category:

kernel-interface

Target version:

5.5.3

Start date:

28.12.2016

Affected version:

5.5.1

Resolution:

Fixed

Description

With charon.plugins.kernel-netlink.parallel_route set to yes, charon hangs at startup.

I'm not sure whether this bug is on the kernel side, and I don't know what the point is of performing "concurrent Netlink ROUTE queries on a single socket". Either way, simply setting it to yes makes charon hang at startup, and that sounds like a bug to me.
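
For reference, the dotted option name above corresponds to nested sections in strongswan.conf. A minimal fragment that enables the setting (only the option under discussion is shown; a real configuration will contain more) would look roughly like this:

```
charon {
    plugins {
        kernel-netlink {
            # the setting that triggers the hang described in this report
            parallel_route = yes
        }
    }
}
```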

# ipsec start --nofork --debug-all
Starting strongSwan 5.5.1 IPsec [starter]...
Loading config setup
[...]
found netkey IPsec stack
Attempting to start charon...
00[DMN] Starting IKE charon daemon (strongSwan 5.5.1, Linux 4.8.13-1-ARCH, x86_64)
00[KNL] sending XFRM_MSG_GETSPDINFO 201: => 20 bytes @ 0x7fff44dec940
00[KNL]    0: 14 00 00 00 25 00 01 00 C9 00 00 00 F7 0A 00 00  ....%...........
00[KNL]   16: 00 00 00 00                                      ....
00[KNL] received XFRM_MSG_NEWSPDINFO 201: => 76 bytes @ 0x170ea70
00[KNL]    0: 4C 00 00 00 24 00 00 00 C9 00 00 00 F7 0A 00 00  L...$...........
00[KNL]   16: 00 00 00 00 1C 00 01 00 00 00 00 00 00 00 00 00  ................
00[KNL]   32: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00[KNL]   48: 0C 00 02 00 07 00 00 00 00 00 10 00 06 00 03 00  ................
00[KNL]   64: 20 20 00 00 06 00 04 00 80 80 00 00                ..........
00[KNL] sending XFRM_MSG_GETSPDINFO 202: => 20 bytes @ 0x7fff44dec940
00[KNL]    0: 14 00 00 00 25 00 01 00 CA 00 00 00 F7 0A 00 00  ....%...........
00[KNL]   16: 00 00 00 00                                      ....
00[KNL] received XFRM_MSG_NEWSPDINFO 202: => 76 bytes @ 0x170ea70
00[KNL]    0: 4C 00 00 00 24 00 00 00 CA 00 00 00 F7 0A 00 00  L...$...........
00[KNL]   16: 00 00 00 00 1C 00 01 00 00 00 00 00 00 00 00 00  ................
00[KNL]   32: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00[KNL]   48: 0C 00 02 00 07 00 00 00 00 00 10 00 06 00 03 00  ................
00[KNL]   64: 20 20 00 00 06 00 04 00 80 80 00 00                ..........
00[KNL] known interfaces and IP addresses:
00[KNL] sending RTM_GETLINK 201: => 17 bytes @ 0x7fff44deca10
00[KNL]    0: 11 00 00 00 12 00 01 03 C9 00 00 00 F7 0A 00 00  ................
00[KNL]   16: 00                                               .
charon too long to start... - kill kill
child 2807 (charon) has been killed by sig 9

charon has died -- restart scheduled (5sec)
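
Each hex dump in the log above begins with the 16-byte Netlink message header (`struct nlmsghdr`: length, type, flags, sequence number, port ID). As a reading aid, here is a small sketch that decodes the first line of the last request sent before the hang. It assumes little-endian byte order, which fits the x86_64 host in the log; `parse_nlmsghdr` is a hypothetical helper name, not part of strongSwan.

```python
import struct

# Flag values from the Linux Netlink headers.
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300  # NLM_F_ROOT | NLM_F_MATCH


def parse_nlmsghdr(hex_bytes):
    """Decode the 16-byte nlmsghdr at the start of a hex dump line."""
    raw = bytes.fromhex(hex_bytes)
    # u32 length, u16 type, u16 flags, u32 sequence, u32 port ID
    length, msg_type, flags, seq, pid = struct.unpack("<IHHII", raw[:16])
    return {
        "length": length,
        "type": msg_type,
        "flags": flags,
        "seq": seq,
        "pid": pid,
        "is_request": bool(flags & NLM_F_REQUEST),
        "is_dump": (flags & NLM_F_DUMP) == NLM_F_DUMP,
    }


# First 16 bytes of the "sending RTM_GETLINK 201" message in the log.
hdr = parse_nlmsghdr("11 00 00 00 12 00 01 03 C9 00 00 00 F7 0A 00 00")
print(hdr)
```

Decoded this way, the message has length 17, type 18 (RTM_GETLINK), sequence number 201 and port ID 2807, which matches the charon child PID in the log. The flags mark it as a DUMP request, and no reply to it appears before charon is killed.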

History

#1 Updated by Alan Yang over 8 years ago

Oh, I found this in the comment section of #1491:

No, these were added for a third-party implementation of the Netlink interface. Using them on vanilla Linux kernels does not improve performance (could even deteriorate it).

Why not document that in the configuration file, along with a "breaks charon on mainline kernels" warning?

#2 Updated by Noel Kuntze over 8 years ago

Tracker changed from Issue to Bug
Start date set to 28.12.2016

Reproduced this with 5.5.2 on 4.9.20-lts.
It hangs right after "known interfaces and IP addresses".

#3 Updated by Tobias Brunner over 8 years ago

> Reproduced this with 5.5.2 on 4.9.20-lts.
> It hangs right after "known interfaces and IP addresses".

Well, it's not meant to work, so not sure what you expected :)

#4 Updated by Noel Kuntze over 8 years ago

Well, the man page for strongswan.conf says this:

On vanilla Linux, DUMP queries fail with EBUSY and must be retried, further decreasing performance.

I understood this to mean that the kernel-netlink plugin retries the query in that case instead of waiting. It doesn't seem to do that.
I'd expect only degraded performance, not a hang.
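
To make that expectation concrete, here is a rough sketch of "retry on EBUSY" behaviour. It is only an illustration of the idea and not the kernel-netlink plugin's actual code; `retry_on_ebusy`, `max_attempts` and `delay` are hypothetical names, and the fake query stands in for a Netlink DUMP request.

```python
import errno
import time


def retry_on_ebusy(query, max_attempts=5, delay=0.01):
    """Call query() and retry it while it fails with EBUSY."""
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except OSError as exc:
            # Give up on any other error, or once the attempts are used up.
            if exc.errno != errno.EBUSY or attempt == max_attempts:
                raise
            time.sleep(delay)


# Hypothetical stand-in for a DUMP query that is busy twice, then succeeds.
calls = {"n": 0}


def fake_dump_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(errno.EBUSY, "Device or resource busy")
    return "dump result"


result = retry_on_ebusy(fake_dump_query, delay=0)
```

With behaviour along these lines, a busy socket would cost extra round trips but the caller would still get its answer or a clear error, rather than blocking forever.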

#5 Updated by Tobias Brunner over 8 years ago

Category set to kernel-interface
Status changed from New to Feedback
Target version set to 5.5.3

OK, I had a closer look at this and it is actually a bug that causes that lockup. I pushed a fix to the 2199-kernel-netlink-parallel branch.

#6 Updated by Tobias Brunner over 8 years ago

Status changed from Feedback to Closed
Assignee set to Tobias Brunner
Resolution set to Fixed
